WO2020177190A1 - Processing method, apparatus and device - Google Patents

Processing method, apparatus and device

Info

Publication number
WO2020177190A1
WO2020177190A1 (application PCT/CN2019/083454)
Authority
WO
WIPO (PCT)
Prior art keywords: sound, dry, sound effect, song, target
Application number
PCT/CN2019/083454
Other languages: French (fr), Chinese (zh)
Inventors: 刘承诚 (Liu Chengcheng), 徐东 (Xu Dong), 张玫颖 (Zhang Meiying)
Original Assignee: 腾讯音乐娱乐科技(深圳)有限公司 (Tencent Music Entertainment Technology (Shenzhen) Co., Ltd.)
Application filed by 腾讯音乐娱乐科技(深圳)有限公司
Publication of WO2020177190A1

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H — ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 — Details of electrophonic musical instruments
    • G10H1/02 — Means for controlling the tone frequencies, e.g. attack or decay; Means for producing special musical effects, e.g. vibratos or glissandos

Definitions

  • This application relates to the field of intelligent voice technology, and in particular to a processing method, device and equipment.
  • Timbre refers to an attribute of sound, perceived through hearing, by which a listener can distinguish two sounds that are presented in the same way and have the same pitch and loudness. Therefore, the timbre of the human voice during singing refers to the vocal characteristics by which people identify a specific singer when different singers sing the same song.
  • the prior art of song post-processing mainly includes: online fixed-template timbre processing and offline manual timbre processing.
  • the online fixed template suffers from a one-size-fits-all problem and can only achieve a certain fixed processing effect, while offline processing by a human tuner suffers from low efficiency and high cost.
  • the present application provides a processing method, device, and equipment, which can make the generated audio after sound effect processing more pleasant.
  • this application provides a processing method, which includes:
  • At least one sound effect scheme is determined; the sound effect scheme is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, to generate sound-effect-processed audio;
  • target audio is generated according to the acquired target sound effect scheme; the target sound effect scheme is one of the at least one sound effect scheme.
  • the at least one target sound effect scheme includes: one sound effect scheme or multiple sound effect schemes;
  • the method further includes:
  • the target instruction is used to indicate the target sound effect scheme
  • the target sound effect scheme is acquired.
  • the method further includes:
  • before the distribution and intensity of the overtones in the feature vector are compared with the acquired dry-sound reference result to obtain the timbre data of the dry sound, the method further includes:
  • before determining at least one sound effect scheme according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data, the method further includes:
  • the singing speed of the song is determined, wherein the accompaniment identification number of the song is associated with the song.
  • the singing speed of the song associated with the dry voice is specifically:
  • sound effect processing is jointly performed on the dry sound and the accompaniment of the song associated with the dry sound through the equalization parameter value, the compression parameter value, and the reverberation parameter value in the target sound effect scheme, to generate the target audio.
  • jointly performing sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound through the acquired equalization parameter value, compression parameter value, and reverberation parameter value in the target sound effect scheme to generate the target audio includes:
  • the degree of sound quality improvement of the dry sound and of the accompaniment of the song associated with the dry sound is adjusted through the equalization parameter value in the target sound effect scheme; the degree of dynamic repair of the dry sound and of the accompaniment of the associated song is adjusted through the compression parameter value in the target sound effect scheme; and the degrees of sound quality improvement, spatial-sense creation, and detail concealment for the dry sound and the accompaniment of the associated song are separately adjusted through the reverberation parameter value in the target sound effect scheme, to generate the target audio.
  • the preprocessing of the obtained dry sound includes:
  • this application provides a processing device, which includes:
  • the first acquiring unit is configured to acquire dry sound; the dry sound includes fundamental frequency data of a song sung by a user;
  • the second acquiring unit is used to acquire the timbre data of the dry sound
  • the determining unit is configured to determine at least one sound effect scheme according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data; the sound effect scheme is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound to generate sound-effect-processed audio;
  • the output unit is configured to output the at least one sound effect scheme;
  • the generating unit is configured to generate target audio according to the acquired target sound effect scheme; the target sound effect scheme is one of the at least one sound effect scheme.
  • noise reduction and/or sound modification are performed on the acquired dry sound to obtain the first preprocessed data.
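  • The application does not fix a particular noise-reduction algorithm; as a minimal, hypothetical sketch of such a preprocessing pass (a moving-average smoother, not the claimed method):

```python
def moving_average_denoise(samples, window=5):
    """Illustrative noise reduction: smooth a dry-sound sample sequence
    with a centered moving average (a stand-in for the unspecified
    noise-reduction step)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    half = window // 2
    out = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        out.append(sum(samples[lo:hi]) / (hi - lo))
    return out

# Alternating noise flattens toward its mean after smoothing.
noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
print(moving_average_denoise(noisy, window=3))
```

A production system would more likely use spectral subtraction or a learned denoiser; this only illustrates where the "first preprocessed data" comes from.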
  • It is used to: perform feature extraction on multiple labeled dry sound samples to extract a second feature vector, and input the second feature vector into the training model to be trained to obtain the preset training model; the second feature vector is used to train the training model to be trained.
  • the determining unit is also used to: before determining at least one sound effect scheme according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data,
  • the accompaniment identification number of the accompaniment is determined by the accompaniment of the song associated with the dry sound.
  • the song associated with the dry voice is determined from the first database including a plurality of songs through the accompaniment identification number.
  • the singing speed of the song is determined, wherein the accompaniment identification number of the song is associated with the song.
  • this application provides a processing device, including an input device, an output device, a processor, and a memory.
  • the processor, the input device, the output device, and the memory are connected to each other, wherein the memory is used to store application program code that supports the device in executing the above processing method, and the processor is configured to execute the processing method provided in the above first aspect.
  • this application provides a computer-readable storage medium for storing one or more computer programs.
  • the one or more computer programs include instructions.
  • the instructions are used to execute the processing method of the first aspect described above.
  • the present application provides a computer program, which includes a processing instruction; when the computer program is executed on a computer, the processing instruction is used to execute the processing method provided in the first aspect.
  • This application provides a processing method, device and equipment.
  • the dry sound which includes the fundamental frequency data of the song sung by the user.
  • the timbre data of the dry sound is acquired, and the timbre data is acquired through a preset training model.
  • the sound effect scheme is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound.
  • the target sound effect scheme is one of the at least one sound effect scheme.
  • Figure 1 is a schematic diagram of the architecture of a processing system provided by the present application.
  • Fig. 2 is a schematic flow chart of obtaining dry sound provided by this application
  • FIG. 3 is a schematic diagram of a sound effect solution provided by this application.
  • FIG. 4 is a schematic diagram of another sound effect solution provided by this application.
  • Fig. 5 is a schematic flowchart of a processing method provided by the present application.
  • Fig. 6 is a schematic block diagram of a device provided by the present application.
  • Fig. 7 is a schematic block diagram of a device provided by the present application.
  • the term "if" can be interpreted as "when", "once", "in response to determining", or "in response to detecting", depending on the context.
  • the phrase "if it is determined" or "if [a described condition or event] is detected" can be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
  • the terminal described in this application includes, but is not limited to, other portable devices such as a mobile phone, a laptop computer, or a tablet computer with a touch-sensitive surface (for example, a touch screen display and/or a touch pad). It should also be understood that, in some embodiments, the device is not a portable communication device, but a desktop computer with a touch-sensitive surface (e.g., touch screen display and/or touch pad).
  • the terminal including a display and a touch-sensitive surface is described.
  • the terminal may include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.
  • the terminal supports various applications, such as one or more of the following: a drawing application, a presentation application, a word processing application, a website creation application, a disc burning application, a spreadsheet application, a game application, a telephone application, a video conferencing application, an email application, an instant messaging application, an exercise support application, a photo management application, a digital camera application, a digital video camera application, a web browsing application, a digital music player application, and/or a digital video player application.
  • Various application programs that can be executed on the terminal can use at least one common physical user interface device such as a touch-sensitive surface.
  • One or more functions of the touch-sensitive surface, and the corresponding information displayed on the terminal, can be adjusted and/or changed between applications and/or within a corresponding application.
  • In this way, the common physical architecture of the terminal (for example, the touch-sensitive surface) can support the various applications.
  • FIG. 1 is a schematic diagram of the architecture of a processing system provided by the present application.
  • the system may include, but is not limited to: a recognition part and a sound effect processing part.
  • the identification part may include, but is not limited to, the following working steps:
  • Step 1 Obtain dry voice, and identify the fundamental frequency data of the dry voice from the obtained dry voice of the user.
  • the dry voice of the user singing a song can be recorded through recording software to achieve the acquisition of the dry voice.
  • the dry voice of the user may be a pure human voice without accompaniment sung by the user.
  • the dry voice may refer to the pure human voice that has not undergone post-processing (such as dynamics, compression, or reverberation) and processing after recording.
  • the fundamental frequency data is the frequency data of the fundamental tone, and the fundamental tone is the lowest sound produced by the overall vibration of the sounding body (in other words, the fundamental tone is the pure tone with the lowest frequency in each tone).
  • Fig. 2 exemplarily shows a schematic diagram of obtaining dry sound.
  • the recording software is recording the dry voice of a song sung by the user (for example, "Light Years Away").
  • the fundamental frequency data of the dry voice can be identified from the dry voice of the user through the Praat phonetics software. It should be noted that the fundamental frequency data of the dry voice can also be identified from the dry voice of the user through the autocorrelation algorithm, parallel processing method, cepstrum method and simplified inverse filter method.
  • the fundamental frequency data may include the upper limit of the fundamental frequency, the lower limit of the fundamental frequency, and the main tone of the fundamental frequency data.
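  • The autocorrelation algorithm mentioned above can be sketched as follows on a synthetic tone; this is a minimal illustration, not the Praat implementation, and the search range of 50–1000 Hz is an assumption:

```python
import math

def estimate_f0(samples, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency by picking the lag with the
    largest autocorrelation within the lag range implied by [fmin, fmax]."""
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    best_lag, best_corr = lag_min, float("-inf")
    for lag in range(lag_min, min(lag_max, len(samples) - 1) + 1):
        corr = sum(samples[i] * samples[i - lag] for i in range(lag, len(samples)))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag

# A pure 200 Hz sine sampled at 8 kHz; its autocorrelation peaks at lag 40.
sr = 8000
tone = [math.sin(2 * math.pi * 200.0 * n / sr) for n in range(800)]
print(estimate_f0(tone, sr))  # ≈ 200.0 Hz
```

Real dry voices would need framing, windowing, and voicing decisions on top of this; the sketch only shows the core autocorrelation idea.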
  • Step 2 Preprocess the acquired dry sound to obtain first preprocessed data.
  • Step 3: Perform feature extraction on the first preprocessed data to extract a first feature vector, input the first feature vector into a preset training model, and compare, through the preset training model, the distribution and intensity of the overtones in the first feature vector with the acquired dry-sound reference result to obtain the timbre data (timbre score) of the dry sound; the preset training model is a trained training model.
  • Work process 11 Perform feature extraction on the first preprocessed data to extract a first feature vector, and input the extracted first feature vector into a preset training model.
  • Working process 12 The distribution and intensity of the overtones in the extracted first feature vector are compared with the reference result of the dry sound through a preset training model to obtain the timbre data of the dry sound.
  • the reference result of the dry sound may be the distribution and intensity of the overtones in the feature vector corresponding to the dry voice of a star (a professional singer).
  • While the sounding body vibrates as a whole, each of its parts (one-half, one-third, and so on) also vibrates; these partial vibrations produce the overtones in the embodiments of this application. The combination of overtones determines a specific timbre and allows people to clearly perceive the loudness of the fundamental tone.
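  • The comparison of overtone distribution and intensity against a reference can be sketched, for illustration only, as a cosine similarity between overtone-intensity vectors; the vectors and the 0–100 scoring below are assumptions, not the preset training model of this application:

```python
import math

def timbre_score(user_overtones, reference_overtones):
    """Illustrative timbre score: cosine similarity between the user's
    overtone-intensity vector and a reference vector (e.g. derived from a
    professional singer's dry voice), scaled to 0-100."""
    dot = sum(u * r for u, r in zip(user_overtones, reference_overtones))
    nu = math.sqrt(sum(u * u for u in user_overtones))
    nr = math.sqrt(sum(r * r for r in reference_overtones))
    if nu == 0.0 or nr == 0.0:
        return 0.0
    return 100.0 * dot / (nu * nr)

# Hypothetical intensities of the first five overtones.
reference = [1.0, 0.6, 0.4, 0.25, 0.15]
user = [1.0, 0.5, 0.45, 0.2, 0.1]
print(round(timbre_score(user, reference), 1))
```

The closer the user's overtone pattern is to the reference, the higher the score, which matches the description of how the timbre data is obtained.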
  • Step 4 Get the singing speed of the song associated with the dry voice.
  • the singing speed of the song associated with the dry voice may specifically be:
  • the beats per minute (BPM) of the song associated with the dry voice, or
  • the number of syllables per minute (SPM) of the song associated with the dry voice.
  • obtaining the singing speed of songs associated with dry voice may specifically include the following working processes:
  • Work process 21 Determine the accompaniment identification number (ID) of the accompaniment through the accompaniment of the song associated with the dry sound.
  • Work process 22 Determine the dry-voice-related songs from the second database including a plurality of songs through the accompaniment identification number.
  • the second database may be a music library including multiple songs.
  • Work process 23 Determine the singing speed of the song according to the determined song, where the accompaniment identification number of the song is associated with the song.
  • one song can be associated with one or more accompaniments.
  • each accompaniment can have a unique accompaniment identification number.
  • the accompaniments associated with the song "I Love You For Ten Thousand Years" may include a male-voice accompaniment, a female-voice accompaniment, and a DJ accompaniment, where the accompaniment identification number of the male-voice accompaniment may be 11, that of the female-voice accompaniment may be 22, and that of the DJ accompaniment may be 33.
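  • Work processes 21 to 23 can be sketched with a hypothetical in-memory music library; the accompaniment identification numbers 11/22/33 follow the example above, and the BPM value is assumed:

```python
# Hypothetical music library: accompaniment ID -> song, song -> singing speed.
ACCOMPANIMENT_TO_SONG = {
    11: "I Love You For Ten Thousand Years",  # male-voice accompaniment
    22: "I Love You For Ten Thousand Years",  # female-voice accompaniment
    33: "I Love You For Ten Thousand Years",  # DJ accompaniment
}
SONG_BPM = {"I Love You For Ten Thousand Years": 72}  # assumed BPM

def singing_speed_for_accompaniment(accompaniment_id):
    """Resolve the accompaniment ID to its song (work process 22), then
    look up that song's singing speed in BPM (work process 23)."""
    song = ACCOMPANIMENT_TO_SONG[accompaniment_id]
    return song, SONG_BPM[song]

print(singing_speed_for_accompaniment(11))
```

In practice the two mappings would live in the database the application describes; the dictionaries here only stand in for that lookup.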
  • the sound effect scheme is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound to generate sound-effect-processed audio.
  • generating the target audio according to the acquired target sound effect scheme includes:
  • the sound effect processing is jointly performed on the accompaniment of the song associated with the dry sound and the dry sound to generate the target audio.
  • the degree of sound quality improvement of the dry sound and of the accompaniment of the song associated with the dry sound can be adjusted through the equalization parameter value in the target sound effect scheme; the degree of dynamic repair of the dry sound and of the accompaniment of the associated song can be adjusted through the compression parameter value in the target sound effect scheme; and the degrees of sound quality improvement, spatial-sense creation, and detail concealment can be separately adjusted through the reverberation parameter value in the target sound effect scheme.
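  • As one minimal, hypothetical illustration of how a compression parameter value constrains dynamics (not the claimed processing chain), a simple threshold/ratio downward compressor:

```python
def compress(samples, threshold=0.5, ratio=4.0):
    """Illustrative downward compressor: sample magnitude above the
    threshold is scaled down by the ratio, shrinking the dynamic range
    while preserving the sign of each sample."""
    out = []
    for s in samples:
        mag = abs(s)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(mag if s >= 0 else -mag)
    return out

loud = [0.1, -0.9, 0.6, 1.0]
print(compress(loud))  # peaks above the 0.5 threshold are pulled toward it
```

The equalization and reverberation parameter values would be realized by analogous per-band gain and delay stages; this sketch covers only the dynamics part.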
  • Output at least one sound effect scheme.
  • output at least one sound effect scheme which may specifically include but is not limited to the following forms:
  • At least one sound effect scheme will be displayed, or the at least one sound effect scheme will be played.
  • Case 1 At least one sound effect scheme, including: a sound effect scheme; the sound effect scheme is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound to generate audio after sound effect processing.
  • Case 2: At least one sound effect scheme, including multiple sound effect schemes; each of the multiple sound effect schemes is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, to respectively generate multiple pieces of sound-effect-processed audio.
  • Fig. 3 exemplarily shows a sound effect scheme.
  • the sound effect scheme may include four sound effect schemes that can be used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, respectively.
  • the name of the sound effect scheme may be AI reverberation; the scheme can be used to produce audio exclusive to the user based on the user's dry voice, and the user can audition the singing rendered through the sound effect scheme.
  • the four sound effect schemes are as follows:
  • Sound effect scheme 1: a scheme for which the matching degree between the timbre data of the user's dry voice, the singing speed of the song associated with the dry voice, and the fundamental frequency data, and those of the ideal dry voice, is 90%.
  • Sound effect scheme 2: a scheme for which the matching degree between the same features of the user's dry voice and those of the ideal dry voice is 80%.
  • Sound effect scheme 3: a scheme for which the matching degree between the same features of the user's dry voice and those of the ideal dry voice is 60%.
  • Sound effect scheme 4: a scheme defined by the matching degree between the same features of the user's dry voice and those of the ideal dry voice.
  • the sound effect scheme with a matching degree of 90% is the recommended scheme (offered to the user as the first choice) for performing sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound; the other three schemes (offered to the user as second choices) are sound effect schemes the user may select for the same processing.
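  • A matching degree such as the 90%/80%/60% figures above could, purely for illustration, be computed as a weighted closeness between the user's dry-voice features and those of the ideal dry voice; the feature values and weights below are assumptions, not this application's method:

```python
def matching_degree(user, ideal, weights=(0.5, 0.25, 0.25)):
    """Hypothetical matching degree: weighted relative closeness of the
    user's timbre score, singing speed (BPM), and main fundamental
    frequency (Hz) to the ideal dry voice's values, as a percentage."""
    def closeness(a, b):
        return max(0.0, 1.0 - abs(a - b) / max(abs(a), abs(b), 1e-9))
    keys = ("timbre", "bpm", "f0")
    score = sum(w * closeness(user[k], ideal[k]) for w, k in zip(weights, keys))
    return round(100.0 * score, 1)

# Assumed feature values for an ideal dry voice and a user's dry voice.
ideal = {"timbre": 95.0, "bpm": 72.0, "f0": 220.0}
user = {"timbre": 88.0, "bpm": 75.0, "f0": 205.0}
print(matching_degree(user, ideal))
```

Ranking schemes by this percentage reproduces the recommend-first behavior described above: the highest-degree scheme is offered first.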
  • the name of the sound effect scheme may also be smart sound effect.
  • the sound effect scheme can be used to produce audio exclusive to the user based on the user's dry voice. If the user does not wear headphones while recording the dry voice, the audio produced through the sound effect scheme may be affected; the system therefore prompts or recommends that the user wear headphones while recording the singing.
  • the user can audition the audio produced through the sound effect scheme. If the user is a VIP (very important person) member, the user can publish the audio produced by the sound effect scheme based on the user's dry voice.
  • Figure 4 exemplarily shows another sound effect scheme.
  • the sound effect scheme may include multiple sound effect schemes that can be used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound.
  • the sound effect schemes may include, but are not limited to: the scheme for which the matching degree between the timbre data of the user's dry voice, the singing speed of the song associated with the dry voice, and the fundamental frequency data, and those of the ideal dry voice, is 90%; a KTV sound effect scheme; a magnetic sound effect scheme; a song sound effect scheme; a distant artistic-conception sound effect scheme; and so on.
  • the sound effect scheme with a matching degree of 90% may be a preferred sound effect scheme recommended to the user that can be used to perform sound effect processing on the accompaniment of the dry sound and the song associated with the dry sound.
  • the other schemes may be second-priority sound effect schemes recommended to the user that can be used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound.
  • the embodiment of the present application provides a processing system.
  • the processing system includes a recognition part and a sound effect processing part.
  • the processing system obtains the dry voice through the recognition part, and the dry voice includes the fundamental frequency data of the song sung by the user.
  • the processing system obtains the timbre data of the dry sound through the recognition part, and the timbre data is obtained through the preset training model.
  • the processing system determines at least one sound effect scheme through the sound effect processing part according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data, and the sound effect scheme is used to associate the dry sound with the dry sound.
  • the accompaniment of the song is processed with sound effects to generate audio after sound effect processing.
  • the processing system outputs at least one sound effect scheme through the sound effect processing part.
  • the processing system generates the target audio through the sound effect processing part according to the acquired target sound effect scheme; the target sound effect scheme is one of the at least one sound effect scheme. According to the embodiment of the present application, by determining the target sound effect scheme from the first database including multiple sound effect schemes and performing sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, the generated sound-effect-processed audio can be made more pleasant to hear.
  • FIGS. 2 to 4 are only used to explain the embodiments of the present application, and should not limit the present application.
  • FIG. 5 is a schematic flowchart of a processing method provided by the present application. As shown in Figure 5, the method can at least include the following steps:
  • the dry voice includes the fundamental frequency data of the song sung by the user.
  • the fundamental frequency data in the dry voice can be identified from the user's dry voice through the Praat phonetics software, and can also be identified from the user's dry voice through the autocorrelation algorithm, the parallel processing method, the cepstrum method, or the simplified inverse filtering method.
  • a song can be an art form that combines lyrics and scores.
  • the dry voice can be a pure human voice without accompaniment sung by the user.
  • the dry voice can refer to the pure human voice without post-processing (such as dynamics, compression or reverberation, etc.) or processing after recording.
  • the fundamental frequency data is the frequency data of the fundamental tone, and the fundamental tone is the lowest sound produced by the overall vibration of the sounding body (in other words, the fundamental tone is the pure tone with the lowest frequency in each tone).
  • Work step 1: Preprocess the acquired dry sound to obtain first preprocessed data.
  • preprocessing the acquired dry sound may specifically include performing noise reduction and/or sound modification on the acquired dry sound to obtain the first preprocessed data.
  • Work step 2: Perform feature extraction on the first preprocessed data to extract a first feature vector, input the first feature vector into a preset training model, and compare, through the preset training model, the distribution and intensity of the overtones in the first feature vector with the acquired dry-sound reference result to obtain the timbre data of the dry sound; the preset training model is a trained training model.
  • the reference result of the dry sound may be the distribution and intensity of the overtones in the feature vector corresponding to the dry voice of a star (a professional singer). The closer the distribution and intensity of the overtones in the first feature vector are to the dry-sound reference result, the higher the score of the user's dry voice.
  • While the sounding body vibrates as a whole, each of its parts (one-half, one-third, and so on) also vibrates; these partial vibrations produce the overtones in the embodiments of this application. The combination of overtones determines a specific timbre and allows people to clearly perceive the loudness of the fundamental tone.
  • S503 Determine at least one sound effect scheme according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data.
  • the sound effect solution is used to perform sound effect processing on the accompaniment of the dry sound and the song associated with the dry sound to generate audio after sound effect processing.
  • Step 1 Receive a target instruction; the target instruction is used to indicate a target sound effect scheme (that is, indicate to obtain a target sound effect scheme associated with the target instruction).
  • Step 2 In response to the received target instruction, obtain the target sound effect scheme.
  • Work process 1 Determine the accompaniment identification number of the accompaniment through the accompaniment of the song associated with the dry sound.
  • Work process 2 Determine the dry-voice-related songs from the first database including multiple songs through the accompaniment identification number.
  • Work process 3 Determine the singing speed of the song according to the determined song, where the accompaniment identification number of the song is associated with the song.
  • Case 1 At least one sound effect scheme, which may include: one sound effect scheme; this scheme is used to perform sound effect processing on the accompaniment of the dry sound and the song associated with the dry sound to generate audio after sound effect processing.
  • Case 2: At least one sound effect scheme, which may include multiple sound effect schemes; each of the multiple sound effect schemes is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, to respectively generate multiple pieces of sound-effect-processed audio.
  • outputting the at least one sound effect scheme may specifically include, but is not limited to, the following forms:
  • the at least one sound effect scheme is displayed, or the at least one sound effect scheme is played.
  • S505 Generate target audio according to the acquired target sound effect scheme.
  • generating the target audio according to the acquired target sound effect scheme may specifically include the following process:
  • the sound effect processing is jointly performed on the accompaniment of the song associated with the dry sound and the dry sound to generate the target audio.
  • the degree of sound quality improvement of the dry sound and of the accompaniment of the song associated with the dry sound is adjusted through the equalization parameter value in the target sound effect scheme; the degree of dynamic repair of the dry sound and of the accompaniment of the associated song is adjusted through the compression parameter value in the target sound effect scheme; and the degrees of sound quality improvement, spatial-sense creation, and detail concealment are separately adjusted through the reverberation parameter value in the target sound effect scheme.
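  • The joint processing of the dry sound and the accompaniment can be sketched as a gain mix plus a single decayed echo standing in for the reverberation parameter value; every parameter below is hypothetical:

```python
def render_target_audio(dry, accompaniment, dry_gain=0.8, acc_gain=0.6,
                        reverb_delay=2, reverb_decay=0.3):
    """Illustrative rendering: mix the dry voice with the accompaniment
    at fixed gains, then add one delayed, decayed copy of the mix as a
    crude reverberation stand-in."""
    n = max(len(dry), len(accompaniment))
    mix = [dry_gain * (dry[i] if i < len(dry) else 0.0)
           + acc_gain * (accompaniment[i] if i < len(accompaniment) else 0.0)
           for i in range(n)]
    out = list(mix)
    for i in range(reverb_delay, n):
        out[i] += reverb_decay * mix[i - reverb_delay]  # single echo tap
    return out

dry = [0.5, 0.0, 0.0, 0.0]
acc = [0.0, 0.25, 0.0, 0.0]
print(render_target_audio(dry, acc))
```

A real target sound effect scheme would set these gains and the reverberation character per scheme; the sketch shows only how dry sound and accompaniment combine into target audio.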
  • In summary, the embodiment of the present application provides a processing method.
  • First, the dry sound is acquired; the dry sound includes the fundamental frequency data of the song sung by the user.
  • Then the timbre data of the dry sound is acquired; the timbre data is acquired through a preset training model.
  • Next, at least one sound effect scheme is determined according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data; each sound effect scheme is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, to generate sound-effect-processed audio.
  • The at least one sound effect scheme is then output.
  • Finally, target audio is generated according to the acquired target sound effect scheme; the target sound effect scheme is one of the at least one sound effect scheme.
  • In this way, sound effect processing can be performed on the dry sound and the accompaniment of the associated song through the acquired target sound effect scheme, which makes the generated sound-effect-processed audio more pleasant to hear.
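One plausible reading of the scheme-determination step is a nearest-match lookup over a stored database of schemes keyed by timbre, tempo, and pitch profiles. The database contents, field names, and normalization weights below are assumptions for illustration, not taken from the patent:

```python
import numpy as np

# Hypothetical "first database" of sound effect schemes: each entry pairs a
# (timbre, tempo-in-BPM, mean-f0-in-Hz) profile with a scheme name.
SCHEME_DB = [
    {"profile": (0.2, 80.0, 180.0), "name": "warm_ballad"},
    {"profile": (0.7, 128.0, 240.0), "name": "bright_pop"},
    {"profile": (0.5, 100.0, 200.0), "name": "neutral"},
]

def rank_schemes(timbre, tempo_bpm, mean_f0, top_k=2):
    """Sketch of determining "at least one sound effect scheme": return the
    top-k schemes whose stored profile is closest to the song's timbre data,
    singing speed, and fundamental frequency data."""
    query = np.array([timbre, tempo_bpm / 200.0, mean_f0 / 400.0])
    scored = []
    for entry in SCHEME_DB:
        t, bpm, f0 = entry["profile"]
        ref = np.array([t, bpm / 200.0, f0 / 400.0])
        scored.append((float(np.linalg.norm(query - ref)), entry["name"]))
    scored.sort()
    return [name for _, name in scored[:top_k]]
```

Returning the top-k matches rather than a single winner mirrors Case 2 above, where multiple candidate schemes are presented to the user before one target scheme is chosen.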
  • the processing device 60 includes: a first acquiring unit 601, a second acquiring unit 602, a determining unit 603, an output unit 604, and a generating unit 605, where:
  • the first acquiring unit 601 is used to acquire the dry sound; the dry sound includes the fundamental frequency data of the song sung by the user;
  • the second acquiring unit 602 is used to acquire the timbre data of the dry sound;
  • the determining unit 603 is configured to determine the target sound effect scheme from a first database including multiple sound effect schemes, according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data identified from the dry sound; the target sound effect scheme is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, to generate sound-effect-processed audio;
  • the output unit 604 is configured to output at least one sound effect scheme.
  • Case 1: the at least one sound effect scheme includes a single sound effect scheme. This scheme is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, to generate sound-effect-processed audio.
  • Case 2: the at least one sound effect scheme includes multiple sound effect schemes. Each of the multiple schemes is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, to respectively generate multiple sound-effect-processed audio outputs.
  • the generating unit 605 is configured to generate the target audio according to the acquired target sound effect scheme; the target sound effect scheme is one of the at least one sound effect scheme.
  • Specifically, the generating unit 605 can be used to generate the target audio by adjusting, through the equalization parameter value in the target sound effect scheme, the degree of sound-quality improvement applied to the dry sound and the accompaniment of the associated song; adjusting, through the compression parameter value in the target sound effect scheme, the degree of dynamic repair applied to the dry sound and the accompaniment of the associated song; and separately adjusting, through the reverberation parameter value in the target sound effect scheme, the sound-quality improvement, the creation of spatial layering, and the degree of detail masking.
  • In some embodiments, in addition to the first acquiring unit 601, the second acquiring unit 602, the determining unit 603, the output unit 604, and the generating unit 605, the processing device 60 also includes a preprocessing unit.
  • The preprocessing unit is used to perform noise reduction and/or tone repair on the acquired dry sound, to obtain the first preprocessed data.
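As one hypothetical stand-in for the preprocessing unit's noise reduction, a single-pass spectral subtraction can be sketched as follows. The patent does not specify a denoising algorithm, so this is purely illustrative:

```python
import numpy as np

def spectral_subtract(noisy, noise_estimate):
    """Minimal noise-reduction sketch: subtract an estimated noise
    magnitude spectrum from the recording's spectrum, keeping the
    recording's phase, and resynthesize the cleaned signal."""
    spec = np.fft.rfft(noisy)
    noise_mag = np.abs(np.fft.rfft(noise_estimate))
    mag = np.maximum(np.abs(spec) - noise_mag, 0.0)  # floor at zero
    phase = np.angle(spec)
    return np.fft.irfft(mag * np.exp(1j * phase), n=len(noisy))
```

Given a vocal tone contaminated by a low-frequency hum, subtracting the hum's spectrum suppresses the hum bin while leaving the tone's bin essentially untouched.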
  • In some embodiments, in addition to the units listed above, the processing device 60 also includes a training unit.
  • The training unit is used to perform feature extraction on multiple labeled dry sound samples to extract second feature vectors, and to input the second feature vectors into the training model to be trained, to obtain the preset training model; the second feature vectors are used to train the training model to be trained.
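The training unit's behavior can be sketched with assumed concrete choices: a few spectral statistics as the "second feature vector" and a nearest-centroid model standing in for the training model. None of these specifics come from the patent:

```python
import numpy as np

def extract_feature_vector(samples):
    """Summarize a dry-sound clip by simple spectral statistics
    (the concrete features here are assumptions for illustration)."""
    spectrum = np.abs(np.fft.rfft(samples))
    total = spectrum.sum() + 1e-12
    centroid = (np.arange(len(spectrum)) * spectrum).sum() / total
    rolloff = np.searchsorted(np.cumsum(spectrum), 0.85 * total)
    return np.array([centroid, float(rolloff), samples.std()])

def train_preset_model(labeled_samples):
    """Sketch of obtaining the "preset training model": one feature
    centroid per timbre label, from labeled dry-sound samples."""
    centroids = {}
    for label, clips in labeled_samples.items():
        feats = np.stack([extract_feature_vector(c) for c in clips])
        centroids[label] = feats.mean(axis=0)
    return centroids

def classify_timbre(model, samples):
    """Assign the label whose centroid is nearest to the clip's features."""
    feat = extract_feature_vector(samples)
    return min(model, key=lambda lbl: np.linalg.norm(model[lbl] - feat))
```

With two labels trained on low- and high-frequency synthetic clips, a new low-frequency clip classifies as the "dark" timbre and a high-frequency clip as "bright".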
  • the determining unit is also used to, before the at least one sound effect scheme is determined according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data:
  • determine the accompaniment identification number of the accompaniment from the accompaniment of the song associated with the dry sound;
  • determine the song associated with the dry sound from a first database including multiple songs through the accompaniment identification number; and
  • determine the singing speed of the song from the determined song, where the accompaniment identification number of the song is associated with the song.
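A minimal sketch of this three-step lookup, with an assumed in-memory "first database" (its schema and contents are hypothetical):

```python
# Hypothetical "first database" mapping accompaniment identification numbers
# to songs and their stored singing speeds.
SONG_DB = {
    "acc-001": {"title": "Song A", "bpm": 96},
    "acc-002": {"title": "Song B", "bpm": 128},
}

def singing_speed_for_accompaniment(accompaniment_id):
    """The accompaniment yields its identification number; the number
    selects the associated song from the first database; and the song
    record carries the singing speed used in scheme determination."""
    song = SONG_DB.get(accompaniment_id)
    if song is None:
        raise KeyError(f"no song associated with {accompaniment_id}")
    return song["title"], song["bpm"]
```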
  • In specific implementations, the device 60 can acquire the dry sound through the first acquiring unit 601; the dry sound includes the fundamental frequency data of the song sung by the user. Further, the device 60 acquires the timbre data of the dry sound through the second acquiring unit 602. Then the device 60 determines at least one sound effect scheme through the determining unit 603, according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data; each sound effect scheme is used to perform sound effect processing on the dry sound and the accompaniment of the associated song, to generate sound-effect-processed audio. Next, the device 60 outputs the target sound effect scheme through the output unit 604. Finally, the device 60 generates the target audio through the generating unit 605 according to the acquired target sound effect scheme; the target sound effect scheme is one of the at least one sound effect scheme.
  • In this way, sound effect processing can be performed on the dry sound and the accompaniment of the song associated with the dry sound through the acquired target sound effect scheme, which makes the generated sound-effect-processed audio more pleasant to hear.
  • It should be understood that the device 60 is only an example provided by the embodiment of the present application; the device 60 may have more or fewer components than shown, may combine two or more components, or may be implemented with a different configuration of components.
  • Fig. 7 is a schematic structural diagram of a processing device provided by the present application.
  • the devices may include mobile phones, tablet computers, personal digital assistants (Personal Digital Assistant, PDA), mobile Internet devices (Mobile Internet Device, MID), and smart wearable devices (such as smart watches and smart bracelets).
  • the device 70 may include: a baseband chip 701, a memory 702 (one or more computer-readable storage media), and a peripheral system 703; these components can communicate over one or more communication buses 704.
  • the baseband chip 701 may include one or more processors (CPUs) 705.
  • the processor 705 can be specifically used to: acquire the dry sound, where the dry sound includes the fundamental frequency data of the song sung by the user; acquire the timbre data of the dry sound; determine at least one sound effect scheme, where each sound effect scheme is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, to generate sound-effect-processed audio; and generate the target audio according to the acquired target sound effect scheme, where the target sound effect scheme is one of the at least one sound effect scheme.
  • the memory 702 is coupled with the processor 705 and may be used to store various software programs and/or multiple sets of instructions.
  • the memory 702 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
  • the memory 702 can store an operating system (hereinafter referred to as the system), for example an embedded operating system such as ANDROID, IOS, WINDOWS, or LINUX.
  • the memory 702 may also store a network communication program, which may be used to communicate with one or more additional devices, one or more terminal devices, and one or more network devices.
  • the memory 702 can also store a user interface program, which can vividly display the content of an application program through a graphical operation interface, and receive user control operations on the application program through input controls such as menus, dialog boxes, and buttons.
  • the memory 702 may be used to store implementation code for implementing the processing method.
  • the memory 702 may also store one or more application programs. These applications may include: karaoke (K song) programs, social applications (such as Facebook), image management applications (such as photo albums), map applications (such as Google Maps), browsers (such as Safari and Google Chrome), and so on.
  • the peripheral system 703 is mainly used to implement the interaction function between the user of the device 70 and the external environment, and mainly includes the input and output devices of the device 70.
  • the peripheral system 703 may include: a display controller 707, a camera controller 708, and an audio controller 709. Among them, each controller can be coupled with its corresponding peripheral device (such as the display screen 710, the camera 711, and the audio circuit 712).
  • the display screen 710 may be a display screen configured with a self-capacitive floating touch panel, or a display screen configured with an infrared floating touch panel.
  • the camera 711 may be a 3D camera. It should be noted that the peripheral system 703 may also include other I/O peripherals.
  • In specific implementations, the device 70 can acquire the dry sound through the processor 705; the dry sound includes the fundamental frequency data of the song sung by the user. Further, the device 70 can acquire the timbre data of the dry sound through the processor 705. Then the device 70 can determine at least one sound effect scheme through the processor 705, according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data; each sound effect scheme is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, to generate sound-effect-processed audio. Next, the device 70 can output the target sound effect scheme through the peripheral system 703. Finally, the device 70 can generate the target audio through the processor 705 according to the acquired target sound effect scheme;
  • the target sound effect scheme is one of the at least one sound effect scheme.
  • In this way, sound effect processing can be performed on the dry sound and the accompaniment of the song associated with the dry sound through the acquired target sound effect scheme, so that the generated sound-effect-processed audio is more pleasant to hear.
  • It should be understood that the device 70 is only an example provided in the embodiment of the present application; the device 70 may have more or fewer components than shown, may combine two or more components, or may be implemented with a different configuration of components.
  • the present application provides a computer-readable storage medium; the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, implements the processing method described above.
  • the computer-readable storage medium may be an internal storage unit of the device described in any of the foregoing embodiments, such as a hard disk or memory of the device.
  • the computer-readable storage medium may also be an external storage device of the device, such as a plug-in hard disk equipped on the device, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), and so on.
  • the computer-readable storage medium may also include both an internal storage unit of the device and an external storage device.
  • the computer-readable storage medium is used to store computer programs and other programs and data required by the device.
  • the computer-readable storage medium can also be used to temporarily store data that has been output or will be output.
  • the present application also provides a computer program product.
  • the computer program product includes a non-transitory computer-readable storage medium storing a computer program.
  • the computer program is operable to cause a computer to execute some or all of the steps of any method described in the above method embodiments.
  • the computer program product may be a software installation package, and the computer includes an electronic device.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation.
  • multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may also be electrical, mechanical or other forms of connection.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of this application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the method described in each embodiment of the present application.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.


Abstract

Provided are a processing method, apparatus and device. The processing method comprises: acquiring a dry sound (S501), the dry sound comprising fundamental frequency data of a song sung by a user; acquiring timbre data of the dry sound (S502), the timbre data being acquired by means of a pre-set training model; determining at least one sound effect scheme according to the acquired timbre data of the dry sound, a singing speed of the song associated with the dry sound, and the fundamental frequency data (S503), the sound effect scheme being used for carrying out sound effect processing on the dry sound and an accompaniment of the song associated with the dry sound so as to generate audio subjected to sound effect processing; outputting the at least one sound effect scheme (S504); and generating a target audio according to an acquired target sound effect scheme (S505), wherein the target sound effect scheme is one of the at least one sound effect scheme. According to the processing method, the generated sound-effect-processed audio can be more pleasant.

Description

Processing Method, Apparatus and Device
Technical Field
This application relates to the field of intelligent voice technology, and in particular to a processing method, apparatus and device.
Background
The American National Standards Institute defines timbre as follows: "Timbre is that attribute of sound, produced by the sense of hearing, by which a listener can judge that two sounds presented in the same way and having the same pitch and loudness are different." Accordingly, the timbre of the human voice during singing refers to the vocal characteristics by which listeners identify which singer is performing when different singers sing the same song.
In the process of realizing the present invention, the inventors found that the prior art for song post-processing mainly consists of two approaches: online fixed-template timbre processing and offline manual timbre processing. The online fixed template suffers from a "one size fits all" problem and can only achieve a certain fixed processing effect, while offline processing by a mixing engineer suffers from low efficiency and high cost.
Summary
The present application provides a processing method, apparatus and device, which can make the generated sound-effect-processed audio more pleasant to hear.
In a first aspect, this application provides a processing method, which includes:
acquiring a dry sound, where the dry sound includes fundamental frequency data of a song sung by a user;
acquiring timbre data of the dry sound, where the timbre data is acquired through a preset training model;
determining at least one sound effect scheme according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data, where the sound effect scheme is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, to generate sound-effect-processed audio;
outputting the at least one sound effect scheme; and
generating target audio according to an acquired target sound effect scheme, where the target sound effect scheme is one of the at least one sound effect scheme.
With reference to the first aspect, in some possible embodiments:
the at least one sound effect scheme includes one sound effect scheme or multiple sound effect schemes; and
after the at least one sound effect scheme is output and before the target audio is generated according to the acquired target sound effect scheme, the method further includes:
receiving a target instruction, where the target instruction is used to indicate the target sound effect scheme; and
in response to the received target instruction, acquiring the target sound effect scheme.
With reference to the first aspect, in some possible embodiments, before the timbre data of the dry sound is obtained, the method further includes:
preprocessing the acquired dry sound to obtain first preprocessed data; and
performing feature extraction on the first preprocessed data to extract a first feature vector, inputting the first feature vector into the preset training model, and comparing, through the preset training model, the distribution and intensity of overtones in the first feature vector with a reference result for the acquired dry sound, to obtain the timbre data of the dry sound, where the preset training model is a trained training model.
With reference to the first aspect, in some possible embodiments, before the timbre data of the dry sound is obtained in this way, the method further includes:
performing feature extraction on multiple labeled dry sound samples to extract second feature vectors, and inputting the second feature vectors into a training model to be trained, to obtain the preset training model, where the second feature vectors are used to train the training model to be trained.
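The overtone-based timbre comparison described in this section can be illustrated with an assumed implementation that samples the magnitude spectrum at integer multiples of the fundamental frequency. The comparison rule actually used inside the preset training model is not specified by the application, so this is a sketch only:

```python
import numpy as np

def overtone_profile(signal, f0, sr, n_harmonics=5):
    """Measure the "distribution and intensity of overtones": spectral
    magnitude at integer multiples of the fundamental frequency f0,
    normalized to the loudest partial."""
    spectrum = np.abs(np.fft.rfft(signal))
    bin_hz = sr / len(signal)
    mags = []
    for k in range(1, n_harmonics + 1):
        b = int(round(k * f0 / bin_hz))
        mags.append(spectrum[b] if b < len(spectrum) else 0.0)
    mags = np.array(mags)
    return mags / (mags.max() + 1e-12)

def timbre_distance(profile_a, profile_b):
    """Hypothetical comparison against a reference overtone profile."""
    return float(np.abs(profile_a - profile_b).mean())
```

For a signal with a fundamental at 100 Hz and a half-amplitude second harmonic at 200 Hz, the profile comes out as roughly [1.0, 0.5, 0, 0, 0].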
With reference to the first aspect, in some possible embodiments, before the at least one sound effect scheme is determined according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data, the method further includes:
determining an accompaniment identification number of the accompaniment from the accompaniment of the song associated with the dry sound;
determining the song associated with the dry sound from a first database including multiple songs through the accompaniment identification number; and
determining the singing speed of the song from the determined song, where the accompaniment identification number of the song is associated with the song.
With reference to the first aspect, in some possible embodiments, the singing speed of the song associated with the dry sound is specifically:
the acquired number of beats per minute of the song associated with the dry sound;
or,
the acquired number of syllables per minute of the song associated with the dry sound.
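Both readings of singing speed are simple to compute once beats or syllables have been detected; beat and syllable detection themselves are outside this sketch:

```python
def beats_per_minute(beat_times_s):
    """Singing speed as beats per minute, from at least two detected
    beat timestamps (in seconds)."""
    span = beat_times_s[-1] - beat_times_s[0]
    return 60.0 * (len(beat_times_s) - 1) / span

def syllables_per_minute(n_syllables, duration_s):
    """Singing speed as syllables per minute over the sung duration."""
    return 60.0 * n_syllables / duration_s
```

For example, four beats at 0.5-second intervals give 120 BPM, and 150 syllables over a one-minute performance give 150 syllables per minute.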
With reference to the first aspect, in some possible embodiments, generating the target audio according to the acquired target sound effect scheme includes:
jointly performing sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound through the equalization parameter value, the compression parameter value, and the reverberation parameter value in the acquired target sound effect scheme, to generate the target audio.
With reference to the first aspect, in some possible embodiments, this joint sound effect processing includes:
adjusting, through the equalization parameter value in the target sound effect scheme, the degree of sound-quality improvement applied to the dry sound and the accompaniment of the associated song; adjusting, through the compression parameter value in the target sound effect scheme, the degree of dynamic repair applied to the dry sound and the accompaniment of the associated song; and separately adjusting, through the reverberation parameter value in the target sound effect scheme, the sound-quality improvement, the creation of spatial layering, and the degree of detail masking, to generate the target audio.
With reference to the first aspect, in some possible embodiments, preprocessing the acquired dry sound includes:
performing noise reduction and/or tone repair on the acquired dry sound.
In a second aspect, this application provides a processing apparatus, which includes:
a first acquiring unit, used to acquire a dry sound, where the dry sound includes fundamental frequency data of a song sung by a user;
a second acquiring unit, used to acquire timbre data of the dry sound;
a determining unit, used to determine at least one sound effect scheme according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data, where the sound effect scheme is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, to generate sound-effect-processed audio;
an output unit, used to output the at least one sound effect scheme; and
a generating unit, used to generate target audio according to an acquired target sound effect scheme, where the target sound effect scheme is one of the at least one sound effect scheme.
With reference to the second aspect, in some possible embodiments, the apparatus further includes a preprocessing unit, used to preprocess the acquired dry sound to obtain first preprocessed data; specifically, to perform noise reduction and/or tone repair on the acquired dry sound to obtain the first preprocessed data.
With reference to the second aspect, in some possible embodiments, the apparatus further includes a training unit, used to perform feature extraction on multiple labeled dry sound samples to extract second feature vectors, and to input the second feature vectors into a training model to be trained, to obtain a preset training model, where the second feature vectors are used to train the training model to be trained.
With reference to the second aspect, in some possible embodiments, the determining unit is also used to, before the at least one sound effect scheme is determined according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data: determine the accompaniment identification number of the accompaniment from the accompaniment of the song associated with the dry sound; determine the song associated with the dry sound from a first database including multiple songs through the accompaniment identification number; and determine the singing speed of the song from the determined song, where the accompaniment identification number of the song is associated with the song.
In a third aspect, this application provides a processing device, including an input device, an output device, a processor, and a memory that are connected to each other, where the memory is used to store application program code that supports the device in executing the above processing method, and the processor is configured to execute the processing method provided in the first aspect.
In a fourth aspect, this application provides a computer-readable storage medium for storing one or more computer programs; the one or more computer programs include instructions which, when the computer programs run on a computer, are used to execute the processing method of the first aspect.
In a fifth aspect, this application provides a computer program that includes processing instructions; when the computer program is executed on a computer, the processing instructions are used to execute the processing method provided in the first aspect.
This application provides a processing method, apparatus and device. First, a dry sound is acquired; the dry sound includes the fundamental frequency data of the song sung by the user. Next, the timbre data of the dry sound is acquired; the timbre data is acquired through a preset training model. Then, at least one sound effect scheme is determined according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data; the sound effect scheme is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, to generate sound-effect-processed audio. The at least one sound effect scheme is then output. Finally, target audio is generated according to the acquired target sound effect scheme; the target sound effect scheme is one of the at least one sound effect scheme. With this application, sound effect processing is performed on the dry sound and the accompaniment of the associated song through the acquired target sound effect scheme, which makes the generated sound-effect-processed audio more pleasant to hear.
附图说明Description of the drawings
为了更清楚地说明本申请实施例技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to explain the technical solutions of the embodiments of the present application more clearly, the following will briefly introduce the drawings needed in the description of the embodiments. Obviously, the drawings in the following description are some embodiments of the present application. Ordinary technicians can obtain other drawings based on these drawings without creative work.
图1是本申请提供的一种处理系统的架构示意图;Figure 1 is a schematic diagram of the architecture of a processing system provided by the present application;
图2是本申请提供的一种干声的获取的示意流程图;Fig. 2 is a schematic flow chart of obtaining dry sound provided by this application;
图3是本申请提供的一种音效方案的示意图;FIG. 3 is a schematic diagram of a sound effect solution provided by this application;
图4是本申请提供的另一种音效方案的示意图;FIG. 4 is a schematic diagram of another sound effect solution provided by this application;
图5是本申请提供的一种处理方法的示意流程图;Fig. 5 is a schematic flowchart of a processing method provided by the present application;
图6是本申请提供的一种装置的示意性框图;Fig. 6 is a schematic block diagram of a device provided by the present application;
图7是本申请提供的一种设备的示意性框图。Fig. 7 is a schematic block diagram of a device provided by the present application.
具体实施方式Detailed Description
下面将结合本申请中的附图，对本申请中的技术方案进行清楚、完整地描述，显然，所描述的实施例是本申请一部分实施例，而不是全部的实施例。基于本申请中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本申请保护的范围。The technical solutions in this application will be clearly and completely described below in conjunction with the drawings in this application. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.
应当理解，当在本说明书和所附权利要求书中使用时，术语“包括”和“包含”指示所描述特征、整体、步骤、操作、元素和/或组件的存在，但并不排除一个或多个其它特征、整体、步骤、操作、元素、组件和/或其集合的存在或添加。It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or collections thereof.
还应当理解,在此本申请说明书中所使用的术语仅仅是出于描述特定实施例的目的而并不意在限制本申请。如在本申请说明书和所附权利要求书中所使用的那样,除非上下文清楚地指明其它情况,否则单数形式的“一”、“一个”及“该”意在包括复数形式。It should also be understood that the terms used in the specification of this application are only for the purpose of describing specific embodiments and are not intended to limit the application. As used in the specification of this application and the appended claims, unless the context clearly indicates other circumstances, the singular forms "a", "an" and "the" are intended to include plural forms.
还应当进一步理解,在本申请说明书和所附权利要求书中使用的术语“和/或”是指相关联列出的项中的一个或多个的任何组合以及所有可能组合,并且包括这些组合。It should be further understood that the term "and/or" used in the specification and appended claims of this application refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations .
如在本说明书和所附权利要求书中所使用的那样，术语“如果”可以依据上下文被解释为“当...时”或“一旦”或“响应于确定”或“响应于检测到”。类似地，短语“如果确定”或“如果检测到[所描述条件或事件]”可以依据上下文被解释为意指“一旦确定”或“响应于确定”或“一旦检测到[所描述条件或事件]”或“响应于检测到[所描述条件或事件]”。As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
具体实现中,本申请中描述的终端包括但不限于诸如具有触摸敏感表面(例如,触摸屏显示器和/或触摸板)的移动电话、膝上型计算机或平板计算机之类的其它便携式设备。还应当理解的是,在某些实施例中,所述设备并非便携式通信设备,而是具有触摸敏感表面(例如,触摸屏显示器和/或触摸板)的台式计算机。In specific implementation, the terminal described in this application includes, but is not limited to, other portable devices such as a mobile phone, a laptop computer, or a tablet computer with a touch-sensitive surface (for example, a touch screen display and/or a touch pad). It should also be understood that, in some embodiments, the device is not a portable communication device, but a desktop computer with a touch-sensitive surface (e.g., touch screen display and/or touch pad).
在接下来的讨论中,描述了包括显示器和触摸敏感表面的终端。然而,应当理解的是,终端可以包括诸如物理键盘、鼠标和/或控制杆的一个或多个其它物理用户接口设备。In the following discussion, a terminal including a display and a touch-sensitive surface is described. However, it should be understood that the terminal may include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.
终端支持各种应用程序，例如以下中的一个或多个：绘图应用程序、演示应用程序、文字处理应用程序、网站创建应用程序、盘刻录应用程序、电子表格应用程序、游戏应用程序、电话应用程序、视频会议应用程序、电子邮件应用程序、即时消息收发应用程序、锻炼支持应用程序、照片管理应用程序、数码相机应用程序、数字摄影机应用程序、web浏览应用程序、数字音乐播放器应用程序和/或数字视频播放器应用程序。The terminal supports various applications, such as one or more of the following: a drawing application, a presentation application, a word processing application, a website creation application, a disk burning application, a spreadsheet application, a game application, a telephone application, a video conferencing application, an email application, an instant messaging application, an exercise support application, a photo management application, a digital camera application, a digital video camera application, a web browsing application, a digital music player application, and/or a digital video player application.
可以在终端上执行的各种应用程序可以使用诸如触摸敏感表面的至少一个公共物理用户接口设备。可以在应用程序之间和/或相应应用程序内调整和/或改变触摸敏感表面的一个或多个功能以及终端上显示的相应信息。这样,终端的公共物理架构(例如,触摸敏感表面)可以支持具有对用户而言直观且透明的用户界面的各种应用程序。Various application programs that can be executed on the terminal can use at least one common physical user interface device such as a touch-sensitive surface. One or more functions of the touch-sensitive surface and corresponding information displayed on the terminal can be adjusted and/or changed between applications and/or within corresponding applications. In this way, the common physical architecture of the terminal (for example, a touch-sensitive surface) can support various applications with a user interface that is intuitive and transparent to the user.
为了更好的理解本申请,下面对本申请适用的处理系统的架构进行描述。请参阅图1,图1是本申请提供的一种处理系统的架构示意图。如图1所示,系统可包括但不限于:识别部分以及音效处理部分。In order to better understand this application, the following describes the architecture of the processing system applicable to this application. Please refer to FIG. 1, which is a schematic diagram of the architecture of a processing system provided by the present application. As shown in Figure 1, the system may include, but is not limited to: a recognition part and a sound effect processing part.
其中,识别部分可包括但不限于以下工作步骤:Among them, the identification part may include but not limited to the following working steps:
步骤一:获取干声,且从获取到的用户的干声中,识别出干声的基频数据。Step 1: Obtain dry voice, and identify the fundamental frequency data of the dry voice from the obtained dry voice of the user.
具体的,可通过录音软件对用户演唱歌曲的干声进行录制,以实现对干声的获取。Specifically, the dry voice of the user singing a song can be recorded through recording software to achieve the acquisition of the dry voice.
用户的干声可为用户演唱的无伴奏的纯人声,换句话说,干声可指录音以后的未经过后期处理(如动态、压缩或混响等)和加工的纯人声。The dry voice of the user may be a pure human voice without accompaniment sung by the user. In other words, the dry voice may refer to the pure human voice that has not undergone post-processing (such as dynamics, compression, or reverberation) and processing after recording.
应当说明的,基频数据为基音的频率数据,基音为发音体整体振动产生的最低的音(换句话说,基音为每个乐音中频率最低的纯音)。It should be noted that the fundamental frequency data is the frequency data of the fundamental tone, and the fundamental tone is the lowest sound produced by the overall vibration of the sounding body (in other words, the fundamental tone is the pure tone with the lowest frequency in each tone).
图2示例性示出了一种干声的获取的示意图。Fig. 2 exemplarily shows a schematic diagram of obtaining dry sound.
如图2所示，录音软件正在对用户演唱的歌曲(如光年之外)的干声进行录制。As shown in Figure 2, the recording software is recording the dry voice of a song sung by the user (for example, "Light Years Away").
具体的,可通过Praat语音学软件从用户的干声中识别出干声的基频数据。应当说明的,还可通过自相关算法、平行处理法、倒谱法和简化逆滤波器法从用户的干声中识别出干声的基频数据。Specifically, the fundamental frequency data of the dry voice can be identified from the dry voice of the user through the Praat phonetics software. It should be noted that the fundamental frequency data of the dry voice can also be identified from the dry voice of the user through the autocorrelation algorithm, parallel processing method, cepstrum method and simplified inverse filter method.
应当说明的,基频数据可包括基频上限、基频下限以及基频数据主调等部分。It should be noted that the fundamental frequency data may include the upper limit of the fundamental frequency, the lower limit of the fundamental frequency, and the main tone of the fundamental frequency data.
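The autocorrelation-based identification of the fundamental frequency mentioned above can be sketched as follows. This is a minimal illustration only: the frame length, search range, and sample rate are assumptions, not values disclosed in this application.

```python
import numpy as np

def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=500.0):
    """Estimate the fundamental frequency of one dry-voice frame by autocorrelation."""
    frame = frame - np.mean(frame)                 # remove any DC offset
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f0_max)            # shortest period to consider
    lag_max = int(sample_rate / f0_min)            # longest period to consider
    peak_lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return sample_rate / peak_lag                  # lag of the peak -> frequency

# A 220 Hz sine should be recognised as having F0 near 220 Hz.
sr = 16000
t = np.arange(1024) / sr
f0 = estimate_f0(np.sin(2 * np.pi * 220.0 * t), sr)
```

A real implementation would run this per frame over the recorded dry voice and also derive the upper limit, lower limit, and main value of the fundamental frequency data mentioned above.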
步骤二:对获取到的干声进行预处理,获得第一预处理数据。Step 2: Preprocess the acquired dry sound to obtain first preprocessed data.
具体的，对获取到的干声进行降噪及修音处理，获得降噪及修音后的第一预处理数据。Specifically, noise reduction and sound-repair processing are performed on the acquired dry voice to obtain the first preprocessed data after noise reduction and sound repair.
步骤三：将第一预处理数据进行特征提取，以提取出第一特征向量，将所述第一特征向量输入到预设训练模型中，通过预设训练模型将所述第一特征向量中泛音的分布和强度与获取到的干声的参考结果进行比对，以获得干声的音色数据(音色得分)；该预设训练模型为训练好的训练模型。Step 3: Perform feature extraction on the first preprocessed data to extract a first feature vector, input the first feature vector into a preset training model, and, through the preset training model, compare the distribution and intensity of the overtones in the first feature vector with the reference result of the acquired dry voice to obtain the timbre data (timbre score) of the dry voice; the preset training model is a trained model.
应当说明的,将第一预处理数据进行特征提取,将提取出的第一特征向量输入到预设训练模型中,以获得干声的音色数据之前,还包括以下步骤:It should be noted that before performing feature extraction on the first preprocessed data, and inputting the extracted first feature vector into the preset training model to obtain dry sound timbre data, the following steps are further included:
将多个被标注的干声的样本分别进行特征提取，以提取出第二特征向量，将第二特征向量分别输入到待训练的训练模型中，以获得预设训练模型；第二特征向量用于对待训练的训练模型进行训练。Feature extraction is performed on multiple labeled dry-voice samples to extract second feature vectors, and the second feature vectors are input into the training model to be trained to obtain the preset training model; the second feature vectors are used to train the training model to be trained.
将第一预处理数据进行特征提取,将提取出的第一特征向量输入到预设训练模型中,以获得干声的音色数据,具体可包括下述工作过程:Perform feature extraction on the first preprocessed data, and input the extracted first feature vector into a preset training model to obtain dry sound timbre data, which may specifically include the following working processes:
工作过程11:将第一预处理数据进行特征提取,以提取出第一特征向量,将提取出的第一特征向量输入到预设训练模型中。Work process 11: Perform feature extraction on the first preprocessed data to extract a first feature vector, and input the extracted first feature vector into a preset training model.
工作过程12:通过预设训练模型将提取出的第一特征向量中泛音的分布和强度与干声的参考结果进行比较,以获得干声的音色数据。应当说明的,干声的参考结果可为明星的干声对应的特征向量中泛音的分布和强度。Working process 12: The distribution and intensity of the overtones in the extracted first feature vector are compared with the reference result of the dry sound through a preset training model to obtain the timbre data of the dry sound. It should be noted that the reference result of the dry sound may be the distribution and intensity of the overtones in the feature vector corresponding to the dry sound of the star.
应当说明的，以基音为标准，发音体的各部分(二分之一或三分之一)也在振动，可为本申请实施例中的泛音，其中，泛音的组合可决定特定的音色，并能使人明确地感到基音的响度。It should be noted that, taking the fundamental tone as the reference, the parts of the sounding body (its halves, thirds, and so on) also vibrate; these vibrations produce the overtones referred to in the embodiments of this application. The combination of overtones determines a specific timbre and makes the loudness of the fundamental tone clearly perceptible.
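One hedged way to turn the comparison of overtone distribution and intensity into a timbre score is a cosine similarity between harmonic-strength vectors. The vectors and the 0-100 scale below are illustrative assumptions; the application does not specify the exact comparison.

```python
import numpy as np

def timbre_score(user_overtones, reference_overtones):
    """Score in [0, 100]: cosine similarity between the user's overtone-strength
    vector and a reference singer's vector; closer distributions score higher."""
    u = np.asarray(user_overtones, dtype=float)
    r = np.asarray(reference_overtones, dtype=float)
    cos = float(np.dot(u, r) / (np.linalg.norm(u) * np.linalg.norm(r)))
    return 100.0 * max(cos, 0.0)   # clamp so anti-correlated vectors score 0

reference = [1.0, 0.6, 0.35, 0.2, 0.1]            # hypothetical harmonic strengths
identical = timbre_score(reference, reference)     # a perfect match
different = timbre_score([0.1, 0.2, 0.35, 0.6, 1.0], reference)
```

In this sketch a dry voice whose overtone distribution matches the reference exactly scores 100, and a dissimilar distribution scores lower, mirroring the comparison performed inside the preset training model.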
步骤四:获取干声关联的歌曲的歌唱速度。Step 4: Get the singing speed of the song associated with the dry voice.
具体的,获取到的干声关联的歌曲的演唱速度,具体可为:Specifically, the obtained singing speed of the song associated with the dry voice may specifically be:
获取到的干声关联的歌曲的每分钟节拍数(BPM);The number of beats per minute (BPM) of the song associated with the dry sound obtained;
或者,or,
获取到的干声关联的歌曲的每分钟音节数(SPM)。The number of syllables per minute (SPM) of the song associated with the obtained dry sound.
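The two singing-speed measures can be sketched as simple rates; the beat timestamps and syllable counts below are hypothetical inputs.

```python
def beats_per_minute(beat_times):
    """BPM from a list of beat timestamps in seconds: beats per elapsed minute."""
    if len(beat_times) < 2:
        raise ValueError("need at least two beats")
    elapsed_min = (beat_times[-1] - beat_times[0]) / 60.0
    return (len(beat_times) - 1) / elapsed_min

def syllables_per_minute(syllable_count, duration_seconds):
    """SPM: total sung syllables divided by the song duration in minutes."""
    return syllable_count / (duration_seconds / 60.0)

bpm = beats_per_minute([0.0, 0.5, 1.0, 1.5, 2.0])  # one beat every 0.5 s
spm = syllables_per_minute(300, 120.0)             # 300 syllables in 2 minutes
```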
应当说明的,获取干声关联的歌曲的歌唱速度,具体可包括以下工作过程:It should be noted that obtaining the singing speed of songs associated with dry voice may specifically include the following working processes:
工作过程21:通过干声关联的歌曲的伴奏,确定出伴奏的伴奏标识号码(ID)。Work process 21: Determine the accompaniment identification number (ID) of the accompaniment through the accompaniment of the song associated with the dry sound.
工作过程22：通过伴奏标识号码从包括多首歌曲的第二数据库中确定出干声关联的歌曲。Work process 22: Determine the song associated with the dry voice from the second database, which includes multiple songs, through the accompaniment identification number.
其中,第二数据库可为包括有多首歌曲的曲库。Wherein, the second database may be a music library including multiple songs.
工作过程23:根据确定出的歌曲,确定出歌曲的歌唱速度,其中,歌曲的伴奏标识号码与歌曲关联。Work process 23: Determine the singing speed of the song according to the determined song, where the accompaniment identification number of the song is associated with the song.
应当说明的,一首歌曲可关联一个或多个伴奏。It should be noted that one song can be associated with one or more accompaniments.
如果一首歌曲关联多个伴奏,其中,每一个伴奏可拥有唯一的伴奏标识号码。If a song is associated with multiple accompaniments, each accompaniment can have a unique accompaniment identification number.
举例来说，针对于歌曲《爱你一万年》，与歌曲《爱你一万年》相关联的伴奏可包括：男声伴奏、女声伴奏以及DJ伴奏等，其中，男声伴奏的伴奏标识号码可为11、女声伴奏的伴奏标识号码可为22以及DJ伴奏的伴奏标识号码可为33。For example, for the song "Love You Ten Thousand Years", the accompaniments associated with the song may include a male-voice accompaniment, a female-voice accompaniment, a DJ accompaniment, and so on, where the accompaniment identification number of the male-voice accompaniment may be 11, that of the female-voice accompaniment may be 22, and that of the DJ accompaniment may be 33.
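Work processes 21-23 amount to a keyed lookup from accompaniment ID to song entry. The dictionary below is a hypothetical stand-in for the second database; the IDs 11/22/33 follow the example above, and the BPM values are invented for illustration.

```python
# Hypothetical second database: accompaniment ID -> song metadata.
SONG_DB = {
    11: {"song": "Love You Ten Thousand Years", "accompaniment": "male",   "bpm": 92},
    22: {"song": "Love You Ten Thousand Years", "accompaniment": "female", "bpm": 92},
    33: {"song": "Love You Ten Thousand Years", "accompaniment": "DJ",     "bpm": 128},
}

def singing_speed_for(accompaniment_id):
    """Resolve the accompaniment ID to its song entry and return the singing
    speed recorded for that accompaniment (work processes 21-23)."""
    entry = SONG_DB.get(accompaniment_id)
    if entry is None:
        raise KeyError(f"unknown accompaniment ID {accompaniment_id}")
    return entry["bpm"]

speed = singing_speed_for(33)
```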
音效处理部分可包括但不限于以下工作过程:The sound effect processing part can include but is not limited to the following working processes:
根据获取到的干声的音色数据、干声关联的歌曲的演唱速度以及基频数据确定出至少一个音效方案；音效方案用于对干声和干声关联的歌曲的伴奏进行音效处理，以生成音效处理后的音频。At least one sound effect scheme is determined according to the acquired timbre data of the dry voice, the singing speed of the song associated with the dry voice, and the fundamental frequency data; a sound effect scheme is used to perform sound effect processing on the dry voice and the accompaniment of the associated song to generate processed audio.
具体的,根据所获取的目标音效方案,生成目标音频,包括:Specifically, generating the target audio according to the acquired target sound effect scheme includes:
通过所获取到的目标音效方案中的均衡参数值、压缩参数值以及混响参数值,联合对干声和干声关联的歌曲的伴奏进行音效处理,生成目标音频。Through the acquired equalization parameter value, compression parameter value and reverberation parameter value in the target sound effect scheme, the sound effect processing is jointly performed on the accompaniment of the song associated with the dry sound and the dry sound to generate the target audio.
更具体的，可通过目标音效方案中的均衡参数值对干声和干声关联的歌曲的伴奏的音质的改善程度进行调整、目标音效方案中的压缩参数值对干声和干声关联的歌曲的伴奏的动态修补程度进行调整以及目标音效方案中的混响参数值对干声和干声关联的歌曲的伴奏的音质的改善、空间制造层次的营造、细节掩盖程度分别进行调整。More specifically, the equalization parameter values in the target sound effect scheme adjust the degree of sound-quality improvement of the dry voice and the accompaniment of the associated song; the compression parameter values adjust the degree of dynamic repair; and the reverberation parameter values respectively adjust the sound-quality improvement, the creation of spatial layering, and the degree to which details are masked.
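A heavily simplified sketch of applying the three parameter groups jointly is shown below. Real equalization, compression, and reverberation are far more elaborate; here EQ is reduced to a broadband gain, compression to a hard knee, and reverberation to a single delayed echo, purely to illustrate how one scheme's parameter values shape the output audio.

```python
import numpy as np

def apply_scheme(signal, eq_gain, threshold, ratio, reverb_mix, reverb_delay):
    """Minimal stand-ins for a scheme's equalization, compression, and
    reverberation parameter values, applied jointly to one signal."""
    out = signal * eq_gain                                   # "equalization": broadband gain
    over = np.abs(out) > threshold                           # compress samples above threshold
    out[over] = np.sign(out[over]) * (threshold + (np.abs(out[over]) - threshold) / ratio)
    wet = np.zeros_like(out)                                 # "reverberation": one delayed echo
    wet[reverb_delay:] = out[:-reverb_delay]
    return (1.0 - reverb_mix) * out + reverb_mix * wet       # dry/wet blend

dry = np.zeros(8)
dry[0] = 1.0                                                 # a single impulse as input
processed = apply_scheme(dry, eq_gain=2.0, threshold=1.5, ratio=4.0,
                         reverb_mix=0.3, reverb_delay=3)
```

With these assumed values, the impulse is boosted to 2.0, compressed to 1.625 (1.5 plus 0.5/4), and blended 70/30 with its echo three samples later.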
输出至少一个音效方案。Output at least one sound effect scheme.
具体的,输出至少一个音效方案,具体可包括但不限于以下形式:Specifically, output at least one sound effect scheme, which may specifically include but is not limited to the following forms:
将显示至少一个音效方案,或者语音播放该至少一个音效方案。At least one sound effect scheme will be displayed, or the at least one sound effect scheme will be played.
应当说明的,至少一个音效方案可包括但不限于以下两种情形:It should be noted that at least one sound effect scheme can include but is not limited to the following two situations:
情形1:至少一个音效方案,包括:一个音效方案;该音效方案用于对干声和干声关联的歌曲的伴奏进行音效处理,以生成音效处理后的音频。Case 1: At least one sound effect scheme, including: a sound effect scheme; the sound effect scheme is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound to generate audio after sound effect processing.
情形2：至少一个音效方案，包括：多个音效方案；多个音效方案中每一个音效方案分别用于对干声和干声关联的歌曲的伴奏进行音效处理，以分别生成多个音效处理后的音频。Case 2: The at least one sound effect scheme includes multiple sound effect schemes; each of the multiple sound effect schemes is used to perform sound effect processing on the dry voice and the accompaniment of the associated song, so as to respectively generate multiple pieces of processed audio.
图3示例性示出了一种音效方案。Fig. 3 exemplarily shows a sound effect scheme.
如图3所示,音效方案可包括四个可分别用于对干声和干声关联的歌曲的伴奏进行音效处理的音效方案。As shown in FIG. 3, the sound effect scheme may include four sound effect schemes that can be used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, respectively.
应当说明的,音效方案的名称可为AI混响,该音效方案可用于根据用户的干声制定出该用户的专属的音频,且该用户可试听通过该音效方案制定出的歌声。具体的,四个音效方案如下:It should be noted that the name of the sound effect scheme may be AI reverberation, and the sound effect scheme can be used to formulate the user's exclusive audio based on the user's dry voice, and the user can audition the singing voice developed through the sound effect scheme. Specifically, the four sound effect schemes are as follows:
音效方案1:用户的干声的音色数据、干声关联的歌曲的演唱速度、基频数据与理想的干声的音色数据、理想的干声关联的歌曲的演唱速度及基频数据之间的匹配度为90%的音效方案。Sound effect scheme 1: The timbre data of the user's dry voice, the singing speed of the song associated with the dry voice, the timbre data of the fundamental frequency data and the ideal dry voice, the singing speed of the song associated with the ideal dry voice and the fundamental frequency data A sound effect scheme with a matching degree of 90%.
音效方案2:用户的干声的音色数据、干声关联的歌曲的演唱速度、基频数据与理想的干声的音色数据、理想的干声关联的歌曲的演唱速度及基频数据之间的匹配度为80%的音效方案。Sound effect plan 2: User’s dry voice timbre data, dry voice-related song singing speed, fundamental frequency data and ideal dry voice timbre data, ideal dry voice-related song singing speed and fundamental frequency data A sound effect scheme with a matching degree of 80%.
音效方案3:用户的干声的音色数据、干声关联的歌曲的演唱速度、基频数据与理想的干声的音色数据、理想的干声关联的歌曲的演唱速度及基频数据之间的匹配度60%的音效方案。Sound effect scheme 3: The timbre data of the user’s dry voice, the singing speed of the song associated with the dry voice, the timbre data of the fundamental frequency data and the ideal dry voice, the singing speed of the song associated with the ideal dry voice and the fundamental frequency data A sound effect scheme with a matching degree of 60%.
音效方案4：用户的干声的音色数据、干声关联的歌曲的演唱速度、基频数据与理想的干声的音色数据、理想的干声关联的歌曲的演唱速度及基频数据之间的匹配度为90%的音效方案为推荐的(建议用户优先选择的)可用于对所述干声和所述干声关联的歌曲的伴奏进行音效处理的音效方案，其他的三个方案为(建议用户次优先选择的)可供用户自行选择的以对干声和干声关联的歌曲的伴奏进行音效处理的音效方案。Sound effect scheme 4: the sound effect scheme whose matching degree between the timbre data of the user's dry voice, the singing speed of the associated song, and the fundamental frequency data and those of the ideal dry voice is 90% is the recommended scheme (suggested to the user as the first choice) for performing sound effect processing on the dry voice and the accompaniment of the associated song; the other three schemes are second-priority options that the user may select to perform the sound effect processing.
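Selecting the recommended scheme from the candidates can be sketched as sorting by matching degree, with the top entry suggested first and the rest offered as second-priority choices. The scheme names and scores below are hypothetical.

```python
def rank_schemes(schemes):
    """Order candidate schemes so the highest matching degree is recommended
    first; the remainder stay selectable as second-priority options."""
    return sorted(schemes, key=lambda s: s["match"], reverse=True)

candidates = [
    {"name": "scheme 2", "match": 0.80},
    {"name": "scheme 1", "match": 0.90},
    {"name": "scheme 3", "match": 0.60},
]
ranked = rank_schemes(candidates)
recommended = ranked[0]["name"]   # the first-choice suggestion for the user
```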
应当说明的，该音效方案的名称还可为智能音效。该音效方案可用于根据用户的干声制定出该用户的专属的音频，其中，如果用户未佩戴耳机对用户的干声进行录制，则可能影响通过该音效方案制定出的音频的效果；因而处理系统提示或建议用户在歌唱的过程中佩戴耳机进行录制。It should be noted that the name of the sound effect scheme may also be "smart sound effect". The sound effect scheme can be used to produce the user's exclusive audio based on the user's dry voice; if the user does not wear headphones while the dry voice is recorded, the effect of the audio produced through the sound effect scheme may be affected. Therefore, the processing system prompts or recommends that the user wear headphones for recording while singing.
应当说明的，用户可试听通过该音效方案制定出的音频。如果该用户为贵宾(Very Important Person, VIP)，该用户可对利用该音效方案根据该用户的干声所制定出的音频进行发布。It should be noted that the user can audition the audio produced through the sound effect scheme. If the user is a VIP (Very Important Person), the user can publish the audio produced by the sound effect scheme based on the user's dry voice.
图4示例性示出了另一种音效方案。Figure 4 exemplarily shows another sound effect scheme.
如图4所示,音效方案可包括多个可分别用于对干声和干声关联的歌曲的伴奏进行音效处理的音效方案。As shown in FIG. 4, the sound effect scheme may include multiple sound effect schemes that can be used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound.
具体的，该音效方案可包括但不限于：用户的干声的音色数据、干声关联的歌曲的演唱速度、基频数据与理想的干声的音色数据、干声关联的歌曲的演唱速度、基频数据之间的匹配度为90%的音效方案、KTV音效的音效方案、磁性音效的音效方案、歌声音效的音效方案以及悠远意境音效的音效方案等等。Specifically, the sound effect schemes may include, but are not limited to: the scheme whose matching degree between the timbre data of the user's dry voice, the singing speed of the associated song, and the fundamental frequency data and those of the ideal dry voice is 90%; a KTV sound effect scheme; a magnetic sound effect scheme; a vocal sound effect scheme; a distant-ambience sound effect scheme; and so on.
应当说明的,用户的干声的音色数据、干声关联的歌曲的演唱速度、基频数据与理想的干声的音色数据、理想的干声关联的歌曲的演唱速度及基频数据之间的匹配度为90%的音效方案可为推荐给用户的可用于对干声和干声关联的歌曲的伴奏进行音效处理的优选的音效方案。其他的方案可为推荐给用户的次优选的可用于对干声和干声关联的歌曲的伴奏进行音效处理的音效方案。It should be noted that the timbre data of the user’s dry voice, the singing speed of the song associated with the dry voice, the fundamental frequency data and the timbre data of the ideal dry voice, the singing speed of the song associated with the ideal dry voice, and the fundamental frequency data The sound effect scheme with a matching degree of 90% may be a preferred sound effect scheme recommended to the user that can be used to perform sound effect processing on the accompaniment of the dry sound and the song associated with the dry sound. The other solution may be a second-preferred sound effect solution recommended to the user that can be used to perform sound effect processing on the accompaniment of the dry sound and the song associated with the dry sound.
综上所述,本申请实施例提供了一种处理系统。该处理系统包括:识别部分以及音效处理部分。该处理系统通过识别部分获取干声,干声包括用户演唱歌曲的基频数据。进而,该处理系统通过识别部分获取干声的音色数据,音色数据是通过预设训练模型获取的。然后,该处理系统通过音效处理部分根据获取到的干声的音色数据、干声关联的歌曲的演唱速度以及基频数据,确定出至少一个音效方案,音效方案用于对干声和干声关联的歌曲的伴奏进行音效处理,以生成音效处理后的音频。接着,该处理系统通过音效处理部分输出至少一个音效方案。最后,该处理系统通过音效处理部分根据所获取的目标音效方案,生成目标音频;目标音效方案为所述至少一个音效方案中的一个音效方案。采用本申请实施例,通过从包括多个音效方案的第一数据库中确定出目标音效方案对干声和干声关联的歌曲的伴奏进行音效处理,可使得生成的音效处理后的音频更加的动听。In summary, the embodiment of the present application provides a processing system. The processing system includes a recognition part and a sound effect processing part. The processing system obtains the dry voice through the recognition part, and the dry voice includes the fundamental frequency data of the song sung by the user. Furthermore, the processing system obtains the timbre data of the dry sound through the recognition part, and the timbre data is obtained through the preset training model. Then, the processing system determines at least one sound effect scheme through the sound effect processing part according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data, and the sound effect scheme is used to associate the dry sound with the dry sound. The accompaniment of the song is processed with sound effects to generate audio after sound effect processing. Then, the processing system outputs at least one sound effect scheme through the sound effect processing part. Finally, the processing system generates the target audio according to the acquired target sound effect scheme through the sound effect processing part; the target sound effect scheme is one of the at least one sound effect scheme. 
According to the embodiments of the present application, a target sound effect scheme is determined from a first database including multiple sound effect schemes to perform sound effect processing on the dry voice and the accompaniment of the associated song, so that the generated processed audio is more pleasant to hear.
可理解的,图2~图4仅仅用于解释本申请实施例,不应对本申请做出限制。It is understandable that FIGS. 2 to 4 are only used to explain the embodiments of the present application, and should not limit the present application.
参见图5,是本申请提供的一种处理方法的示意流程图。如图5所示,该方法可以至少包括以下几个步骤:Refer to FIG. 5, which is a schematic flowchart of a processing method provided by the present application. As shown in Figure 5, the method can at least include the following steps:
S501、获取干声。S501: Obtain dry sound.
本申请实施例中,干声包括用户演唱歌曲的基频数据。In the embodiment of this application, the dry voice includes the fundamental frequency data of the song sung by the user.
应当说明的，干声中的基频数据可通过Praat语音学软件从用户的干声中识别出，还可通过自相关算法、平行处理法、倒谱法和简化逆滤波器法从用户的干声中识别出。It should be noted that the fundamental frequency data in the dry voice can be identified from the user's dry voice by the Praat phonetics software, or by an autocorrelation algorithm, a parallel processing method, a cepstrum method, or a simplified inverse filtering method.
应当说明的,歌曲可为由歌词和曲谱结合的一种艺术形式。It should be noted that a song can be an art form that combines lyrics and scores.
干声可为用户演唱的无伴奏的纯人声,换句话说,干声可指录音以后的未经过后期处理(如动态、压缩或混响等)或加工的纯人声。The dry voice can be a pure human voice without accompaniment sung by the user. In other words, the dry voice can refer to the pure human voice without post-processing (such as dynamics, compression or reverberation, etc.) or processing after recording.
应当说明的,基频数据为基音的频率数据,基音为发音体整体振动产生的最低的音(换句话说,基音为每个乐音中频率最低的纯音)。It should be noted that the fundamental frequency data is the frequency data of the fundamental tone, and the fundamental tone is the lowest sound produced by the overall vibration of the sounding body (in other words, the fundamental tone is the pure tone with the lowest frequency in each tone).
S502、获取干声的音色数据。S502. Acquire timbre data of dry sound.
本申请实施例中，获得所述干声的音色数据之前，还包括以下工作步骤：In the embodiments of this application, before the timbre data of the dry voice is obtained, the following working steps are further included:
工作步骤1:对获取到的干声进行预处理,获得第一预处理数据。Work step 1: preprocess the acquired dry sound to obtain first preprocessed data.
具体的,对获取到的干声进行预处理,具体可包括以下工作过程:Specifically, preprocessing the acquired dry sound can specifically include the following working processes:
对获取到的干声进行降噪和/或修音,以获得降噪及修音后的第一预处理数据。Perform noise reduction and/or sound modification on the acquired dry sound to obtain first preprocessed data after noise reduction and sound modification.
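As an illustrative stand-in for the noise-reduction preprocessing (the application does not disclose a specific algorithm), a simple amplitude gate is sketched below; the threshold value is an assumption.

```python
import numpy as np

def noise_gate(signal, threshold=0.02):
    """Crude preprocessing stand-in: mute samples whose amplitude falls below a
    threshold, treating low-level content between sung phrases as noise."""
    out = np.array(signal, dtype=float)
    out[np.abs(out) < threshold] = 0.0
    return out

gated = noise_gate([0.5, 0.01, -0.3, -0.005, 0.2])
```

In this sketch the two tiny samples are zeroed while the louder (sung) samples pass through unchanged, producing the "first preprocessed data" handed to the next work step.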
工作步骤2：将第一预处理数据进行特征提取，以提取出第一特征向量，将第一特征向量输入到预设训练模型中，通过预设训练模型将第一特征向量中泛音的分布和强度与获取到的干声的参考结果进行比对，以获得干声的音色数据；其中，预设训练模型为训练好的训练模型。Work step 2: Perform feature extraction on the first preprocessed data to extract a first feature vector, input the first feature vector into a preset training model, and, through the preset training model, compare the distribution and intensity of the overtones in the first feature vector with the reference result of the acquired dry voice to obtain the timbre data of the dry voice; the preset training model is a trained model.
其中，干声的参考结果可为明星的干声对应的特征向量中泛音的分布和强度。如果第一特征向量中泛音的分布和强度与干声的参考结果越接近，则用户的干声的得分越高。The reference result of the dry voice may be the distribution and intensity of the overtones in the feature vector corresponding to a star's dry voice. The closer the distribution and intensity of the overtones in the first feature vector are to the reference result, the higher the score of the user's dry voice.
应当说明的，以基音为标准，发音体的各部分(二分之一或三分之一)也在振动，可为本申请实施例中的泛音，其中，泛音的组合可决定特定的音色，并能使人明确地感到基音的响度。It should be noted that, taking the fundamental tone as the reference, the parts of the sounding body (its halves, thirds, and so on) also vibrate; these vibrations produce the overtones referred to in the embodiments of this application. The combination of overtones determines a specific timbre and makes the loudness of the fundamental tone clearly perceptible.
应当说明的，将第一预处理数据进行特征提取，将提取出的第一特征向量输入到预设训练模型中，以获得干声的音色数据之前，还包括以下步骤：It should be noted that before feature extraction is performed on the first preprocessed data and the extracted first feature vector is input into the preset training model to obtain the timbre data of the dry voice, the following steps are further included:
将多个被标注的干声的样本分别进行特征提取，以提取出第二特征向量，将第二特征向量分别输入到待训练的训练模型中，以获得预设训练模型；第二特征向量用于对待训练的训练模型进行训练。Feature extraction is performed on multiple labeled dry-voice samples to extract second feature vectors, and the second feature vectors are input into the training model to be trained to obtain the preset training model; the second feature vectors are used to train the training model to be trained.
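The training step can be sketched with an ordinary least-squares model standing in for the unspecified "training model to be trained": labeled dry-voice samples supply second feature vectors and annotated scores. The features and labels below are invented for illustration.

```python
import numpy as np

def fit_timbre_model(features, scores):
    """Least-squares linear fit as a stand-in for training the model: features
    are second feature vectors of labeled dry-voice samples, scores the labels."""
    X = np.hstack([np.asarray(features, dtype=float), np.ones((len(features), 1))])
    w, *_ = np.linalg.lstsq(X, np.asarray(scores, dtype=float), rcond=None)
    return w                                   # learned weights + bias term

def predict_timbre(w, feature):
    """Score a new first feature vector with the 'preset' (trained) model."""
    x = np.append(np.asarray(feature, dtype=float), 1.0)
    return float(x @ w)

# Hypothetical labeled samples: two overtone-energy features with annotated scores.
feats = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
labels = [90.0, 80.0, 20.0, 10.0]
w = fit_timbre_model(feats, labels)
pred = predict_timbre(w, [0.85, 0.15])
```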
S503、根据获取到的干声的音色数据、干声关联的歌曲的演唱速度以及基频数据确定出至少一个音效方案。S503: Determine at least one sound effect scheme according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data.
本申请实施例中,音效方案用于对干声和干声关联的歌曲的伴奏进行音效处理,以生成音效处理后的音频。In the embodiment of the present application, the sound effect solution is used to perform sound effect processing on the accompaniment of the dry sound and the song associated with the dry sound to generate audio after sound effect processing.
应当说明的,输出至少一个音效方案之后,根据所获取的目标音效方案,生成目标音频之前,还包括下述步骤:It should be noted that after at least one sound effect scheme is output, the following steps are further included before generating the target audio according to the obtained target sound effect scheme:
步骤1:接收目标指令;该目标指令用于指示目标音效方案(也即是说,指示出获取与目标指令相关联的目标音效方案)。Step 1: Receive a target instruction; the target instruction is used to indicate a target sound effect scheme (that is, indicate to obtain a target sound effect scheme associated with the target instruction).
步骤2:响应于接收到的目标指令,获取目标音效方案。Step 2: In response to the received target instruction, obtain the target sound effect scheme.
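Steps 1-2 can be sketched as a small handler that resolves the received target instruction to one of the output schemes; the scheme names and the instruction format are assumptions, not part of the disclosed method.

```python
def handle_target_instruction(schemes, instruction):
    """Step 1-2 sketch: the target instruction names one of the output sound
    effect schemes; respond by fetching it as the target sound effect scheme."""
    for scheme in schemes:
        if scheme["name"] == instruction["target"]:
            return scheme
    raise LookupError(f"no scheme named {instruction['target']!r}")

schemes = [{"name": "AI reverb"}, {"name": "KTV"}, {"name": "magnetic"}]
target = handle_target_instruction(schemes, {"target": "KTV"})
```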
应当说明的,根据获取到的干声的音色数据、干声关联的歌曲的演唱速度以及基频数据确定出至少一个音效方案之前,还包括以下工作过程:It should be noted that before determining at least one sound effect scheme based on the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data, the following working process is also included:
工作过程1:通过干声关联的歌曲的伴奏,确定出伴奏的伴奏标识号码。Work process 1: Determine the accompaniment identification number of the accompaniment through the accompaniment of the song associated with the dry sound.
工作过程2:通过伴奏标识号码从包括多首歌曲的第一数据库中确定出干声关联的歌曲。Work process 2: Determine the dry-voice-related songs from the first database including multiple songs through the accompaniment identification number.
工作过程3:根据确定出的歌曲,确定出歌曲的歌唱速度,其中,歌曲的伴奏标识号码与歌曲关联。Work process 3: Determine the singing speed of the song according to the determined song, where the accompaniment identification number of the song is associated with the song.
应当说明的,干声关联的歌曲的演唱速度,具体可为:It should be noted that the singing speed of songs related to dry voice can be specifically:
获取到的干声关联的歌曲的每分钟节拍数。The number of beats per minute of the song associated with the obtained dry sound.
或者,or,
获取到的干声关联的歌曲的每分钟音节数。The number of syllables per minute of the song associated with the obtained dry sound.
应当说明的,至少一个音效方案可包括但不限于以下两种情形;It should be noted that at least one sound effect scheme can include but is not limited to the following two situations:
情形1:至少一个音效方案,可包括:一个音效方案;该方案用于对干声和干声关联的歌曲的伴奏进行音效处理,以生成音效处理后的音频。Case 1: At least one sound effect scheme, which may include: one sound effect scheme; this scheme is used to perform sound effect processing on the accompaniment of the dry sound and the song associated with the dry sound to generate audio after sound effect processing.
情形2:至少一个音效方案,可包括:多个音效方案;多个音效方案中每一个音效方案分别用于对干声和干声关联的歌曲的伴奏进行音效处理,以分别 生成多个音效处理后的音频。Case 2: At least one sound effect scheme, which may include: multiple sound effect schemes; each of the multiple sound effect schemes is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound to generate multiple sound effect processing respectively After the audio.
S504: Output the at least one sound effect scheme.
Specifically, outputting the at least one sound effect scheme may include, but is not limited to, the following forms:
displaying the at least one sound effect scheme, or playing the at least one sound effect scheme by voice.
S505: Generate target audio according to the acquired target sound effect scheme.
In this embodiment of this application, generating the target audio according to the acquired target sound effect scheme may specifically include the following process:
jointly performing, by using the equalization parameter value, the compression parameter value, and the reverberation parameter value in the acquired target sound effect scheme, sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, to generate the target audio.
More specifically, the equalization parameter value in the target sound effect scheme is used to adjust the degree of improvement of the sound quality of the dry sound and the accompaniment of the song associated with the dry sound; the compression parameter value in the target sound effect scheme is used to adjust the degree of dynamic repair of the dry sound and the accompaniment of the song associated with the dry sound; and the reverberation parameter value in the target sound effect scheme is used to separately adjust the improvement of the sound quality, the creation of a sense of spatial layering, and the degree of detail masking of the dry sound and the accompaniment of the song associated with the dry sound.
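As a rough illustration only — the disclosure does not specify the signal-processing internals — a toy chain applying the three parameter values to a mixed vocal-plus-accompaniment signal might look like the following. The parameter names, the one-band gain standing in for equalization, the static compressor, and the single-tap echo standing in for reverberation are all simplifying assumptions:

```python
# Toy sound-effect chain: equalization gain -> static compression -> echo mix.
# Scheme keys and the whole design are illustrative assumptions for this sketch.
def apply_scheme(samples, scheme):
    eq_gain = scheme["eq_gain"]            # "equalization": overall tonal gain
    threshold = scheme["comp_threshold"]   # compression: level above which to attenuate
    ratio = scheme["comp_ratio"]           # compression ratio applied to the excess
    wet = scheme["reverb_wet"]             # "reverberation": wet/dry mix amount
    delay = scheme["reverb_delay"]         # "reverberation": delay in samples

    out = []
    for i, s in enumerate(samples):
        s *= eq_gain
        # static compression: reduce the excess above the threshold by the ratio
        if abs(s) > threshold:
            excess = abs(s) - threshold
            s = (threshold + excess / ratio) * (1.0 if s >= 0 else -1.0)
        # single-tap delay as a minimal stand-in for reverberation
        echo = out[i - delay] if i >= delay else 0.0
        out.append((1.0 - wet) * s + wet * echo)
    return out
```

With compression only (`eq_gain=1.0`, `comp_threshold=0.5`, `comp_ratio=2.0`, `reverb_wet=0.0`), a 0.9 sample is reduced to 0.7; with the wet mix raised, earlier output samples bleed into later ones as a crude spatial effect.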
In summary, this embodiment of this application provides a processing method. First, a dry sound is acquired, where the dry sound includes fundamental frequency data of a song sung by a user. Then, timbre data of the dry sound is acquired, where the timbre data is acquired through a preset training model. Next, at least one sound effect scheme is determined according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data, where the sound effect scheme is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, so as to generate sound-effect-processed audio. The at least one sound effect scheme is then output. Finally, target audio is generated according to the acquired target sound effect scheme, where the target sound effect scheme is one of the at least one sound effect scheme. With this embodiment of this application, sound effect processing can be performed on the dry sound and the accompaniment of the song associated with the dry sound by using the acquired target sound effect scheme, so that the generated sound-effect-processed audio is more pleasant to hear.
It can be understood that, for related definitions and descriptions not provided in the method embodiment of FIG. 5, reference may be made to the embodiment of FIG. 1, and details are not repeated here.
Refer to FIG. 6, which shows a processing apparatus provided by this application. As shown in FIG. 6, the processing apparatus 60 includes a first acquiring unit 601, a second acquiring unit 602, a determining unit 603, an output unit 604, and a generating unit 605, where:
the first acquiring unit 601 is configured to acquire a dry sound, where the dry sound includes fundamental frequency data of a song sung by a user;
the second acquiring unit 602 is configured to acquire timbre data of the dry sound;
the determining unit 603 is configured to determine a target sound effect scheme from a first database comprising multiple sound effect schemes according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data identified from the dry sound, where the target sound effect scheme is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, so as to generate sound-effect-processed audio; and
the output unit 604 is configured to output at least one sound effect scheme.
It should be noted that the at least one sound effect scheme may include, but is not limited to, the following two cases:
Case 1: The at least one sound effect scheme may include one sound effect scheme, which is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, so as to generate sound-effect-processed audio.
Case 2: The at least one sound effect scheme may include multiple sound effect schemes, where each of the multiple sound effect schemes is separately used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, so as to separately generate multiple pieces of sound-effect-processed audio.
The generating unit 605 is configured to generate target audio according to the acquired target sound effect scheme, where the target sound effect scheme is one of the at least one sound effect scheme.
The generating unit 605 may be specifically configured to: adjust, by using the equalization parameter value in the target sound effect scheme, the degree of improvement of the sound quality of the dry sound and the accompaniment of the song associated with the dry sound; adjust, by using the compression parameter value in the target sound effect scheme, the degree of dynamic repair of the dry sound and the accompaniment of the song associated with the dry sound; and separately adjust, by using the reverberation parameter value in the target sound effect scheme, the improvement of the sound quality, the creation of a sense of spatial layering, and the degree of detail masking of the dry sound and the accompaniment of the song associated with the dry sound, to generate the target audio.
In addition to the first acquiring unit 601, the second acquiring unit 602, the determining unit 603, the output unit 604, and the generating unit 605, the processing apparatus 60 further includes a preprocessing unit.
The preprocessing unit is configured to preprocess the acquired dry sound to obtain first preprocessed data.
Specifically, noise reduction and/or tone repair are performed on the acquired dry sound to obtain the first preprocessed data.
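As a toy sketch of this preprocessing step, a simple noise gate can stand in for noise reduction; the threshold and the gate design are assumptions (real denoising would typically use spectral methods, and tone repair/pitch correction is omitted entirely):

```python
# Noise gate as an illustrative stand-in for the noise-reduction preprocessing.
def noise_gate(samples, threshold=0.05):
    """Zero out samples whose magnitude falls below the noise threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]
```

For example, `noise_gate([0.5, 0.01, -0.2])` keeps the two louder samples and silences the quiet one.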
In addition to the first acquiring unit 601, the second acquiring unit 602, the determining unit 603, the output unit 604, and the generating unit 605, the processing apparatus 60 further includes a training unit.
The training unit is configured to separately perform feature extraction on multiple labeled dry sound samples to extract second feature vectors, and separately input the second feature vectors into a training model to be trained, so as to obtain the preset training model, where the second feature vectors are used to train the training model to be trained.
It should be noted that the determining unit is further configured to, before the at least one sound effect scheme is determined according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data:
determine, from the accompaniment of the song associated with the dry sound, the accompaniment identification number of the accompaniment;
determine, by using the accompaniment identification number, the song associated with the dry sound from the first database comprising multiple songs; and
determine the singing speed of the song according to the determined song, where the accompaniment identification number of the song is associated with the song.
In summary, in this embodiment of this application, the apparatus 60 may acquire a dry sound through the first acquiring unit 601, where the dry sound includes fundamental frequency data of a song sung by a user; the apparatus 60 then acquires timbre data of the dry sound through the second acquiring unit 602; next, the apparatus 60 determines, through the determining unit 603, at least one sound effect scheme according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data, where the sound effect scheme is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, so as to generate sound-effect-processed audio; the apparatus 60 then outputs the target sound effect scheme through the output unit 604; and finally, the apparatus 60 generates, through the generating unit 605, target audio according to the acquired target sound effect scheme, where the target sound effect scheme is one of the at least one sound effect scheme. With this embodiment of this application, sound effect processing can be performed on the dry sound and the accompaniment of the song associated with the dry sound by using the acquired target sound effect scheme, so that the generated sound-effect-processed audio is more pleasant to hear.
It should be understood that the apparatus 60 is merely an example provided by this embodiment of this application, and the apparatus 60 may have more or fewer components than those shown, may combine two or more components, or may be implemented with a different configuration of components.
It can be understood that, for specific implementations of the functional blocks included in the apparatus 60 of FIG. 6, reference may be made to the embodiments described in FIG. 1 and FIG. 5, and details are not repeated here.
FIG. 7 is a schematic structural diagram of a processing device provided by this application. In this embodiment of this application, the device may include a mobile phone, a tablet computer, a personal digital assistant (Personal Digital Assistant, PDA), a mobile Internet device (Mobile Internet Device, MID), a smart wearable device (such as a smart watch or a smart band), or other devices, which is not limited in this embodiment of this application. As shown in FIG. 7, the device 70 may include a baseband chip 701, a memory 702 (one or more computer-readable storage media), and a peripheral system 703. These components may communicate over one or more communication buses 704.
The baseband chip 701 may include one or more processors (CPU) 705.
The processor 705 may be specifically configured to:
acquire a dry sound, where the dry sound includes fundamental frequency data of a song sung by a user;
acquire timbre data of the dry sound;
determine at least one sound effect scheme according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data, where the sound effect scheme is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, so as to generate sound-effect-processed audio; and
generate target audio according to the acquired target sound effect scheme, where the target sound effect scheme is one of the at least one sound effect scheme.
The memory 702 is coupled to the processor 705 and may be configured to store various software programs and/or multiple sets of instructions. In a specific implementation, the memory 702 may include a high-speed random access memory, and may also include a non-volatile memory, for example, one or more disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 702 may store an operating system (hereinafter referred to as the system), for example, an embedded operating system such as ANDROID, IOS, WINDOWS, or LINUX. The memory 702 may also store a network communication program, which may be used to communicate with one or more additional devices and one or more network devices. The memory 702 may also store a user interface program, which may vividly display the content of an application program through a graphical operation interface and receive user control operations on the application program through input controls such as menus, dialog boxes, and keys.
It can be understood that the memory 702 may be used to store implementation code for implementing the processing method.
The memory 702 may also store one or more application programs. These application programs may include a karaoke program, social applications (for example, Facebook), image management applications (for example, a photo album), map applications (for example, Google Maps), browsers (for example, Safari or Google Chrome), and so on.
The peripheral system 703 is mainly configured to implement interaction between the device 70 and the user/external environment, and mainly includes the input and output apparatuses of the device 70. In a specific implementation, the peripheral system 703 may include a display controller 707, a camera controller 708, and an audio controller 709, where each controller may be coupled to its corresponding peripheral device (such as the display screen 710, the camera 711, and the audio circuit 712). In some embodiments, the display screen may be a display screen configured with a self-capacitive floating touch panel, or a display screen configured with an infrared floating touch panel. In some embodiments, the camera 711 may be a 3D camera. It should be noted that the peripheral system 703 may also include other I/O peripherals.
In summary, in this embodiment of this application, the device 70 may acquire a dry sound through the processor 705, where the dry sound includes fundamental frequency data of a song sung by a user; the device 70 may then acquire timbre data of the dry sound through the processor 705; next, the device 70 may determine, through the processor 705, at least one sound effect scheme according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data, where the sound effect scheme is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, so as to generate sound-effect-processed audio; the device 70 may then output the target sound effect scheme through the peripheral system 703; and finally, the device 70 may generate, through the processor 705, target audio according to the acquired target sound effect scheme, where the target sound effect scheme is one of the at least one sound effect scheme. With this embodiment of this application, sound effect processing can be performed on the dry sound and the accompaniment of the song associated with the dry sound by using the acquired target sound effect scheme, so that the generated sound-effect-processed audio is more pleasant to hear.
It should be understood that the device 70 is merely an example provided by this embodiment of this application, and the device 70 may have more or fewer components than those shown, may combine two or more components, or may be implemented with a different configuration of components.
It can be understood that, for specific implementations of the functional modules included in the device 70 of FIG. 7, reference may be made to the embodiments of FIG. 1 and FIG. 5, and details are not repeated here.
This application provides a computer-readable storage medium that stores a computer program, and the computer program, when executed by a processor, implements the foregoing processing method.
The computer-readable storage medium may be an internal storage unit of the device described in any of the foregoing embodiments, for example, a hard disk or a memory of the device. The computer-readable storage medium may also be an external storage device of the device, for example, a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) equipped on the device. Further, the computer-readable storage medium may include both an internal storage unit of the device and an external storage device. The computer-readable storage medium is configured to store the computer program and other programs and data required by the device, and may also be configured to temporarily store data that has been output or is to be output.
This application also provides a computer program product. The computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform some or all of the steps of any method described in the foregoing method embodiments. The computer program product may be a software installation package, and the computer includes an electronic apparatus.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the devices and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways.
The device embodiments described above are merely illustrative. For example, the division of the units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, devices, or units, and may also be electrical, mechanical, or other forms of connection.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of this application.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of this application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in this application, and such modifications or replacements shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (12)

  1. A processing method, characterized by comprising:
    acquiring a dry sound, where the dry sound includes fundamental frequency data of a song sung by a user;
    acquiring timbre data of the dry sound, where the timbre data is acquired through a preset training model;
    determining at least one sound effect scheme according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data, where the sound effect scheme is used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound;
    outputting the at least one sound effect scheme; and
    generating target audio according to an acquired target sound effect scheme, where the target sound effect scheme is one of the at least one sound effect scheme.
  2. The method according to claim 1, characterized in that:
    the at least one sound effect scheme comprises one sound effect scheme or multiple sound effect schemes; and
    after the outputting the at least one sound effect scheme and before the generating target audio according to the acquired target sound effect scheme, the method further comprises:
    receiving a target instruction, where the target instruction is used to indicate the target sound effect scheme; and
    acquiring the target sound effect scheme in response to the received target instruction.
  3. The method according to claim 1, characterized in that, before the acquiring timbre data of the dry sound, the method further comprises:
    preprocessing the acquired dry sound to obtain first preprocessed data; and
    performing feature extraction on the first preprocessed data to extract a first feature vector, inputting the first feature vector into the preset training model, and comparing, through the preset training model, the distribution and intensity of overtones in the first feature vector with a reference result of the acquired dry sound to obtain the timbre data of the dry sound, where the preset training model is a trained training model.
  4. The method according to claim 3, characterized in that, before the performing feature extraction on the first preprocessed data to extract the first feature vector, inputting the first feature vector into the preset training model, and comparing, through the preset training model, the distribution and intensity of overtones in the first feature vector with the reference result of the acquired dry sound to obtain the timbre data of the dry sound, the method further comprises:
    separately performing feature extraction on multiple labeled dry sound samples to extract second feature vectors, and separately inputting the second feature vectors into a training model to be trained to obtain the preset training model, where the second feature vectors are used to train the training model to be trained.
  5. The method according to claim 1, characterized in that, before the determining at least one sound effect scheme according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data, the method further comprises:
    determining, from the accompaniment of the song associated with the dry sound, the accompaniment identification number of the accompaniment;
    determining, by using the accompaniment identification number, the song associated with the dry sound from a first database comprising multiple songs; and
    determining the singing speed of the song according to the determined song, where the accompaniment identification number of the song is associated with the song.
  6. The method according to claim 1, characterized in that the singing speed of the song associated with the dry sound is specifically:
    the number of beats per minute of the acquired song associated with the dry sound;
    or
    the number of syllables per minute of the acquired song associated with the dry sound.
  7. The method according to claim 1, characterized in that the generating target audio according to the acquired target sound effect scheme comprises:
    jointly performing, by using the equalization parameter value, the compression parameter value, and the reverberation parameter value in the acquired target sound effect scheme, sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, to generate the target audio.
  8. The method according to claim 7, wherein the jointly performing sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound by using the equalization parameter values, the compression parameter values, and the reverberation parameter values in the acquired target sound effect scheme, to generate the target audio, comprises:
    adjusting the degree of sound-quality improvement of the dry sound and of the accompaniment of the song associated with the dry sound by using the equalization parameter values in the target sound effect scheme, adjusting the degree of dynamic repair of the dry sound and of the accompaniment by using the compression parameter values in the target sound effect scheme, and respectively adjusting, by using the reverberation parameter values in the target sound effect scheme, the sound-quality improvement, the creation of spatial layering, and the degree of detail masking of the dry sound and of the accompaniment, to generate the target audio.
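The EQ → compression → reverb chain of claims 7 and 8 can be sketched as toy sample-level DSP. This is not the patented implementation: the scheme's key names, the single-band gain standing in for equalization, the hard-knee compressor, and the single-echo reverb are all simplifying assumptions for illustration.

```python
def apply_compression(signal, threshold, ratio):
    """Dynamic repair: attenuate sample magnitudes above a threshold."""
    out = []
    for s in signal:
        mag = abs(s)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(mag if s >= 0 else -mag)
    return out


def apply_reverb(signal, wet, delay):
    """Spatial layering / detail masking: mix in one delayed echo."""
    out = list(signal)
    for i in range(delay, len(signal)):
        out[i] += wet * signal[i - delay]
    return out


def render_target_audio(dry, accompaniment, scheme):
    """Jointly process dry vocal and accompaniment with one scheme."""
    mixed = [d + a for d, a in zip(dry, accompaniment)]
    mixed = [s * scheme["eq_gain"] for s in mixed]  # single-band "EQ"
    mixed = apply_compression(mixed, scheme["threshold"], scheme["ratio"])
    return apply_reverb(mixed, scheme["reverb_wet"], scheme["reverb_delay"])
```

Note that the scheme is applied to the summed mix, matching the claim's "joint" processing of dry sound and accompaniment rather than per-track processing.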
  9. The method according to claim 3, wherein the preprocessing the acquired dry sound comprises:
    performing noise reduction and/or tone repair on the acquired dry sound.
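Minimal sketches of the two preprocessing options in claim 9, under loudly stated assumptions: noise reduction is reduced to a threshold gate, and tone repair is reduced to snapping an estimated pitch to the nearest equal-tempered semitone; real systems use far more sophisticated spectral methods.

```python
import math


def noise_reduce(signal, noise_floor):
    """Noise reduction as a simple gate: zero samples below the floor."""
    return [s if abs(s) >= noise_floor else 0.0 for s in signal]


def snap_to_semitone(f0_hz, a4=440.0):
    """Tone repair sketch: snap a pitch to the nearest 12-TET semitone."""
    if f0_hz <= 0:
        return f0_hz
    n = round(12 * math.log2(f0_hz / a4))
    return a4 * 2 ** (n / 12)
```

A slightly sharp 445 Hz note, for example, snaps back to A4 at 440 Hz.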
  10. A processing apparatus, comprising:
    a first acquiring unit, configured to acquire a dry sound, the dry sound comprising fundamental frequency data of a song sung by a user;
    a second acquiring unit, configured to acquire timbre data of the dry sound;
    a determining unit, configured to determine at least one sound effect scheme according to the acquired timbre data of the dry sound, the singing speed of the song associated with the dry sound, and the fundamental frequency data, the sound effect scheme being used to perform sound effect processing on the dry sound and the accompaniment of the song associated with the dry sound, to generate audio after sound effect processing;
    an output unit, configured to output the at least one sound effect scheme; and
    a generating unit, configured to generate target audio according to an acquired target sound effect scheme, the target sound effect scheme being one of the at least one sound effect scheme.
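The five units of claim 10 can be modeled structurally as methods of one class. This is a shape sketch only: every method body below is a stub with hypothetical logic, and none of the names or heuristics (such as the 110-speed cutoff) come from the application.

```python
class ProcessingApparatus:
    """Units of claim 10 modeled as methods; concrete logic is stubbed."""

    def first_acquiring_unit(self, recording):
        # Acquire the dry sound; it carries the user's fundamental
        # frequency data (stubbed here as a constant 110 Hz track).
        return {"samples": list(recording), "f0": [110.0] * len(recording)}

    def second_acquiring_unit(self, dry_sound):
        # Acquire timbre data of the dry sound (stub feature).
        return {"brightness": max(dry_sound["samples"], default=0.0)}

    def determining_unit(self, timbre, singing_speed, f0):
        # Determine at least one sound effect scheme from timbre,
        # singing speed, and fundamental frequency data.
        name = "fast" if singing_speed > 110 else "slow"
        return [{"name": name, "reverb_wet": 0.2}]

    def output_unit(self, schemes):
        # Output the candidate schemes (e.g. for user selection).
        return schemes

    def generating_unit(self, dry_sound, target_scheme):
        # Generate target audio from the selected target scheme.
        return {"audio": dry_sound["samples"], "scheme": target_scheme["name"]}
```

Chaining the units end to end mirrors the method of claim 1: acquire, analyze, propose schemes, let one be selected, then render.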
  11. A processing device, comprising: an input device, an output device, a memory, and a processor coupled to the memory, the input device, the output device, the processor, and the memory being connected to one another, wherein the memory is configured to store application program code, and the processor is configured to call the program code to execute the processing method according to any one of claims 1-9.
  12. A computer-readable storage medium, wherein the computer storage medium stores a computer program, the computer program comprising program instructions that, when executed by a processor, cause the processor to execute the processing method according to any one of claims 1-9.
PCT/CN2019/083454 2019-03-01 2019-04-19 Processing method, apparatus and device WO2020177190A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910158854.5 2019-03-01
CN201910158854.5A CN109785820B (en) 2019-03-01 2019-03-01 Processing method, device and equipment

Publications (1)

Publication Number Publication Date
WO2020177190A1 true WO2020177190A1 (en) 2020-09-10

Family

ID=66486097

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/083454 WO2020177190A1 (en) 2019-03-01 2019-04-19 Processing method, apparatus and device

Country Status (2)

Country Link
CN (1) CN109785820B (en)
WO (1) WO2020177190A1 (en)


Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110706679B (en) * 2019-09-30 2022-03-29 维沃移动通信有限公司 Audio processing method and electronic equipment
CN111061909B (en) * 2019-11-22 2023-11-28 腾讯音乐娱乐科技(深圳)有限公司 Accompaniment classification method and accompaniment classification device
CN111326132B (en) * 2020-01-22 2021-10-22 北京达佳互联信息技术有限公司 Audio processing method and device, storage medium and electronic equipment
CN112164387A (en) * 2020-09-22 2021-01-01 腾讯音乐娱乐科技(深圳)有限公司 Audio synthesis method and device, electronic equipment and computer-readable storage medium
CN112289300B (en) * 2020-10-28 2024-01-09 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method and device, electronic equipment and computer readable storage medium
CN113112998B (en) * 2021-05-11 2024-03-15 腾讯音乐娱乐科技(深圳)有限公司 Model training method, reverberation effect reproduction method, device, and readable storage medium
CN113707113B (en) * 2021-08-24 2024-02-23 北京达佳互联信息技术有限公司 User singing voice repairing method and device and electronic equipment
CN114666706B (en) * 2021-11-30 2024-05-14 北京达佳互联信息技术有限公司 Sound effect enhancement method, device and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9224375B1 (en) * 2012-10-19 2015-12-29 The Tc Group A/S Musical modification effects
CN105208189A (en) * 2014-12-10 2015-12-30 维沃移动通信有限公司 Audio processing method and mobile terminal
CN107978321A (en) * 2017-11-29 2018-05-01 广州酷狗计算机科技有限公司 Audio-frequency processing method and device
CN108305603A (en) * 2017-10-20 2018-07-20 腾讯科技(深圳)有限公司 Sound effect treatment method and its equipment, storage medium, server, sound terminal
CN108922506A (en) * 2018-06-29 2018-11-30 广州酷狗计算机科技有限公司 Song audio generation method, device and computer readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6371283B2 (en) * 2012-08-07 2018-08-08 スミュール,インク.Smule,Inc. Social music system and method using continuous real-time pitch correction and dry vocal capture of vocal performances for subsequent replay based on selectively applicable vocal effect schedule (s)
CN107203571B (en) * 2016-03-18 2019-08-06 腾讯科技(深圳)有限公司 Song lyric information processing method and device
CN106024005B (en) * 2016-07-01 2018-09-25 腾讯科技(深圳)有限公司 A kind of processing method and processing device of audio data
US10062367B1 (en) * 2017-07-14 2018-08-28 Music Tribe Global Brands Ltd. Vocal effects control system


Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112331222A (en) * 2020-09-23 2021-02-05 北京捷通华声科技股份有限公司 Method, system, equipment and storage medium for converting song tone
CN112365868A (en) * 2020-11-17 2021-02-12 北京达佳互联信息技术有限公司 Sound processing method, sound processing device, electronic equipment and storage medium
CN112365868B (en) * 2020-11-17 2024-05-28 北京达佳互联信息技术有限公司 Sound processing method, device, electronic equipment and storage medium
CN112420015A (en) * 2020-11-18 2021-02-26 腾讯音乐娱乐科技(深圳)有限公司 Audio synthesis method, device, equipment and computer readable storage medium
CN113192486B (en) * 2021-04-27 2024-01-09 腾讯音乐娱乐科技(深圳)有限公司 Chorus audio processing method, chorus audio processing equipment and storage medium
CN113192486A (en) * 2021-04-27 2021-07-30 腾讯音乐娱乐科技(深圳)有限公司 Method, equipment and storage medium for processing chorus audio
CN113744721A (en) * 2021-09-07 2021-12-03 腾讯音乐娱乐科技(深圳)有限公司 Model training method, audio processing method, device and readable storage medium
CN113744708A (en) * 2021-09-07 2021-12-03 腾讯音乐娱乐科技(深圳)有限公司 Model training method, audio evaluation method, device and readable storage medium
CN113744708B (en) * 2021-09-07 2024-05-14 腾讯音乐娱乐科技(深圳)有限公司 Model training method, audio evaluation method, device and readable storage medium
CN113744721B (en) * 2021-09-07 2024-05-14 腾讯音乐娱乐科技(深圳)有限公司 Model training method, audio processing method, device and readable storage medium
CN114566191A (en) * 2022-02-25 2022-05-31 腾讯音乐娱乐科技(深圳)有限公司 Sound correcting method for recording and related device
CN115240709A (en) * 2022-07-25 2022-10-25 镁佳(北京)科技有限公司 Sound field analysis method and device for audio file
CN115240709B (en) * 2022-07-25 2023-09-19 镁佳(北京)科技有限公司 Sound field analysis method and device for audio file
WO2024066790A1 (en) * 2022-09-26 2024-04-04 抖音视界有限公司 Audio processing method and apparatus, and electronic device

Also Published As

Publication number Publication date
CN109785820B (en) 2022-12-27
CN109785820A (en) 2019-05-21

Similar Documents

Publication Publication Date Title
WO2020177190A1 (en) Processing method, apparatus and device
CN110555126B (en) Automatic generation of melodies
US8239201B2 (en) System and method for audibly presenting selected text
WO2021004481A1 (en) Media files recommending method and device
CN112309365B (en) Training method and device of speech synthesis model, storage medium and electronic equipment
US9749582B2 (en) Display apparatus and method for performing videotelephony using the same
CN111402842A (en) Method, apparatus, device and medium for generating audio
CN111798821B (en) Sound conversion method, device, readable storage medium and electronic equipment
US11511200B2 (en) Game playing method and system based on a multimedia file
CN110675886A (en) Audio signal processing method, audio signal processing device, electronic equipment and storage medium
US11538476B2 (en) Terminal device, server and controlling method thereof
WO2021169365A1 (en) Voiceprint recognition method and device
WO2022089097A1 (en) Audio processing method and apparatus, electronic device, and computer-readable storage medium
CN111653265A (en) Speech synthesis method, speech synthesis device, storage medium and electronic equipment
WO2020228226A1 (en) Instrumental music detection method and apparatus, and storage medium
CN109410972B (en) Method, device and storage medium for generating sound effect parameters
CN114363531B (en) H5-based text description video generation method, device, equipment and medium
WO2020154916A1 (en) Video subtitle synthesis method and apparatus, storage medium, and electronic device
TWI486949B (en) Music emotion classification method
JP7230085B2 (en) Method and device, electronic device, storage medium and computer program for processing sound
CN112786025A (en) Method for determining lyric timestamp information and training method of acoustic model
CN106649643B (en) A kind of audio data processing method and its device
WO2024075422A1 (en) Musical composition creation method and program
WO2023236054A1 (en) Audio generation method and apparatus, and storage medium
JP2023144076A (en) Program, information processing method and information processing device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19918434

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.01.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19918434

Country of ref document: EP

Kind code of ref document: A1