CN114882856A - Singing sound optimization method, singing sound optimization device, singing sound optimization equipment and computer readable storage medium - Google Patents
- Publication number
- CN114882856A (application CN202210407333.0A)
- Authority
- CN
- China
- Prior art keywords
- audio data
- microphone
- acoustic
- sounding
- singer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0091—Means for obtaining special acoustic effects
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/36—Accompaniment arrangements
- G10H1/361—Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/36—Accompaniment arrangements
- G10H1/361—Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
- G10H1/366—Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems with means for modifying or correcting the external signal, e.g. pitch correction, reverberation, changing a singer's voice
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Reverberation, Karaoke And Other Acoustics (AREA)
Abstract
The application relates to the field of voice processing, and provides a singing voice optimization method, a singing voice optimization apparatus, singing voice optimization equipment, and a computer-readable storage medium. The method comprises the following steps: obtaining acoustic audio data f1(t) of a target song, wherein the target song is the song currently sung by the singer; synchronously acquiring microphone audio data f2(t), wherein the microphone audio data f2(t) corresponds to the sound emitted by the microphone used by the singer while singing the target song; processing the microphone audio data f2(t) based on the acoustic audio data f1(t) to obtain processed audio data; and playing the processed audio data. The technical scheme of the application not only makes the singer's singing sound pleasant to hear, but also preserves the singer's personal voice characteristics.
Description
Technical Field
The present invention relates to the field of speech processing, and in particular, to a singing voice optimization method, apparatus, device, and computer-readable storage medium.
Background
With the improvement of people's living standards, going to KTV and other entertainment venues to sing has become an increasingly popular way to relax. Generally, people go to these venues to sing with several friends. In such a situation, it is embarrassing for a tone-deaf singer to go out of tune while singing. One existing solution to this problem is to modify the sound of the microphone used by such singers so that the sound coming out of the microphone is more pleasant. However, each person's timbre is different, and the sound-modification function of prior-art microphones prevents the sound coming out of the microphone from expressing each person's singing characteristics.
Disclosure of Invention
The application provides a singing voice optimization method, apparatus, equipment, and computer-readable storage medium, which not only make the singer's singing sound pleasant to hear but also preserve the singer's personal voice characteristics.
In one aspect, the present application provides a singing voice optimization method, including:
obtaining acoustic audio data f1(t) of a target song, wherein the target song is the song currently sung by the singer;
synchronously acquiring microphone audio data f2(t), wherein the microphone audio data f2(t) corresponds to the sound emitted by the microphone used by the singer while singing the target song;
processing the microphone audio data f2(t) based on the acoustic audio data f1(t) to obtain processed audio data;
and playing the processed audio data.
In another aspect, the present application provides a singing voice optimizing apparatus, comprising:
an acquisition module for acquiring acoustic audio data f1(t) of a target song, wherein the target song is the song currently sung by the singer;
a collection module for synchronously collecting microphone audio data f2(t), wherein the microphone audio data f2(t) corresponds to the sound emitted by the microphone used by the singer while singing the target song;
an adjustment module for processing the microphone audio data f2(t) based on the acoustic audio data f1(t) to obtain processed audio data;
and the playing module is used for playing the processed audio data.
In a third aspect, the present application provides an apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above-mentioned singing sound optimization method when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above-described singing sound optimization method.
According to the technical scheme provided by the application, after the acoustic audio data f1(t) of the target song is acquired and the microphone audio data f2(t) is synchronously acquired, the microphone audio data f2(t) is processed based on the acoustic audio data f1(t) to obtain the processed audio data. Compared with the prior art, which only corrects the singer's microphone audio data, the microphone audio data f2(t) here is processed based on the acoustic audio data f1(t), so that the processed audio data, when played, not only makes the singer's singing sound pleasant but also preserves the singer's individual voice characteristics.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a singing voice optimization method provided by an embodiment of the present application;
fig. 2 is a schematic structural diagram of a singing sound optimizing apparatus provided in an embodiment of the present application;
fig. 3 is a schematic structural diagram of an apparatus provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In this specification, adjectives such as first and second may only be used to distinguish one element or action from another, without necessarily requiring or implying any actual such relationship or order. References to an element or component or step (etc.) should not be construed as limited to only one of the element, component, or step, but rather to one or more of the element, component, or step, etc., where the context permits.
In the present specification, the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
As shown in fig. 1, a flow chart of a singing voice optimizing method proposed by the present application mainly includes steps S101 to S104, which are detailed as follows:
step S101: obtaining acoustic audio data f of target song 1 (t), wherein the target song is the song currently sung by the singer.
The target song is the song currently sung by the singer, and the acoustic audio data f1(t) of the target song is the vocalization data corresponding to the original performance of that song; it may be, for example, the original sound intensity of the target song, such as the amplitude of the original recording. It should be noted that "original singing" here can be understood in a broad sense: besides the finished recording made in a studio by the singer or group that first performed the target song, a cover version whose singing level is comparable to the original may also serve as the "original singing". In principle, any recording of a comparable standard can serve as the original singing in the embodiments of the present application; it need not be the original singing in the narrow sense.
Step S102: synchronously acquiring microphone audio data f2(t), wherein the microphone audio data f2(t) corresponds to the sound emitted by the microphone used by the singer while singing the target song.
Here, the microphone audio data f2(t) corresponds to the sound emitted by the microphone used by the singer while singing the target song, and synchronous acquisition means that, on the time axis, the moment at which the microphone audio data f2(t) is collected is the same as the moment at which the acoustic audio data f1(t) of the target song is obtained.
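Although the embodiments do not prescribe an implementation, the synchronous-acquisition requirement can be sketched as follows. This is a minimal illustration under the assumption that playback and capture share a single sample clock; the function name and parameters are illustrative, not from the source:

```python
def synchronized_slice(f1, mic_buffer, playhead_sample):
    """Return the slice of the original-track samples f1 that is
    time-aligned with the just-captured microphone buffer, assuming
    playback and capture share one sample clock."""
    # The mic buffer ends at the current playhead, so it started
    # len(mic_buffer) samples earlier (clamped to the track start).
    start = max(playhead_sample - len(mic_buffer), 0)
    return f1[start:start + len(mic_buffer)]
```

For example, with the accompaniment playhead at sample 5 and a 3-sample microphone buffer, the time-aligned original slice is f1[2:5].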
Step S103: processing the microphone audio data f2(t) based on the acoustic audio data f1(t) to obtain processed audio data.
Since the goal is that the singer's final singing both sounds pleasant and retains the singer's voice characteristics to a certain extent, processing the microphone audio data f2(t) based on the acoustic audio data f1(t) may consist of dynamically assigning corresponding weights to the microphone audio data f2(t) and the acoustic audio data f1(t) and then combining them to obtain the processed audio data. Specifically, as one embodiment of the present application, this may be done as follows: compute the average of the microphone audio data f2(t) and the acoustic audio data f1(t) to obtain ideal audio data f(t) = [f1(t) + f2(t)]/2; during an initial stage, give the microphone audio data f2(t) and the ideal audio data f(t) equal weights; after the initial stage ends, assign a weight ω1 to the acoustic audio data f1(t) and a weight ω to the ideal audio data f(t), where ω1 + ω = 1; compute ω1*f1(t) + ω*f(t) and take ω1*f1(t) + ω*f(t) as the processed audio data. In the above embodiment, the initial stage may be the period from the moment the singer starts singing the target song until a predetermined time. The relation between ω1 and ω can be chosen as needed: for example, if the original singing effect is to be preserved to a greater extent, ω1 may be taken greater than ω; conversely, if the singer's voice characteristics are to be preserved to a greater extent, ω1 may be taken less than ω.
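The weighting scheme of this embodiment can be sketched directly from the formulas above. This is a minimal sketch assuming f1 and f2 are time-aligned NumPy sample arrays; the function name and the default parameter values are illustrative assumptions:

```python
import numpy as np

def mix_with_ideal(f1, f2, t, t_initial=5.0, omega1=0.3):
    """Blend acoustic (original) audio f1 and microphone audio f2.

    During the initial stage (t < t_initial seconds) the microphone
    signal and the 'ideal' average get equal weights; afterwards the
    acoustic signal gets weight omega1 and the ideal signal gets
    weight omega = 1 - omega1.
    """
    ideal = (f1 + f2) / 2.0            # f(t) = [f1(t) + f2(t)] / 2
    if t < t_initial:                  # initial stage: equal weights
        return 0.5 * f2 + 0.5 * ideal
    omega = 1.0 - omega1               # omega1 + omega = 1
    return omega1 * f1 + omega * ideal # omega1*f1(t) + omega*f(t)
```

Choosing omega1 > 0.5 leans the mix toward the original-singing effect; omega1 < 0.5 preserves more of the singer's own voice, matching the trade-off described above.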
As another embodiment of the present application, dynamically assigning corresponding weights to the microphone audio data f2(t) and the acoustic audio data f1(t) and then combining them to obtain the processed audio data may be done as follows: compare the microphone audio data f2(t) with the acoustic audio data f1(t) in real time to determine the singer's singing level; if the singer's singing level is at or above a preset level, assign a weight ω3 to the acoustic audio data f1(t) and a weight ω4 to the microphone audio data f2(t), where ω3 < ω4 and ω3 + ω4 = 1; if the singer's singing level is below the preset level, assign a weight ω'3 to the acoustic audio data f1(t) and a weight ω'4 to the microphone audio data f2(t), where ω'3 > ω'4 and ω'3 + ω'4 = 1; compute ω3*f1(t) + ω4*f2(t) or ω'3*f1(t) + ω'4*f2(t) and take the result as the processed audio data. In the above embodiment, comparing the microphone audio data f2(t) with the acoustic audio data f1(t) in real time to determine the singer's singing level may specifically mean comparing the intensity of the microphone audio data f2(t) with the intensity of the acoustic audio data f1(t) in real time. If the singer's singing level is at or above the preset level, the singer sings relatively well, so, to preserve the singer's voice characteristics to a greater extent, the acoustic audio data f1(t) can be given the smaller weight ω3 and the microphone audio data f2(t) the larger weight ω4; conversely, if the singer's singing level is below the preset level, the singer sings poorly, and to avoid embarrassment the acoustic audio data f1(t) can be given the larger weight ω'3 and the microphone audio data f2(t) the smaller weight ω'4.
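The level-based weighting of this embodiment can be sketched as follows. The RMS-intensity ratio used as the "singing level", the preset level of 0.8, and the function name are all illustrative assumptions; the source only specifies that intensities are compared in real time and that the weights flip around a preset level:

```python
import numpy as np

def level_based_mix(f1, f2, preset_level=0.8, omega3=0.3):
    """Gauge the singing level by comparing mic and original intensity,
    then mix with weights that favor whichever signal sounds better."""
    eps = 1e-12
    rms1 = np.sqrt(np.mean(f1 ** 2))
    rms2 = np.sqrt(np.mean(f2 ** 2))
    # Crude 'singing level': how close the mic intensity is to the original.
    level = min(rms1, rms2) / (max(rms1, rms2) + eps)
    if level >= preset_level:          # good singer: favor the microphone
        w1, w2 = omega3, 1.0 - omega3  # omega3 < omega4, omega3 + omega4 = 1
    else:                              # weaker singer: favor the original
        w1, w2 = 1.0 - omega3, omega3  # omega3' > omega4'
    return w1 * f1 + w2 * f2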
As another embodiment of the present application, processing the microphone audio data f2(t) based on the acoustic audio data f1(t) to obtain the processed audio data may be done as follows: perform vocalization matching detection between the microphone audio data f2(t) and the acoustic audio data f1(t) to obtain a vocalization difference result; then, based on the vocalization difference result, correct the vocalization of the microphone audio data f2(t) using the vocalization of the acoustic audio data f1(t) to obtain the processed audio data. In the above embodiment, the vocalization difference result characterizes the degree of difference in vocalization between the microphone audio data f2(t) and the acoustic audio data f1(t). One way to perform the matching detection is to take the microphone audio data f2(t) as a whole and match it directly against the acoustic audio data f1(t) to obtain the vocalization difference result; because the microphone audio data f2(t) is examined as a whole, the matching detection is efficient. Another way is to segment both the microphone audio data f2(t) and the acoustic audio data f1(t) and then perform matching detection on the segments; because segmentation yields data fragments, the granularity of the subsequent matching detection is smaller, and the accuracy of the resulting vocalization difference result is therefore higher than in the former approach.
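The segmented variant of the matching detection can be sketched as follows. This is a minimal illustration assuming aligned NumPy arrays; using the RMS of the sample-wise difference as the per-segment score is an assumption, since the source does not define the difference measure:

```python
import numpy as np

def voicing_difference(f1, f2, segment_len=2048):
    """Segment both signals and report a per-segment difference score.

    Returns a list of (segment_index, score); a higher score means the
    microphone segment deviates more from the original in that window.
    """
    n = min(len(f1), len(f2))
    results = []
    for i, start in enumerate(range(0, n - segment_len + 1, segment_len)):
        a = f1[start:start + segment_len]
        b = f2[start:start + segment_len]
        # RMS of the difference as a simple deviation measure.
        score = np.sqrt(np.mean((a - b) ** 2))
        results.append((i, score))
    return results
```

A smaller segment_len gives finer granularity and, as the text notes, a more accurate difference result, at the cost of more comparisons.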
As for correcting the vocalization of the microphone audio data f2(t) using the acoustic audio data f1(t) based on the vocalization difference result, in one embodiment of the present application this may specifically be: from the at least one singer-vocalized character in the microphone audio data f2(t), select the characters to be corrected, i.e., those whose vocalization the difference result marks as deviating; extract from the acoustic audio data f1(t) the standard vocalized characters corresponding to the characters to be corrected; replace the characters to be corrected with the standard vocalized characters to obtain corrected characters; and synthesize the processed audio data from the corrected characters together with the remaining singer-vocalized characters that did not require correction.
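The select-extract-replace-synthesize steps above can be sketched at segment level as follows. This is a simplified illustration: it treats each fixed-length segment as a "vocalized character" and takes a list of (segment_index, score) pairs from any matching-detection step; the function name and the threshold rule are assumptions:

```python
import numpy as np

def correct_by_replacement(f1, f2, diff_results, threshold, segment_len=2048):
    """Replace mic segments whose difference score exceeds threshold
    with the corresponding segments from the original recording,
    keeping the singer's own audio everywhere else."""
    out = np.array(f2, dtype=float, copy=True)
    for idx, score in diff_results:
        if score > threshold:                      # deviating segment
            start = idx * segment_len
            out[start:start + segment_len] = f1[start:start + segment_len]
    return out                                     # synthesized result
```

A production system would replace lyric-aligned character units rather than fixed windows, and cross-fade at the splice points to avoid clicks; this sketch only shows the replace-and-synthesize logic.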
Step S104: and playing the processed audio data.
Specifically, the processed audio data may be decoded and restored to a sound signal, i.e., a song.
As can be understood from the singing voice optimization method illustrated in fig. 1, after the acoustic audio data f1(t) of the target song is obtained and the microphone audio data f2(t) is synchronously acquired, the microphone audio data f2(t) is processed based on the acoustic audio data f1(t) to obtain the processed audio data. Unlike the prior art, which only modifies the singer's microphone audio data, the processing here is based on the acoustic audio data f1(t), so that the processed audio data, when played, not only makes the singer's singing sound pleasant but also preserves the singer's individual voice characteristics.
Referring to fig. 2, a singing voice optimizing apparatus provided in the embodiment of the present application may include an obtaining module 201, a collecting module 202, an adjusting module 203, and a playing module 204, which are detailed as follows:
an obtaining module 201, configured to obtain acoustic audio data f1(t) of a target song, wherein the target song is the song currently sung by the singer;
an acquisition module 202, configured to synchronously acquire microphone audio data f2(t), wherein the microphone audio data f2(t) corresponds to the sound emitted by the microphone used by the singer while singing the target song;
an adjustment module 203, configured to process the microphone audio data f2(t) based on the acoustic audio data f1(t) to obtain processed audio data;
and the playing module 204 is configured to play the processed audio data.
As can be seen from the above description of the technical solution, after the acoustic audio data f1(t) of the target song is obtained and the microphone audio data f2(t) is synchronously acquired, the microphone audio data f2(t) is processed based on the acoustic audio data f1(t) to obtain the processed audio data. Unlike the prior art, which only modifies the singer's microphone audio data, the processing here is based on the acoustic audio data f1(t), so that the processed audio data, when played, not only makes the singer's singing sound pleasant but also preserves the singer's individual voice characteristics.
Optionally, the adjusting module 203 illustrated in fig. 2 may include a merging unit for dynamically assigning corresponding weights to the microphone audio data f2(t) and the acoustic audio data f1(t) and combining them to obtain the processed audio data.
Optionally, the merging unit illustrated in fig. 2 may include an average value calculating unit, a first weight giving unit, a second weight giving unit, and a first summing unit, wherein:
an average value calculation unit, configured to compute the average of the microphone audio data f2(t) and the acoustic audio data f1(t) to obtain ideal audio data f(t) = [f1(t) + f2(t)]/2;
a first weight assigning unit, configured to give the microphone audio data f2(t) and the ideal audio data f(t) equal weights during an initial stage;
a second weight assigning unit, configured to assign, after the initial stage ends, a weight ω1 to the acoustic audio data f1(t) and a weight ω to the ideal audio data f(t), where ω1 + ω = 1;
a first summing unit, configured to compute ω1*f1(t) + ω*f(t) and take ω1*f1(t) + ω*f(t) as the processed audio data.
Optionally, the merging unit illustrated in fig. 2 may include a determining unit, a third weight giving unit, a fourth weight giving unit, and a second summing unit, where:
a determination unit, configured to compare the microphone audio data f2(t) with the acoustic audio data f1(t) in real time to determine the singer's singing level;
a third weight assigning unit, configured to assign, if the singer's singing level is at or above a preset level, a weight ω3 to the acoustic audio data f1(t) and a weight ω4 to the microphone audio data f2(t), where ω3 < ω4 and ω3 + ω4 = 1;
a fourth weight assigning unit, configured to assign, if the singer's singing level is below the preset level, a weight ω'3 to the acoustic audio data f1(t) and a weight ω'4 to the microphone audio data f2(t), where ω'3 > ω'4 and ω'3 + ω'4 = 1;
a second summing unit, configured to compute ω3*f1(t) + ω4*f2(t) or ω'3*f1(t) + ω'4*f2(t) and take the result as the processed audio data.
Optionally, the adjusting module 203 illustrated in fig. 2 may include a detecting unit and a correcting unit, where:
a detection unit, configured to perform vocalization matching detection between the microphone audio data f2(t) and the acoustic audio data f1(t) to obtain a vocalization difference result;
a correction unit, configured to correct, based on the vocalization difference result, the vocalization of the microphone audio data f2(t) using the vocalization of the acoustic audio data f1(t) to obtain the processed audio data.
Optionally, the detection unit of the above example may include a first match detection unit or a second match detection unit, wherein:
a first matching detection unit, configured to take the microphone audio data f2(t) as a whole and match it directly against the acoustic audio data f1(t) to obtain the vocalization difference result;
a second matching detection unit, configured to segment the microphone audio data f2(t) and the acoustic audio data f1(t) and then perform vocalization matching detection on the segments to obtain the vocalization difference result.
Alternatively, the correction unit of the above example may include a selection unit, an extraction unit, a replacement unit, and a synthesis unit, wherein:
a selection unit, configured to select, from the at least one singer-vocalized character in the microphone audio data f2(t), the characters to be corrected, i.e., those whose vocalization the difference result marks as deviating;
an extraction unit, configured to extract from the acoustic audio data f1(t) the standard vocalized characters corresponding to the characters to be corrected;
a replacement unit, configured to replace the characters to be corrected with the standard vocalized characters to obtain corrected characters;
and a synthesis unit, configured to synthesize the processed audio data from the corrected characters together with the remaining singer-vocalized characters that did not require correction.
Fig. 3 is a schematic structural diagram of an apparatus provided in an embodiment of the present application. As shown in fig. 3, the apparatus 3 of this embodiment mainly includes: a processor 30, a memory 31 and a computer program 32, such as a program for a singing voice optimization method, stored in the memory 31 and executable on the processor 30. The processor 30, when executing the computer program 32, implements the steps in the above-described singing voice optimization method embodiments, such as steps S101 to S104 shown in fig. 1. Alternatively, the processor 30 executes the computer program 32 to implement the functions of the modules/units in the above-mentioned device embodiments, such as the functions of the acquiring module 201, the acquiring module 202, the adjusting module 203 and the playing module 204 shown in fig. 2.
Illustratively, the computer program 32 of the singing voice optimization method essentially comprises: obtaining acoustic audio data f1(t) of a target song, wherein the target song is the song currently sung by the singer; synchronously acquiring microphone audio data f2(t), wherein the microphone audio data f2(t) corresponds to the sound emitted by the microphone used by the singer while singing the target song; processing the microphone audio data f2(t) based on the acoustic audio data f1(t) to obtain processed audio data; and playing the processed audio data. The computer program 32 may be divided into one or more modules/units, which are stored in the memory 31 and executed by the processor 30 to implement the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, used to describe the execution of the computer program 32 in the device 3. For example, the computer program 32 may be divided into the obtaining module 201, the acquisition module 202, the adjustment module 203, and the playing module 204 (modules in a virtual device), whose specific functions are as follows: the obtaining module 201 is configured to obtain acoustic audio data f1(t) of a target song, wherein the target song is the song currently sung by the singer; the acquisition module 202 is configured to synchronously acquire microphone audio data f2(t), wherein the microphone audio data f2(t) corresponds to the sound emitted by the microphone used by the singer while singing the target song; the adjustment module 203 is configured to process the microphone audio data f2(t) based on the acoustic audio data f1(t) to obtain processed audio data; and the playing module 204 is configured to play the processed audio data.
The device 3 may include, but is not limited to, a processor 30, a memory 31. Those skilled in the art will appreciate that fig. 3 is merely an example of a device 3 and does not constitute a limitation of device 3 and may include more or fewer components than shown, or some components in combination, or different components, e.g., a computing device may also include input-output devices, network access devices, buses, etc.
The processor 30 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 31 may be an internal storage unit of the device 3, such as a hard disk or a memory of the device 3. The memory 31 may also be an external storage device of the device 3, such as a plug-in hard disk provided on the device 3, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 31 may also include both an internal storage unit of the device 3 and an external storage device. The memory 31 is used for storing computer programs and other programs and data required by the device. The memory 31 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned functions may be distributed as required to different functional units and modules, that is, the internal structure of the apparatus may be divided into different functional units or modules to implement all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the above-mentioned apparatus may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/device and method may be implemented in other ways. For example, the above-described apparatus/device embodiments are merely illustrative, and for example, a module or a unit may be divided into only one logic function, and may be implemented in other ways, for example, a plurality of units or components may be combined or integrated into another apparatus, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a non-transitory computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above may also be completed by a computer program instructing related hardware. The computer program of the singing sound optimization method may be stored in a computer-readable storage medium, and when executed by a processor, implements the steps of the method embodiments described above, namely: acquiring acoustic audio data f1(t) of a target song, wherein the target song is the song currently sung by the singer; synchronously acquiring microphone audio data f2(t), wherein the microphone audio data f2(t) corresponds to the sound picked up by the microphone used by the singer when singing the target song; processing the microphone audio data f2(t) based on the acoustic audio data f1(t) to obtain processed audio data; and playing the processed audio data. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The non-transitory computer-readable medium may include any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
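The steps restated above can be sketched as follows. NumPy arrays stand in for the two audio streams, and the fixed 50/50 blend weight is an illustrative assumption only; the claims describe dynamic weighting strategies, not this fixed mix:

```python
import numpy as np

def optimize_singing(f1: np.ndarray, f2: np.ndarray, w1: float = 0.5) -> np.ndarray:
    """Blend the acoustic audio data f1(t) of the target song with the
    synchronously captured microphone audio data f2(t).

    `w1` is the share given to the acoustic track; the default of 0.5
    is an illustrative choice, not the patent's required weighting."""
    n = min(len(f1), len(f2))                 # align the two streams in length
    return w1 * f1[:n] + (1.0 - w1) * f2[:n]

f1 = np.array([0.2, 0.4, 0.6])                # acoustic audio data f1(t)
f2 = np.array([0.0, 0.4, 0.8])                # microphone audio data f2(t)
processed = optimize_singing(f1, f2)          # -> [0.1, 0.4, 0.7]
```

Playing `processed` back to the singer completes the pipeline; the later claims refine how the blend weights are chosen.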
It should be noted that the content contained in the non-transitory computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, non-transitory computer-readable media do not include electrical carrier signals and telecommunications signals. The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application. Any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the present application should be included within the scope of protection of the present application.
Claims (10)
1. A method of singing voice optimization, the method comprising:
acquiring acoustic audio data f1(t) of a target song, wherein the target song is the song currently sung by a singer;
synchronously acquiring microphone audio data f2(t), wherein the microphone audio data f2(t) corresponds to the sound picked up by a microphone used by the singer when singing the target song;
processing the microphone audio data f2(t) based on the acoustic audio data f1(t) to obtain processed audio data;
and playing the processed audio data.
2. The singing voice optimization method of claim 1, wherein processing the microphone audio data f2(t) based on the acoustic audio data f1(t) to obtain the processed audio data comprises: dynamically assigning corresponding weights to the microphone audio data f2(t) and the acoustic audio data f1(t), and combining them to obtain the processed audio data.
3. The singing voice optimization method of claim 2, wherein dynamically assigning corresponding weights to the microphone audio data f2(t) and the acoustic audio data f1(t) and combining them to obtain the processed audio data comprises:
averaging the microphone audio data f2(t) and the acoustic audio data f1(t) to obtain ideal audio data f(t) = [f1(t) + f2(t)]/2;
in an initial stage, assigning equal weights to the microphone audio data f2(t) and the ideal audio data f(t);
after the initial stage ends, assigning a weight ω1 to the acoustic audio data f1(t) and a weight ω to the ideal audio data f(t), wherein ω1 + ω = 1;
and computing ω1·f1(t) + ω·f(t), and taking ω1·f1(t) + ω·f(t) as the processed audio data.
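The two-stage weighting of claim 3 can be sketched as below; the value ω1 = 0.3 and the use of whole arrays rather than streaming frames are illustrative assumptions:

```python
import numpy as np

def claim3_mix(f1: np.ndarray, f2: np.ndarray,
               initial_stage: bool, w1: float = 0.3) -> np.ndarray:
    """Claim 3 sketch: blend toward the 'ideal' average of the two tracks."""
    ideal = (f1 + f2) / 2.0                   # f(t) = [f1(t) + f2(t)] / 2
    if initial_stage:
        # initial stage: microphone audio and ideal audio get equal weight
        return 0.5 * f2 + 0.5 * ideal
    w = 1.0 - w1                              # constraint: ω1 + ω = 1
    return w1 * f1 + w * ideal                # ω1·f1(t) + ω·f(t)

f1 = np.array([1.0, 1.0])                     # acoustic audio data f1(t)
f2 = np.array([0.0, 0.0])                     # microphone audio data f2(t)
early = claim3_mix(f1, f2, initial_stage=True)    # -> [0.25, 0.25]
later = claim3_mix(f1, f2, initial_stage=False)   # -> [0.65, 0.65]
```

Note that because f(t) already contains f2(t), the final mix still carries the singer's voice even after the initial stage ends.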
4. The singing voice optimization method of claim 2, wherein dynamically assigning corresponding weights to the microphone audio data f2(t) and the acoustic audio data f1(t) and combining them to obtain the processed audio data comprises:
comparing the microphone audio data f2(t) with the acoustic audio data f1(t) in real time to determine a singing level of the singer;
if the singing level of the singer is above a preset level, assigning a weight ω3 to the acoustic audio data f1(t) and a weight ω4 to the microphone audio data f2(t), wherein ω3 is less than ω4 and ω3 + ω4 = 1;
if the singing level of the singer is below the preset level, assigning a weight ω′3 to the acoustic audio data f1(t) and a weight ω′4 to the microphone audio data f2(t), wherein ω′3 is greater than ω′4 and ω′3 + ω′4 = 1;
and computing ω3·f1(t) + ω4·f2(t) or ω′3·f1(t) + ω′4·f2(t), and taking the result as the processed audio data.
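The level-dependent weighting of claim 4 can be sketched as below; the specific weight values are illustrative, since the claim only requires ω3 < ω4 above the preset level and ω′3 > ω′4 below it, with each pair summing to 1:

```python
import numpy as np

def claim4_mix(f1: np.ndarray, f2: np.ndarray, above_preset_level: bool) -> np.ndarray:
    """Claim 4 sketch: pick weights from the singer's assessed level."""
    if above_preset_level:
        w3, w4 = 0.2, 0.8        # skilled singer: favour the microphone audio
    else:
        w3, w4 = 0.8, 0.2        # below the preset level: favour the acoustic track
    return w3 * f1 + w4 * f2     # ω3·f1(t) + ω4·f2(t) (or the primed pair)

f1 = np.array([1.0])             # acoustic audio data f1(t)
f2 = np.array([0.0])             # microphone audio data f2(t)
good = claim4_mix(f1, f2, above_preset_level=True)    # -> [0.2]
weak = claim4_mix(f1, f2, above_preset_level=False)   # -> [0.8]
```

How the singing level itself is determined from the real-time comparison is left open by the claim; pitch or loudness deviation from f1(t) would be natural candidates.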
5. The singing voice optimization method of claim 1, wherein processing the microphone audio data f2(t) based on the acoustic audio data f1(t) to obtain the processed audio data comprises:
performing vocalization matching detection on the microphone audio data f2(t) and the acoustic audio data f1(t) to obtain a vocalization difference result;
and, based on the vocalization difference result, correcting the vocalization of the microphone audio data f2(t) using the vocalization of the acoustic audio data f1(t) to obtain the processed audio data.
6. The singing voice optimization method of claim 5, wherein performing vocalization matching detection on the microphone audio data f2(t) and the acoustic audio data f1(t) to obtain the vocalization difference result comprises: performing vocalization matching detection on the microphone audio data f2(t) as a whole directly against the acoustic audio data f1(t) to obtain the vocalization difference result; or segmenting the microphone audio data f2(t) and the acoustic audio data f1(t) and then performing vocalization matching detection to obtain the vocalization difference result.
7. The singing voice optimization method of claim 5, wherein, based on the vocalization difference result, correcting the vocalization of the microphone audio data f2(t) using the vocalization of the acoustic audio data f1(t) to obtain the processed audio data comprises:
selecting, from at least one sung character in the microphone audio data f2(t), a to-be-corrected sung character whose vocalization difference result indicates a discrepancy;
extracting, from the acoustic audio data f1(t), a standard sung character corresponding to the to-be-corrected sung character;
replacing the to-be-corrected sung character with the standard sung character to obtain a corrected sung character;
and synthesizing the processed audio data from the corrected sung character and the sung characters, among the at least one sung character, other than the to-be-corrected sung character.
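The per-character replacement of claims 5–7 can be sketched as below. Both the pre-segmented per-character arrays and the RMS-difference threshold standing in for the "vocalization matching detection" are illustrative assumptions; the patent does not fix a particular segmentation or distance measure:

```python
import numpy as np

def correct_vocalization(sung_chars, standard_chars, threshold=0.5):
    """For each sung character segment, compare it with the aligned standard
    segment; if the RMS difference (the stand-in 'vocalization difference
    result') exceeds `threshold`, substitute the standard segment."""
    corrected = []
    for sung, std in zip(sung_chars, standard_chars):
        diff = float(np.sqrt(np.mean((sung - std) ** 2)))
        corrected.append(std if diff > threshold else sung)
    return np.concatenate(corrected)

sung = [np.array([0.1, 0.1]), np.array([0.9, 0.9])]      # singer's characters
standard = [np.array([0.1, 0.1]), np.array([0.2, 0.2])]  # from the acoustic track
out = correct_vocalization(sung, standard)
# first character kept (diff 0.0), second replaced (diff 0.7)
```

In a real system the substituted segment would also need cross-fading and time alignment at the segment boundaries to avoid audible clicks.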
8. An apparatus for optimizing singing voice, the apparatus comprising:
an acquisition module, configured to acquire acoustic audio data f1(t) of a target song, wherein the target song is the song currently sung by a singer;
a collection module, configured to synchronously collect microphone audio data f2(t), wherein the microphone audio data f2(t) corresponds to the sound picked up by a microphone used by the singer when singing the target song;
an adjustment module, configured to process the microphone audio data f2(t) based on the acoustic audio data f1(t) to obtain processed audio data;
and a playing module, configured to play the processed audio data.
9. An apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210407333.0A CN114882856A (en) | 2022-04-19 | 2022-04-19 | Singing sound optimization method, singing sound optimization device, singing sound optimization equipment and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114882856A true CN114882856A (en) | 2022-08-09 |
Family
ID=82669661
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210407333.0A Pending CN114882856A (en) | 2022-04-19 | 2022-04-19 | Singing sound optimization method, singing sound optimization device, singing sound optimization equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114882856A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||