CN114223032A - Memory, microphone, audio data processing method, device, equipment and system - Google Patents

Memory, microphone, audio data processing method, device, equipment and system Download PDF

Info

Publication number
CN114223032A
CN114223032A (application CN201980096054.3A)
Authority
CN
China
Prior art keywords
sound
user
voice
characteristic
audio data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980096054.3A
Other languages
Chinese (zh)
Inventor
徐俊丽 (Xu Junli)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Zhongjia Shengshi Intelligent Technology Co ltd
Original Assignee
Chongqing Zhongjia Shengshi Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Zhongjia Shengshi Intelligent Technology Co ltd filed Critical Chongqing Zhongjia Shengshi Intelligent Technology Co ltd
Publication of CN114223032A publication Critical patent/CN114223032A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the analysis technique
    • G10L 25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the analysis technique using neural networks

Abstract

The invention relates to a memory, a microphone, and an audio data processing method, apparatus, device, and system. The method comprises: measuring the sound production characteristics of a user and acquiring the sound characteristics of the user's actual voice while the user imitates a preset sound; generating a sound optimization rule for the user according to the correspondence between the user's sound characteristics and the target sound; acquiring the user's current sound characteristics from audio data of the user's real-time sound production; determining a current sound correction value from the sound optimization rule, with the current sound characteristics as parameters; and performing real-time sound correction on the audio data according to the current sound correction value. The application enables the user's singing to be rendered with sound characteristics, such as frequencies, timbres, and accents, that the user could not originally produce, thereby meeting the user's need for personalized sound optimization. In addition, the application scenarios of the sound optimization described herein may further include sound optimization for other vocal expression, such as a user's speech.

Description

Memory, microphone, audio data processing method, device, equipment and system

Technical Field
The present invention relates to the field of acoustics, and in particular, to a memory, a microphone, an audio data processing method, apparatus, device, and system.
Background
With the popularization of karaoke input devices such as microphones, users' expectations for the experience of using an acoustic input unit continue to rise.
In the prior art, to improve the live listening effect of the acoustic input unit, continuous improvements and optimizations have been made in "beautifying" the sound captured by the microphone. Most commonly, a reverberation effect (reverb/echo) is applied to make a singer's voice sound fuller; the function and effect of such reverberation are the same as those commonly used in karaoke.
The inventor has found through research that improving the sound effect presented to the user merely by providing a reverberation effect (e.g., a reverberator) has at least the following defect:
processing the user's voice with a reverberation effect alone cannot compensate for each user's pronunciation defects in a targeted manner, and therefore cannot meet different users' requirements for "beautifying" their own vocal effect.
Disclosure of Invention
The invention aims to provide a memory, a microphone, and an audio data processing method, apparatus, device, and system, addressing the problem that audio data processing in the prior art cannot compensate for a user's pronunciation defects in a targeted manner.
In order to achieve the above object, according to one aspect of the present invention, there is provided an audio data processing method including:
in a preset stage:
measuring the sound production characteristics of a user, and acquiring the sound characteristics of the user corresponding to actual sound when the user imitates preset sound comprising a plurality of target sound characteristics; the voice characteristics comprise the frequency and the sound intensity of fundamental tone in user voice, the composition of overtones and the proportion of each overtone to the fundamental tone;
generating a voice optimization rule of the user, wherein the voice optimization rule is used for adjusting the fundamental tone frequency of the voice sent by the user to be consistent with the fundamental tone frequency in the corresponding target voice characteristic according to the corresponding relation between the voice characteristic of the user and the target voice characteristic;
in the application stage:
acquiring the current sound characteristic of the user according to the audio data of the real-time sound production of the user;
and correcting the audio data in real time according to the sound optimization rule by taking the current sound characteristic as a parameter.
Preferably, in an embodiment of the present invention, the sound optimization rule is further configured to:
and adjusting the pitch intensity of the voice sent by the user to be consistent with the pitch intensity in the corresponding target voice characteristic according to the corresponding relation between the voice characteristic of the user and the target voice characteristic.
Preferably, in an embodiment of the present invention, the sound optimization rule is further configured to:
and according to the corresponding relation between the voice characteristic of the user and the target voice characteristic, adjusting the overtone composition of the voice sent by the user and/or the proportion of each overtone to the fundamental tone to be consistent with the overtone composition in the corresponding target voice characteristic and/or the proportion of each overtone to the fundamental tone.
Preferably, in an embodiment of the present invention, the method for generating the sound optimization rule includes:
generating an independent variable according to the corresponding sound characteristic when a user imitates the preset sound comprising a plurality of target sound characteristics; the independent variable comprises a spectrogram generated after Fourier transform is carried out on the actual sound;
generating a target variable according to the target sound characteristic; the target variable comprises a spectrogram generated after Fourier transform is carried out on the preset sound;
and training artificial intelligence according to the independent variable and the target variable, and generating a sound optimization model for converting the actual sound of the user into the target sound which the user wants to send through deep learning.
Preferably, in an embodiment of the present invention, the artificial intelligence includes a neural network.
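The pairing of variables described above can be illustrated with a short sketch. The frame length, hop size, Hann window, sample rate, and the synthetic "actual" and "target" signals below are all hypothetical; the sketch only shows how Fourier transforms of the actual sound and the preset sound yield a paired independent variable and target variable of the same shape:

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=512, hop=256):
    """Frame the signal, apply a Hann window, and take an FFT magnitude
    per frame; the result is the 'spectrogram' used as a variable."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

# Hypothetical data: one second at 16 kHz. The user's actual sound sits
# at 200 Hz while the preset (target) sound sits at 220 Hz.
sr = 16000
t = np.arange(sr) / sr
actual = np.sin(2 * np.pi * 200 * t)   # actual sound -> independent variable
target = np.sin(2 * np.pi * 220 * t)   # preset sound -> target variable

X = magnitude_spectrogram(actual)      # independent variable
Y = magnitude_spectrogram(target)      # target variable, same shape as X
```

Each row of `X` can then be presented as a training input whose desired output is the corresponding row of `Y`.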
In another aspect of the embodiments of the present invention, there is also provided a memory including an instruction set, the instruction set being suitable for a processor to execute the steps of the audio data processing method described above.
In another aspect of the embodiments of the present invention, there is also provided an audio data processing device, including a bus, a processor, and the above memory;
the bus is used for connecting the memory and the processor;
the processor is configured to execute a set of instructions in the memory.
In another aspect of the embodiments of the present invention, there is also provided an audio data processing apparatus, including:
the characteristic acquisition module is used for measuring the sound production characteristic of the user and acquiring the sound characteristic of the user corresponding to the actual sound when the user imitates the preset sound comprising a plurality of target sound characteristics; the voice characteristics comprise the frequency and the sound intensity of fundamental tone in user voice, the composition of overtones and the proportion of each overtone to the fundamental tone;
the correction module is used for correcting the audio data in real time according to a sound optimization rule by taking the current sound characteristic as a parameter; and the voice optimization rule is used for adjusting the fundamental tone frequency of the voice sent by the user to be consistent with the fundamental tone frequency in the corresponding target voice characteristic according to the corresponding relation between the voice characteristic of the user and the target voice characteristic.
Preferably, in the embodiment of the present invention, the apparatus further includes an audio output module connected to a preset sound processing device, and/or an audio input module connected to a sound pickup device;
the audio output module is used for transmitting the corrected audio data to the preset sound processing device;
the audio input module is used for transmitting the user's audio data obtained by the sound pickup device to the characteristic acquisition module.
Preferably, in the embodiment of the present invention, the preset sound processing device includes a sound box and/or a power amplifier; the sound pickup apparatus includes a microphone.
Preferably, in an embodiment of the present invention, the sound optimization rule is further configured to:
and adjusting the pitch intensity of the voice sent by the user to be consistent with the pitch intensity in the corresponding target voice characteristic according to the corresponding relation between the voice characteristic of the user and the target voice characteristic.
Preferably, in an embodiment of the present invention, the sound optimization rule is further configured to:
and according to the corresponding relation between the voice characteristic of the user and the target voice characteristic, adjusting the overtone composition of the voice sent by the user and/or the proportion of each overtone to the fundamental tone to be consistent with the overtone composition in the corresponding target voice characteristic and/or the proportion of each overtone to the fundamental tone.
Preferably, in an embodiment of the present invention, a method for generating the sound optimization rule includes:
generating an independent variable according to the corresponding sound characteristic when a user imitates the preset sound comprising a plurality of target sound characteristics; the independent variable comprises a spectrogram generated after Fourier transform is carried out on the actual sound;
generating a target variable according to the target sound characteristic; the target variable comprises a spectrogram generated after Fourier transform is carried out on the preset sound;
and training artificial intelligence according to the independent variable and the target variable, and generating a sound optimization model for converting the actual sound of the user into the target sound which the user wants to send through deep learning.
Preferably, in an embodiment of the present invention, the artificial intelligence includes a neural network.
In another aspect of the embodiments of the present invention, there is also provided a microphone, including an audio adapter plug, a sound pickup unit, and the audio data processing apparatus described above;
the characteristic acquisition module is electrically connected with the sound pickup unit and is used for acquiring the user's audio data;
the audio adapter plug is adapted to a preset sound processing device and is used for transmitting the corrected audio data to the preset sound processing device.
In another aspect of the embodiments of the present invention, there is also provided an audio data processing system, including a microphone, an optimization rule generating device, and the audio data processing apparatus described above; the optimization rule generating device is used for generating the sound optimization rule, which adjusts one or any combination of the sound characteristics of the voice produced by the user to be consistent with the corresponding target sound characteristics according to the correspondence between the user's sound characteristics and the target sound characteristics.
Advantageous effects
In summary, in the embodiment of the present invention, the sound production characteristics of the user are first measured, so that the sound characteristics of the user's actual voice while imitating the preset sound can be obtained, reflecting the differences in the user's sound production ability at different sound frequencies; then, a personalized sound optimization rule can be generated for the user according to the correspondence between the target sound characteristics of the preset sound and the sound characteristics of the user's actual sound production; after this presetting, when the user needs sound optimization, the user's current sound characteristics are obtained from the real-time audio data, and the sound characteristics the user desires are determined by the user's sound optimization rule with the current sound characteristics as parameters; the user's voice can then be corrected in a personalized, real-time manner.
By the embodiment of the invention, when singing a song, a user can have the singing rendered, through sound optimization, with frequencies or other sound characteristics the user could not originally sing, thereby meeting the user's need for personalized sound optimization. It should be noted that, in the embodiment of the present invention, the application scenarios of sound optimization are not limited to the user's singing, and may also include sound optimization for other vocal expression such as a user's speech.
Furthermore, the sound optimization rule in the embodiment of the present invention may also be used to adjust the sound intensity of the user's actual voice; sound optimization and adjustment can thus be provided in vocal regions where producing sound is strenuous for the user, so that the user can more easily achieve the ideal sound effect.
Furthermore, the sound optimization rule in the embodiment of the present invention may also be used to adjust the composition of the overtones of the user's actual voice and/or the proportion of each overtone to the fundamental tone. Since the overtone composition and the proportion of each overtone to the fundamental tone determine the timbre of the user's voice, the embodiment of the invention can make the user's timbre more pleasing by adjusting them, further improving the user experience of sound optimization.
Further, in the embodiment of the present invention, the sound optimization rule may specifically be generated as follows: according to the correspondence between the actual voice produced by the user (its Fourier-transformed spectrogram serving as the neural network's input data) and the target sound (its Fourier-transformed spectrogram serving as the neural network's target output data), a common artificial intelligence model such as a CNN multilayer neural network is trained, and a sound optimization model is generated through deep learning for converting the user's actual voice into the target sound the user desires to produce. By constructing this artificial-intelligence sound optimization model, the comprehensive correspondence between the user's actual pronunciation and the desired target sound can be determined more accurately and broadly, providing a more accurate data basis for subsequent sound correction (including correction of sound characteristics such as fundamental tone and/or overtones and/or speech rate and/or accent).
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart illustrating an audio data processing method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a hardware configuration of an audio data processing apparatus according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an audio data processing apparatus according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention are described below in conjunction with the accompanying drawings; it should be understood that the preferred embodiments described here serve only to illustrate and explain the invention, not to limit it.
The acoustic input unit in the embodiment of the invention may be a karaoke input device such as a microphone, or another human voice acquisition unit used with audio equipment.
The inventor has found through research that although the existing reverberation technology can make a user's voice fuller, when the user's ability to produce sound in certain frequency regions is defective, the reverberation technology cannot apply a corresponding correction.
In order to purposefully solve the problem of pronunciation deficiency of the user, an embodiment of the present invention provides an audio data processing method, as shown in fig. 1, including:
in a preset stage:
s11, measuring the sound production characteristics of the user, and acquiring the sound characteristics corresponding to the actual sound when the user imitates the preset sound comprising a plurality of target sound characteristics; the voice characteristics comprise the frequency and the sound intensity of fundamental tone in the voice of the user, a plurality of target voice characteristics and the proportion of each overtone to the fundamental tone;
the method for determining the sound production characteristics comprises the steps of providing a preset sound simulation object comprising a plurality of target sound characteristics (such as preset frequency and a plurality of target sound characteristics) for a user, and then collecting sound characteristics of sound produced by the user in a simulation process.
For example, a certain song may be used as a preset sound, and the standard intonation frequencies of the song in different frequency bands are respectively determined as target frequencies; by having the user mimic and sing the piece of song, the sound characteristics of the actual pronunciation of the user when mimicking the sounds of the plurality of target sound characteristics can be obtained.
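As an illustration of this measurement step, the toy function below estimates the fundamental frequency, the sound intensity, and the proportion of each overtone to the fundamental from one frame of audio. The signal, sample rate, and FFT-peak pitch estimate are illustrative assumptions, not the patent's specified method:

```python
import numpy as np

def sound_characteristics(frame, sr, n_overtones=3):
    """Toy measurement of the 'sound characteristics' named above:
    fundamental frequency, sound intensity (RMS), and the proportion
    of each overtone's magnitude to the fundamental's."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    f0_bin = int(np.argmax(spectrum))          # crude pitch estimate
    f0 = freqs[f0_bin]
    intensity = float(np.sqrt(np.mean(frame ** 2)))
    ratios = [float(spectrum[min(f0_bin * k, len(spectrum) - 1)]
                    / spectrum[f0_bin])
              for k in range(2, 2 + n_overtones)]
    return f0, intensity, ratios

# Hypothetical imitated note: a 250 Hz fundamental plus a 2nd harmonic
# at half the fundamental's amplitude.
sr = 8000
t = np.arange(4096) / sr
voice = np.sin(2 * np.pi * 250 * t) + 0.5 * np.sin(2 * np.pi * 500 * t)
f0, intensity, ratios = sound_characteristics(voice, sr)
```

Repeating this over the frames of a sung passage gives the per-band pronunciation profile the preset stage needs.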
S12, generating a voice optimization rule of the user, wherein the voice optimization rule is used for adjusting the fundamental tone frequency of the voice sent by the user to be consistent with the fundamental tone frequency in the corresponding target voice characteristic according to the corresponding relation between the voice characteristic of the user and the target voice characteristic;
the application scenario of the present application includes a problem of improving the pronunciation capability of the user, for example, the user cannot make a high pitch exceeding a certain frequency or a low pitch lower than a certain frequency, or the user cannot make a sound pleasant enough due to excessive specific overtones included in the user's pronunciation.
The preset sound for the user comprises the sound characteristics, such as frequencies or overtones, corresponding to the user's pronunciation problems. Since the user's imitation of the preset sound yields, for each desired sound characteristic in the preset sound, the sound characteristics of the corresponding actual utterance, the correspondence between the sound characteristics the user desires and those the user actually produces can be obtained by measuring the user's sound production characteristics, and the corresponding sound optimization rule can then be generated. The sound optimization rule in the embodiment of the present invention may include: determining different adjustment values according to the specific difference between the user's pronunciation characteristics and the target frequency in different frequency bands, so as to raise or lower the user's actual pronunciation frequency accordingly. In addition, a corresponding correction value may be determined according to the difference between the overtone characteristics in the user's target sound characteristics and those of the actual voice.
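The per-band adjustment values just described can be sketched as a lookup of "target minus actual" offsets. The band names and frequencies below are hypothetical:

```python
# Hypothetical per-band optimization rule: for each frequency band of the
# preset sound, store the adjustment value (target pitch minus the pitch
# the user actually produced when imitating that band).
preset = {"low": 196.0, "mid": 392.0, "high": 784.0}    # target Hz
measured = {"low": 200.0, "mid": 400.0, "high": 740.0}  # user's actual Hz

rule = {band: preset[band] - measured[band] for band in preset}

def correct(band, current_pitch, rule):
    """Raise or lower the real-time pitch by the band's adjustment value."""
    return current_pitch + rule[band]
```

A positive adjustment raises the user's pitch (e.g., the strained high band), while a negative one lowers it.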
The sound optimization rule in the embodiment of the present invention may be used to correct the frequency of the user's voice, and may also be used to correct the sound intensity of the user's voice and to beautify the timbre. Specifically, according to the correspondence between the user's sound characteristics and the target sound characteristics, the sound optimization rule may also adjust the intensity of the fundamental tone of the voice produced by the user to be consistent with the fundamental-tone intensity in the corresponding target sound characteristics, and adjust the overtone composition of the voice produced by the user, and/or the proportion of each overtone to the fundamental tone, to be consistent with those in the corresponding target sound characteristics.
In order to improve the effect of sound optimization, preferably, the method for generating the sound optimization rule in the embodiment of the present invention may specifically include: generating an independent variable according to the corresponding sound characteristic when a user imitates a preset sound comprising a plurality of target sound characteristics; the independent variable comprises a spectrogram generated after Fourier transform is carried out on the actual sound; generating a target variable according to the target sound characteristic; the target variable comprises a spectrogram generated after Fourier transform of preset sound; training artificial intelligence according to the independent variable and the target variable, and generating a sound optimization model for converting the actual sound of the user into the target sound which the user wants to send through deep learning; wherein the artificial intelligence may specifically be a neural network.
In this way, based on the correspondence between the actual voice produced by the user (its Fourier-transformed spectrogram serving as the neural network's input data) and the target sound (its Fourier-transformed spectrogram likewise serving as the neural network's target output data), a common artificial intelligence model such as a CNN multilayer neural network is trained, and a sound optimization model is generated through deep learning for converting the user's actual voice into the target sound the user desires to produce. By constructing this artificial-intelligence sound optimization model, the comprehensive correspondence between the user's actual pronunciation and the desired target sound can be determined more accurately and broadly, providing a more accurate data basis for subsequent sound correction (including correction of sound characteristics such as fundamental tone and/or overtones and/or speech rate and/or accent).
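As a minimal stand-in for this deep-learning step (which the text describes as a CNN), the sketch below learns a linear frame-to-frame mapping from an "actual" spectrogram to a "target" spectrogram by gradient descent on synthetic data. All shapes and values are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Paired spectrogram frames: X is the user's actual sound, Y the target.
# The 'correction' to learn here is a fixed scaling of every frequency bin.
n_frames, n_bins = 64, 33
X = rng.random((n_frames, n_bins))
W_true = np.eye(n_bins) * 1.5        # unknown mapping the model must learn
Y = X @ W_true

# Learn W by plain gradient descent on the mean-squared error.
W = np.zeros((n_bins, n_bins))
lr = 0.05
for _ in range(2000):
    grad = X.T @ (X @ W - Y) / n_frames
    W -= lr * grad

err = float(np.mean((X @ W - Y) ** 2))
```

A CNN replaces the single matrix `W` with stacked convolutional layers, but the training loop has the same shape: minimize the error between the mapped actual-sound spectrogram and the target-sound spectrogram.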
In the application stage:
s13, acquiring the current sound characteristic of the user according to the audio data of the real-time sound production of the user;
the application stage in the embodiment of the present invention refers to a specific optimization process when a user needs to perform sound optimization after the preset stage is completed.
Again taking the user's singing as an example, when the user sings a song using a microphone, the audio data of the user's singing voice is acquired in real time, and the user's current sound characteristics can then be obtained from that audio data.
S14, correcting the audio data in real time according to the sound optimization rule by taking the current sound characteristic as a parameter;
when the current voice characteristics of the user are obtained, the ideal frequency, overtone and the like of the voice expected by the user can be obtained according to the voice optimization rule; in this way, the sound of the user can be modified in real time according to the sound optimization rule by taking the current sound characteristic as a parameter, so that the sound characteristic in the output audio data conforms to the sound expected by the user, namely, the sound effect of the modified audio data is the sound which is corrected the pronunciation defect of the user and conforms to the sound expected by the user. For example, according to the embodiment of the present invention, before audio data enters a sound device such as a sound system, the audio data of a user is subjected to real-time frequency correction according to the current frequency correction value. Therefore, the singing voice of the user sent by the sound equipment can be consistent with the expectation of the user, and the aims of compensating the targeted defect of the pronunciation of different users and beautifying the voice of the user in a personalized way are fulfilled.
In practical application, the user's timbre, or the intensity of the fundamental tone in the voice, can also be adjusted through the sound optimization rule; adjusting the sound intensity means that when the user cannot produce certain sounds at normal intensity, the intensity of the user's voice can be corrected by the sound optimization rule.
In addition, in another optimization scheme in the embodiment of the present invention, the sound optimization rule may further include a modification to the user's timbre, so that the played user's sound is more pleasing by adjusting the composition of the overtones in the user's actual sound and/or the ratio of each overtone to the fundamental tone.
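The timbre adjustment just described can be sketched by rescaling the FFT bins around one overtone and resynthesizing. The signal, harmonic choice, and gain below are illustrative assumptions:

```python
import numpy as np

def scale_overtone(signal, sr, f0, harmonic, gain):
    """Rescale the FFT bins around one overtone of `f0`, then
    resynthesize: a toy model of adjusting the overtone composition
    (and hence the timbre) of the user's voice."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sr)
    band = np.abs(freqs - f0 * harmonic) < f0 / 2
    spec[band] *= gain
    return np.fft.irfft(spec, n=len(signal))

# Hypothetical voice: the 2nd harmonic is too strong (0.8 of the
# fundamental), so halve its share.
sr = 8000
t = np.arange(4096) / sr
voice = np.sin(2 * np.pi * 250 * t) + 0.8 * np.sin(2 * np.pi * 500 * t)
softer = scale_overtone(voice, sr, 250.0, 2, 0.5)
```

After the adjustment, the 2nd harmonic's proportion to the fundamental drops from 0.8 to 0.4, changing the perceived timbre without moving the pitch.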
In summary, in the embodiment of the present invention, the sound production characteristics of the user are first measured, so that the sound characteristics of the user's actual voice while imitating the preset sound can be obtained, reflecting the differences in the user's sound production abilities; then, a personalized sound optimization rule can be generated for the user according to the correspondence between the preset sound and the sound characteristics of the user's actual voice; after this presetting, when the user needs sound optimization, the user's current sound characteristics are obtained from the real-time audio data, and the sound characteristics the user desires are determined by the user's sound optimization rule with the current sound characteristics as parameters; the user's voice can then be corrected in a personalized, real-time manner.
By the embodiment of the invention, when the user sings a song, sound optimization enables the rendered sound of the user's singing to include frequencies the user could not originally sing, thereby meeting the user's need for personalized sound optimization. It should be noted that, in the embodiment of the present invention, the application scenarios of sound optimization are not limited to the user's singing, and may also include sound optimization for other vocal expression such as a user's speech.
Furthermore, the sound optimization rule in the embodiment of the present invention may also be used to adjust the sound intensity of the user's actual voice; sound optimization and adjustment can thus be provided in vocal regions where producing sound is strenuous for the user, so that the user can more easily achieve the ideal sound effect.
Furthermore, the sound optimization rule in the embodiment of the present invention may also be used to adjust the composition of the overtones of the user's actual voice and/or the proportion of each overtone to the fundamental tone. Since the overtone composition and the proportion of each overtone to the fundamental tone determine the timbre of the user's voice, the embodiment of the invention can make the user's timbre more pleasing by adjusting them, further improving the user experience of sound optimization.
In an embodiment of the present invention, there is further provided a memory including an instruction set, where the instruction set is suitable for a processor to execute the steps in the audio data processing method in the embodiment corresponding to fig. 1.
In an embodiment of the present invention, there is also provided an audio data processing apparatus, as shown in fig. 2, including a bus, a processor, and a memory; the bus is used for connecting the memory and the processor; the processor is configured to execute a set of instructions in the memory. Wherein the memory comprises a set of instructions adapted to the processor to carry out the steps of the audio data processing method according to the corresponding embodiment of fig. 1, and to achieve the same technical effect.
Fig. 2 is a schematic diagram of the hardware structure of an audio data processing device, implemented as an electronic device, according to an embodiment of the present invention. As shown in fig. 2, the device includes one or more processors 610 and a memory 620; one processor 610 is taken as an example. The processor 610 and the memory 620 may be connected by a bus or other means, a bus 630 being taken as an example in fig. 2.
The memory 620, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules. By running the non-transitory software programs, instructions, and modules stored in the memory 620, the processor 610 executes the various functional applications and data processing of the electronic device, i.e., the processing method of the above method embodiment.
The memory 620 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data and the like. Further, the memory 620 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 620 optionally includes memory located remotely from the processor 610, which may be connected to the processing device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 620 and, when executed by the one or more processors 610, perform:
in a preset stage:
measuring the sound-production characteristics of a user, and acquiring the sound characteristics of the user's actual voice while the user imitates preset sounds comprising a plurality of target sound characteristics; the sound characteristics include the frequency and intensity of the fundamental tone of the user's voice, the composition of the overtones, and the ratio of each overtone to the fundamental tone;
generating a sound optimization rule for the user, wherein the sound optimization rule is used for adjusting the fundamental frequency of the voice uttered by the user to be consistent with the fundamental frequency in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics;
in the application stage:
acquiring the current sound characteristic of the user from audio data of the user's real-time voice;
and correcting the audio data in real time according to the sound optimization rule, with the current sound characteristic as a parameter.
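The two-stage flow above can be sketched in miniature: the preset stage records which target pitch each measured pitch corresponds to, and the application stage measures the current pitch and looks up the correction target. This is an illustrative assumption only; the autocorrelation pitch estimator, the nearest-neighbour rule, and all names are hypothetical, not taken from the patent:

```python
import numpy as np

def estimate_f0(samples, rate, fmin=80.0, fmax=500.0):
    """Rough fundamental-frequency estimate via autocorrelation."""
    samples = samples - samples.mean()
    corr = np.correlate(samples, samples, mode="full")[len(samples) - 1:]
    lo, hi = int(rate / fmax), int(rate / fmin)   # lag search range for fmin..fmax
    lag = lo + int(np.argmax(corr[lo:hi]))
    return rate / lag

def build_sound_optimization_rule(actual_f0s, target_f0s):
    """Preset stage: pair each measured pitch with the target pitch it imitated."""
    pairs = list(zip(actual_f0s, target_f0s))
    # Application stage: map the user's current pitch to the nearest pair's target.
    def rule(current_f0):
        return min(pairs, key=lambda p: abs(p[0] - current_f0))[1]
    return rule

# Preset stage: the user imitated targets of 220 Hz and 330 Hz but produced 215 Hz and 325 Hz.
rule = build_sound_optimization_rule([215.0, 325.0], [220.0, 330.0])

# Application stage: measure the pitch of live audio and look up the correction target.
rate = 16000
t = np.arange(rate // 2) / rate
live = np.sin(2 * np.pi * 215 * t)
current = estimate_f0(live, rate)
target = rule(current)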
This product can execute the method provided by the embodiment of the present invention, and has the functional modules and beneficial effects corresponding to executing that method. For technical details not described in detail in this embodiment, reference may be made to the method provided by the embodiment of the present invention.
In the embodiment of the present invention, an audio data processing apparatus is further provided, as shown in fig. 3, comprising a characteristic acquisition module 01 and a correction module 02;
the characteristic acquisition module 01 is used for measuring the sound production characteristics of a user and acquiring the sound characteristics of the user corresponding to actual sound when the user imitates preset sound comprising a plurality of target sound characteristics; the voice characteristics comprise the frequency and the sound intensity of fundamental tone in user voice, the composition of overtones and the proportion of each overtone to the fundamental tone; the correction module 02 is used for correcting the audio data in real time according to a sound optimization rule by taking the current sound characteristic as a parameter; and the sound optimization rule is used for adjusting one or any combination of the sound characteristics of the sound emitted by the user to be consistent with the corresponding target sound characteristics according to the corresponding relation between the sound characteristics of the user and the target sound characteristics.
The sound optimization rule in the embodiment of the present invention may be used to correct the fundamental frequency of the user's voice, and may also be used to correct the intensity of the user's voice and to beautify the timbre. Specifically, according to the correspondence between the user's sound characteristics and the target sound characteristics, the sound optimization rule may also adjust the intensity of the fundamental tone of the voice uttered by the user to be consistent with the fundamental-tone intensity in the corresponding target sound characteristic, and adjust the overtone composition of the voice uttered by the user and/or the ratio of each overtone to the fundamental tone to be consistent with the overtone composition and/or the overtone-to-fundamental ratios in the corresponding target sound characteristic.
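Intensity correction of this kind can be illustrated with a simple RMS gain match. The patent does not specify how the gain is computed, so the sketch below is an assumption, with hypothetical names:

```python
import numpy as np

def match_intensity(samples, target_rms):
    """Scale a frame of audio so its RMS intensity equals the target RMS."""
    rms = np.sqrt(np.mean(samples ** 2))
    return samples * (target_rms / rms) if rms > 0 else samples

# A quiet frame (peak amplitude 0.1) is raised toward the target sound intensity.
rate = 16000
frame = 0.1 * np.sin(2 * np.pi * 220 * np.arange(1600) / rate)
adjusted = match_intensity(frame, target_rms=0.2)
```

In a full system, the target RMS would come from the fundamental-tone intensity stored in the matching target sound characteristic.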
To improve the effect of sound optimization, the method for generating the sound optimization rule in the embodiment of the present invention may preferably include: generating an independent variable from the sound characteristics of the user's actual voice while the user imitates preset sounds comprising a plurality of target sound characteristics, the independent variable comprising a spectrogram generated by Fourier-transforming the actual sound; generating a target variable from the target sound characteristics, the target variable comprising a spectrogram generated by Fourier-transforming the preset sound; and training an artificial intelligence on the independent variable and the target variable, generating through deep learning a sound optimization model that converts the user's actual sound into the target sound the user wishes to produce. The artificial intelligence may specifically be a neural network.
In this way, based on the correspondence between the actual sound uttered by the user (its Fourier-transformed spectrogram serving as the input data of the neural network) and the target sound (its Fourier-transformed spectrogram likewise serving as the target output data of the neural network), a common artificial-intelligence model such as a multi-layer convolutional neural network (CNN) is trained, and a sound optimization model that converts the user's actual sound into the target sound the user wishes to produce is generated through deep learning. By constructing such an artificial-intelligence sound optimization model, the comprehensive correspondence between the user's actual pronunciation and the target sound can be determined more accurately and more broadly, providing a more accurate data basis for subsequent sound correction (including correction of sound characteristics such as fundamental tone and/or overtones and/or speech rate and/or accent).
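A minimal numeric sketch of the data preparation and fitting step follows. The Fourier-transform spectrogram matches what the text describes, but the "model" is deliberately reduced to a per-frequency-bin gain fitted in closed form by least squares, standing in for the multi-layer CNN; every name and parameter here is an illustrative assumption:

```python
import numpy as np

def spectrogram(samples, frame=256, hop=128):
    """Magnitude spectrogram: Fourier transform of overlapping windowed frames."""
    window = np.hanning(frame)
    frames = np.array([samples[i:i + frame] * window
                       for i in range(0, len(samples) - frame + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=1))

def fit_spectral_map(actual_spec, target_spec):
    """Least-squares per-bin gain mapping the actual spectrum onto the target.

    A real implementation would train a multi-layer neural network here;
    this closed-form fit only illustrates the input/target pairing.
    """
    num = np.sum(actual_spec * target_spec, axis=0)
    den = np.sum(actual_spec * actual_spec, axis=0) + 1e-12
    return num / den

rate = 16000
t = np.arange(rate) / rate
actual = np.sin(2 * np.pi * 220 * t)          # user's actual sound
target_audio = 2.0 * actual                   # target sound: same tone, double amplitude
x = spectrogram(actual)                       # independent variable
y = spectrogram(target_audio)                 # target variable
gains = fit_spectral_map(x, y)
restored = x * gains                          # model output on the training input
```

On this toy pair the fitted map reproduces the target spectrogram; with real voice data and a neural network, the same input/target pairing drives the deep-learning step described above.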
The step of generating the sound optimization rule in the embodiment of the present invention may be implemented by a dedicated sound-production characteristic measuring device; that is, the measuring device may collect sound data while the user undergoes sound-production characteristic measurement. Specifically, this comprises: measuring the sound-production characteristics of the user, and acquiring the sound characteristics of the user's actual voice while the user imitates preset sounds comprising a plurality of target sound characteristics.
the execution main body for generating the sound optimization rule of the user in the embodiment of the invention can be realized by a computer and other independent processing equipment; or by an adjunct processing component as an audio data processing apparatus; the voice optimization rule is used for adjusting the fundamental tone frequency of the voice sent by the user to be consistent with the fundamental tone frequency in the corresponding target voice characteristic according to the corresponding relation between the voice characteristic of the user and the target voice characteristic.
The audio data processing apparatus in the embodiment of the present invention may be a stand-alone device, which may be connected in series anywhere between the microphone and the sound output device.
When the audio data processing apparatus is a stand-alone device, it may further comprise an audio output module connected to a preset sound processing device, and an audio input module connected to a sound pickup device;
the audio output module is used for transmitting the corrected audio data to a preset sound processing device; the audio roll-out module may be an audio plug adapted to the sound processing device.
And the audio transfer-in module is used for transmitting the audio data of the user obtained by the sound pickup equipment to the characteristic obtaining module.
The specific working principle and beneficial effects of the audio data processing apparatus in the embodiment of the present invention may refer to the embodiment of the audio data processing method corresponding to fig. 1, and are not described herein again.
In another aspect of the embodiments of the present invention, there is also provided a microphone, including an audio adapter plug, a sound pickup unit, and an audio data processing apparatus;
the characteristic acquisition module of the audio data processing device is connected with the pickup unit circuit and is used for acquiring audio data of a user; the audio adapter plug is adapted to a preset sound processing device and is used for transmitting the corrected audio data to the preset sound processing device.
The audio data processing apparatus in the embodiment of the present invention may also serve as an accessory of a sound pickup device such as a microphone, or as an accessory of a sound processing device such as a loudspeaker or a power amplifier. The sound processing device in the embodiment of the present invention includes sound output devices such as loudspeakers and earphones, and may also include other sound processing devices such as power amplifiers.
The specific working principle and beneficial effects of the microphone in the embodiment of the present invention may refer to the embodiment of the audio data processing method corresponding to fig. 1, and are not repeated here.
In another aspect of the embodiments of the present invention, there is also provided an audio data processing system, including the audio data processing apparatus, the optimization rule generating apparatus, and the microphone in the above embodiments;
the optimization rule generating device in the embodiment of the invention is used for adjusting the fundamental tone frequency of the voice sent by the user to be consistent with the fundamental tone frequency in the corresponding target voice characteristic according to the corresponding relation between the voice characteristic of the user and the target voice characteristic.
Specifically, the embodiment of the present invention may include three separate devices, namely, an audio data processing device, an optimization rule generating device, and a microphone; the audio data processing apparatus may refer to the embodiment shown in fig. 3, and the optimization rule generating apparatus may be implemented by a processing device such as an independent computer.
The specific working principle and beneficial effects of the audio data processing system in the embodiment of the present invention can refer to the embodiment of the audio data processing method corresponding to fig. 1, and the records in the embodiment of the audio data processing apparatus, the embodiment of the microphone, or the embodiment of the audio data processing device, which are not described herein again.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (16)

  1. A method of audio data processing, comprising:
    in a preset stage:
    measuring the sound-production characteristics of a user, and acquiring the sound characteristics of the user's actual voice while the user imitates preset sounds comprising a plurality of target sound characteristics; the sound characteristics include the frequency and intensity of the fundamental tone of the user's voice, the composition of the overtones, and the ratio of each overtone to the fundamental tone;
    generating a sound optimization rule for the user, wherein the sound optimization rule is used for adjusting the fundamental frequency of the voice uttered by the user to be consistent with the fundamental frequency in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics;
    in the application stage:
    acquiring the current sound characteristic of the user from audio data of the user's real-time voice;
    and correcting the audio data in real time according to the sound optimization rule, with the current sound characteristic as a parameter.
  2. The audio data processing method of claim 1, wherein the sound optimization rules are further for:
    and adjusting the intensity of the fundamental tone of the voice uttered by the user to be consistent with the fundamental-tone intensity in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics.
  3. The audio data processing method as claimed in claim 2, wherein the sound optimization rules are further for:
    and adjusting the overtone composition of the voice uttered by the user and/or the ratio of each overtone to the fundamental tone to be consistent with the overtone composition and/or the overtone-to-fundamental ratios in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics.
  4. The audio data processing method of any one of claims 1 to 3, wherein the sound optimization rule generation method comprises:
    generating an independent variable from the sound characteristics of the user's actual voice while the user imitates the preset sounds comprising a plurality of target sound characteristics; the independent variable comprises a spectrogram generated by Fourier-transforming the actual sound;
    generating a target variable from the target sound characteristics; the target variable comprises a spectrogram generated by Fourier-transforming the preset sound;
    and training an artificial intelligence on the independent variable and the target variable, generating through deep learning a sound optimization model that converts the user's actual sound into the target sound the user wishes to produce.
  5. The audio data processing method of claim 4, wherein the artificial intelligence comprises a neural network.
  6. A memory comprising a set of instructions adapted to be executed by a processor to perform the steps of the audio data processing method according to any one of claims 1 to 5.
  7. An audio data processing device comprising a bus, a processor and a memory as claimed in claim 6;
    the bus is used for connecting the memory and the processor;
    the processor is configured to execute a set of instructions in the memory.
  8. An audio data processing apparatus, comprising:
    the characteristic acquisition module is used for measuring the sound-production characteristics of a user and acquiring the sound characteristics of the user's actual voice while the user imitates preset sounds comprising a plurality of target sound characteristics; the sound characteristics include the frequency and intensity of the fundamental tone of the user's voice, the composition of the overtones, and the ratio of each overtone to the fundamental tone;
    the correction module is used for correcting the audio data in real time according to a sound optimization rule, with the current sound characteristic as a parameter; the sound optimization rule is used for adjusting the fundamental frequency of the voice uttered by the user to be consistent with the fundamental frequency in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics.
  9. The audio data processing apparatus according to claim 8, further comprising an audio output module connected to a preset sound processing device, and/or an audio input module connected to a sound pickup device;
    the audio output module is used for transmitting the corrected audio data to the preset sound processing device;
    the audio input module is used for transmitting the user's audio data obtained by the sound pickup device to the characteristic acquisition module.
  10. The audio data processing apparatus according to claim 9, wherein the predetermined sound processing device comprises a speaker and/or a power amplifier; the sound pickup apparatus includes a microphone.
  11. The audio data processing apparatus of claim 8, wherein the sound optimization rule is further to:
    and adjusting the intensity of the fundamental tone of the voice uttered by the user to be consistent with the fundamental-tone intensity in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics.
  12. The audio data processing apparatus as claimed in claim 8, wherein the sound optimization rules are further for:
    and adjusting the overtone composition of the voice uttered by the user and/or the ratio of each overtone to the fundamental tone to be consistent with the overtone composition and/or the overtone-to-fundamental ratios in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics.
  13. The audio data processing apparatus according to any one of claims 8 to 12, wherein the method of generating the sound optimization rule includes:
    generating an independent variable from the sound characteristics of the user's actual voice while the user imitates the preset sounds comprising a plurality of target sound characteristics; the independent variable comprises a spectrogram generated by Fourier-transforming the actual sound;
    generating a target variable from the target sound characteristics; the target variable comprises a spectrogram generated by Fourier-transforming the preset sound;
    and training an artificial intelligence on the independent variable and the target variable, generating through deep learning a sound optimization model that converts the user's actual sound into the target sound the user wishes to produce.
  14. The audio data processing apparatus as claimed in claim 13, wherein the artificial intelligence comprises a neural network.
  15. A microphone comprising an audio patch plug, a pickup unit, and the audio data processing apparatus according to claim 8;
    the characteristic acquisition module is electrically connected to the sound pickup unit and is used for acquiring the user's audio data;
    the audio adapter plug is adapted to a preset sound processing device and is used for transmitting the corrected audio data to the preset sound processing device.
  16. An audio data processing system comprising the audio data processing apparatus according to any one of claims 8 to 14, optimization rule generating means, and a microphone;
    and the optimization rule generating device is used for generating a sound optimization rule that adjusts one or any combination of the sound characteristics of the voice uttered by the user to be consistent with the corresponding target sound characteristics, according to the correspondence between the user's sound characteristics and the target sound characteristics.
CN201980096054.3A 2019-05-17 2019-05-17 Memory, microphone, audio data processing method, device, equipment and system Pending CN114223032A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/087439 WO2020232578A1 (en) 2019-05-17 2019-05-17 Memory, microphone, audio data processing method and apparatus, and device and system

Publications (1)

Publication Number Publication Date
CN114223032A true CN114223032A (en) 2022-03-22

Family

ID=73459554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980096054.3A Pending CN114223032A (en) 2019-05-17 2019-05-17 Memory, microphone, audio data processing method, device, equipment and system

Country Status (2)

Country Link
CN (1) CN114223032A (en)
WO (1) WO2020232578A1 (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3102335B2 (en) * 1996-01-18 2000-10-23 ヤマハ株式会社 Formant conversion device and karaoke device
JP2000352991A (en) * 1999-06-14 2000-12-19 Nippon Telegr & Teleph Corp <Ntt> Voice synthesizer with spectrum correction function
CN102881283B (en) * 2011-07-13 2014-05-28 三星电子(中国)研发中心 Method and system for processing voice
CN103531205B (en) * 2013-10-09 2016-08-31 常州工学院 The asymmetrical voice conversion method mapped based on deep neural network feature
CN104538011B (en) * 2014-10-30 2018-08-17 华为技术有限公司 A kind of tone adjusting method, device and terminal device
CN106997767A (en) * 2017-03-24 2017-08-01 百度在线网络技术(北京)有限公司 Method of speech processing and device based on artificial intelligence
CN107886963B (en) * 2017-11-03 2019-10-11 珠海格力电器股份有限公司 A kind of method, apparatus and electronic equipment of speech processes
CN109272975B (en) * 2018-08-14 2023-06-27 无锡冰河计算机科技发展有限公司 Automatic adjustment method and device for singing accompaniment and KTV jukebox

Also Published As

Publication number Publication date
WO2020232578A1 (en) 2020-11-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination