WO2020232578A1 - Memory, microphone, and audio data processing method, apparatus, device, and system - Google Patents

Memory, microphone, and audio data processing method, apparatus, device, and system

Info

Publication number
WO2020232578A1
WO2020232578A1 (PCT/CN2019/087439)
Authority
WO
WIPO (PCT)
Prior art keywords
sound
user
voice
audio data
target
Prior art date
Application number
PCT/CN2019/087439
Other languages
English (en)
French (fr)
Inventor
徐俊丽
Original Assignee
Xu Junli
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xu Junli filed Critical Xu Junli
Priority to CN201980096054.3A priority Critical patent/CN114223032A/zh
Priority to PCT/CN2019/087439 priority patent/WO2020232578A1/zh
Publication of WO2020232578A1 publication Critical patent/WO2020232578A1/zh

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the analysis technique
    • G10L25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the analysis technique using neural networks

Definitions

  • the present invention relates to the field of acoustics, and in particular, to memory, microphone, audio data processing methods, devices, equipment and systems.
  • karaoke input devices such as microphones
  • users have increasingly high requirements for the experience of using acoustic input units.
  • the inventor found that improving an acoustic input unit's sound effect for users simply by providing a reverberation effect (a reverb effector) has at least the following defect:
  • the purpose of the present invention is to provide a memory, a microphone, and an audio data processing method, apparatus, device, and system to address the problem that audio data processing in the prior art cannot specifically compensate for a user's pronunciation defects.
  • the present invention provides an audio data processing method, including:
  • the user's vocalization characteristics are measured to obtain the sound characteristics of the actual sound when the user imitates a preset sound containing multiple target sound characteristics;
  • the sound characteristics include the frequency and intensity of the fundamental in the user's voice, the components of the overtones, and the ratio of each overtone to the fundamental;
  • the audio data is corrected in real time according to the sound optimization rule.
  • the sound optimization rule is also used for:
  • the fundamental sound intensity of the voice uttered by the user is adjusted to be consistent with the fundamental sound intensity in the corresponding target sound characteristic.
  • the sound optimization rule is also used for:
  • the method for generating the sound optimization rule includes:
  • the independent variable includes a spectrogram generated after Fourier transform of the actual sound
  • the target variable includes a spectrogram generated after Fourier transforming the preset sound
  • the artificial intelligence includes a neural network.
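The items above describe the rule-generation pipeline (actual sound and preset sound both Fourier-transformed into spectrograms, then a model trained to map one onto the other) but give it no concrete form. Below is a minimal, hypothetical sketch: a single linear layer trained by gradient descent stands in for the multi-layer CNN the text mentions, and all signal lengths, frequencies, and learning parameters are invented for illustration.

```python
import numpy as np

def spectrogram(signal, frame=256, hop=128):
    """Magnitude spectrogram: window each frame and take its FFT."""
    starts = range(0, len(signal) - frame, hop)
    frames = np.array([signal[i:i + frame] for i in starts])
    return np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1))

def train_mapping(x_spec, y_spec, steps=500, lr=0.1):
    """A single linear layer fitted by gradient descent; a stand-in for
    the multi-layer CNN mentioned in the text."""
    w = np.zeros((x_spec.shape[1], y_spec.shape[1]))
    for _ in range(steps):
        grad = x_spec.T @ (x_spec @ w - y_spec) / len(x_spec)
        w -= lr * grad
    return w

t = np.arange(8000) / 8000.0
actual = np.sin(2 * np.pi * 200 * t)   # stand-in for the user's actual sound
target = np.sin(2 * np.pi * 250 * t)   # stand-in for the preset / target sound
xs, ys = spectrogram(actual), spectrogram(target)
xs, ys = xs / xs.max(), ys / ys.max()  # normalize for stable training
w = train_mapping(xs, ys)
err_before = np.mean((xs - ys) ** 2)     # distance before the learned mapping
err_after = np.mean((xs @ w - ys) ** 2)  # distance after applying it
```

The patent implies a far larger model trained on real recordings; this only illustrates the input/target arrangement (user spectrogram in, target spectrogram out).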
  • a memory including an instruction set suitable for the processor to execute the steps in the audio data processing method described above.
  • an audio data processing device including a bus, a processor, and the foregoing memory;
  • the bus is used to connect the memory and the processor
  • the processor is used to execute the instruction set in the memory.
  • an audio data processing device including:
  • the characteristic acquisition module is used to measure the vocalization characteristics of the user, and acquire the sound characteristics corresponding to the actual sound when the user imitates a preset sound including multiple target sound characteristics;
  • the sound characteristics include the frequency and sound intensity of the fundamental in the user's voice, the components of the overtones, and the ratio of each overtone to the fundamental;
  • the correction module is used to modify the audio data in real time according to the sound optimization rule, using the current sound characteristics as parameters; the sound optimization rule is used to adjust, according to the correspondence between the user's sound characteristics and the target sound characteristics, the fundamental frequency of the user's voice to be consistent with the fundamental frequency in the corresponding target sound characteristic.
  • it further includes an audio transfer-out module connected to the preset sound output device, and/or an audio transfer-in module connected to the sound pickup device;
  • the audio transfer-out module is used to transmit the corrected audio data to the preset sound processing device;
  • the audio transfer-in module is used to transmit the user's audio data obtained by the sound pickup device to the characteristic acquisition module.
  • the preset sound processing device includes a speaker and/or a power amplifier; and the sound pickup device includes a microphone.
  • the sound optimization rule is also used for:
  • the fundamental sound intensity of the voice uttered by the user is adjusted to be consistent with the fundamental sound intensity in the corresponding target sound characteristic.
  • the sound optimization rule is also used for:
  • the method for generating the sound optimization rule includes:
  • the independent variable includes a spectrogram generated after Fourier transform of the actual sound
  • the target variable includes a spectrogram generated after Fourier transforming the preset sound
  • the artificial intelligence includes a neural network.
  • a microphone including an audio adapter plug, a pickup unit and the above audio data processing device;
  • the characteristic acquisition module is electrically connected to the pickup unit and is used to acquire the user's audio data;
  • the audio adapter plug is adapted to a preset sound processing device, and is used to transmit the corrected audio data to the preset sound processing device.
  • an audio data processing system including a microphone, an optimization rule generation device, and the above audio data processing device; the optimization rule generation device is used to adjust, according to the correspondence between the user's voice characteristics and the target sound characteristics, one or any combination of the characteristics of the user's voice to be consistent with the corresponding target sound characteristics.
  • the user's vocalization characteristics are measured first, so that the user can obtain the sound characteristics of the actual vocalization of the user imitating the preset sound from time to time according to the difference in the user's vocalization ability at different sound frequencies; then, According to the corresponding relationship between the target sound characteristics of the preset sound and the sound characteristics of the user’s actual sound, personalized sound optimization rules can be generated for the user; after the above-mentioned preset, when the user needs to optimize the sound, the real-time sounded audio is used Data to obtain the user’s current voice characteristics, and then the current voice characteristics can be used as parameters, and the user’s voice optimization rules can be used to determine the voice characteristics expected by the user; and then the user’s voice can be personalized Modified in real time.
  • the user can optimize the sound when singing a song, so that the sound effect reflected by the user's singing effect can include the frequency or other sound characteristics that the user cannot sing originally, thereby satisfying the user Personalized sound optimization needs.
  • the application scenario of sound optimization is not only limited to the user's singing, but may also include voice optimization requirements for the user's speech and other voice expressions.
  • the sound optimization rules in the embodiments of the present invention can also be used to adjust the sound intensity of the user's actual voice; thus, it can provide the user with sound optimization and adjustment in the laborious pronunciation area, so that the user can more easily express the ideal Sound effects.
  • the sound optimization rules in the embodiment of the present invention can also be used to adjust the components of the overtones of the user's actual voice, and/or the ratio of each overtone to the fundamental; the component of overtones and the ratio of each overtone to the fundamental can be Determine the timbre of the user’s pronunciation; therefore, in the embodiment of the present invention, by adjusting the components of the overtones of the user’s actual voice, and/or the ratio of each overtone to the fundamental tone, the user’s timbre can be made more pleasing and thus further improved Sound optimized user experience.
  • the method of generating the sound optimization rule may specifically be: training on the actual sound emitted by the user (its Fourier-transformed spectrogram serving as the input data of the neural network) and the target sound (its Fourier-transformed spectrogram serving as the target output data of the neural network).
  • a general artificial-intelligence model, such as a multi-layer convolutional neural network (CNN), is used for training, and the sound optimization model is generated through deep learning.
  • the sound optimization model is used to convert the user's actual sound into the target sound the user wants to produce.
  • by constructing an artificial-intelligence sound optimization model, the overall correspondence between the user's actual pronunciation and the target sound the user wants to produce can be determined more accurately and comprehensively, providing a more accurate data basis for subsequent sound correction (including correction of the fundamental and/or overtones and/or speech speed and/or accent and other voice features).
  • FIG. 1 is a schematic flowchart of an audio data processing method in an embodiment of the present invention
  • FIG. 2 is a schematic diagram of the hardware structure of an audio data processing device in an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of an audio data processing apparatus in an embodiment of the present invention.
  • the acoustic input unit in the embodiments of the present invention may be a karaoke input device such as a microphone, or another human-voice collection unit for audio equipment.
  • an embodiment of the present invention provides an audio data processing method, as shown in FIG. 1, including:
  • the sound characteristics include the frequency and sound intensity of the fundamental in the user's voice, the components of the overtones, and the ratio of each overtone to the fundamental;
  • the measurement of sound characteristics in the embodiments of the present invention includes providing the user with a preset sound to imitate that contains multiple target sound characteristics (such as specific sound frequencies and overtones), and then collecting the user's voice during the imitation process.
  • for example, a certain song can be used as the preset sound, and the frequencies of the standard pitches of this song in different frequency bands can be determined as the target frequencies; by having the user imitate and sing the song, the sound characteristics of the user's actual pronunciation when imitating sounds with the target sound characteristics can be obtained.
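The text does not specify how the fundamental frequency of the user's imitation is measured. One common choice, shown here purely as a hypothetical illustration on a synthetic "user take" (the 220 Hz take and 261.6 Hz target are invented values), is autocorrelation-based pitch detection:

```python
import numpy as np

def fundamental_hz(signal, sample_rate):
    """Estimate the fundamental (pitch) frequency by autocorrelation:
    the first strong correlation peak after lag 0 marks the period."""
    sig = signal - signal.mean()
    corr = np.correlate(sig, sig, mode="full")[len(sig) - 1:]
    d = np.diff(corr)
    first_rise = np.nonzero(d > 0)[0][0]   # skip past the lag-0 peak
    period = first_rise + np.argmax(corr[first_rise:])
    return sample_rate / period

sample_rate = 16000
t = np.arange(4000) / sample_rate            # 0.25 s analysis frame
user_take = np.sin(2 * np.pi * 220 * t)      # user imitating the song at ~220 Hz
target_hz = 261.6                            # standard pitch of the song (invented)
measured_hz = fundamental_hz(user_take, sample_rate)
correction_ratio = target_hz / measured_hz   # ratio a later correction step would apply
```

Autocorrelation is only one of several pitch trackers (cepstrum and YIN are common alternatives); the patent leaves this choice open.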
  • the voice optimization rule is used to adjust the pitch frequency of the user's voice to be consistent with the pitch frequency in the corresponding target sound characteristic according to the correspondence between the user's voice characteristic and the target sound characteristic;
  • the application scenarios of this application include improving the user's own pronunciation problems, such as the user being unable to produce a treble above a certain frequency or a bass below a certain frequency, or the user's voice containing too many specific overtones, which makes the sound insufficiently full.
  • the user's preset sound will include sound characteristics, such as frequencies or overtones, corresponding to the pronunciation problem; while imitating the preset sound, the user produces actual pronunciations corresponding to the sound characteristics the user expects in the preset sound; therefore, by measuring the user's utterance characteristics, the correspondence between the sound characteristics the user expects and those the user actually produces can be obtained, and the corresponding sound optimization rules can then be generated.
  • the sound optimization rule in the embodiments of the present invention may include: determining different adjustment values according to the specific differences between the user's pronunciation characteristics and the target frequencies in different frequency bands, so as to raise or lower the frequency of the user's actual pronunciation accordingly.
  • the corresponding correction value can also be determined according to the difference between the overtone characteristics of the target sound and of the user's actual sound.
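Neither the band edges nor the form of the per-band adjustment values are fixed by the text. As a hypothetical sketch, one could store a target/measured frequency ratio per band and apply it to any pitch falling inside that band (all band edges and frequencies below are invented):

```python
# (measured Hz, target Hz) per frequency band -- illustrative numbers only
bands = {(80, 160):  (110.0, 115.0),
         (160, 320): (200.0, 220.0),
         (320, 640): (400.0, 440.0)}

def adjustment_table(bands):
    """Ratio to apply to pitches that fall inside each band."""
    return {band: tgt / meas for band, (meas, tgt) in bands.items()}

def correct_pitch(freq_hz, table):
    """Raise or lower a measured pitch according to its band's ratio."""
    for (lo, hi), ratio in table.items():
        if lo <= freq_hz < hi:
            return freq_hz * ratio
    return freq_hz  # outside every measured band: leave unchanged

table = adjustment_table(bands)
```

A real system would presumably interpolate between bands rather than jump at the edges; a piecewise table is just the simplest reading of "different adjustment values in different frequency bands".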
  • the sound optimization rules in the embodiments of the present invention can not only be used to correct the user's voice frequency, but also can be used to correct the user's sound intensity and beautify the sound.
  • the sound optimization rules can also be used to adjust, according to the correspondence between the user's voice characteristics and the target sound characteristics, the fundamental intensity of the user's voice to be consistent with the fundamental intensity in the corresponding target sound characteristic; and, according to the same correspondence, to adjust the overtone composition of the user's voice, and/or the ratio of each overtone to the fundamental, to be consistent with the overtone composition, and/or the overtone-to-fundamental ratios, of the corresponding target sound characteristic.
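As a hypothetical illustration of adjusting overtone-to-fundamental ratios toward a target, one can rescale each overtone's amplitude so that its ratio to the fundamental equals the target ratio. The harmonic amplitudes and ratios below are invented; the patent specifies none.

```python
user_harmonics = {1: 1.0, 2: 0.8, 3: 0.1}   # harmonic number -> amplitude (invented)
target_ratios  = {2: 0.5, 3: 0.3}           # desired overtone/fundamental ratios (invented)

def match_timbre(harmonics, ratios):
    """Rescale each overtone so its ratio to the fundamental matches the
    target ratio; overtones without a target ratio are left unchanged."""
    fundamental = harmonics[1]
    out = {1: fundamental}
    for n, amp in harmonics.items():
        if n > 1:
            out[n] = fundamental * ratios.get(n, amp / fundamental)
    return out

adjusted = match_timbre(user_harmonics, target_ratios)
```

In a working system these amplitudes would be read off the harmonic peaks of the user's spectrum and written back before resynthesis; this sketch only shows the ratio arithmetic the text describes.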
  • the method for generating sound optimization rules in the embodiment of the present invention may specifically include: generating an independent variable according to the corresponding sound characteristics when the user imitates a preset sound including multiple target sound characteristics;
  • the independent variable includes the spectrogram generated after Fourier transform of the actual sound;
  • the target variable is generated according to the target sound characteristics;
  • the target variable includes the spectrogram generated by Fourier-transforming the preset sound; artificial-intelligence training is performed on the independent variable and the target variable, and a sound optimization model for converting the user's actual voice into the target voice the user wishes to produce is generated through deep learning;
  • the artificial intelligence may specifically be a neural network.
  • by constructing an artificial-intelligence sound optimization model, the overall correspondence between the user's actual pronunciation and the target sound the user wants to produce can be determined more accurately and comprehensively, providing a more accurate data basis for subsequent sound correction (including correction of the fundamental and/or overtones and/or speech speed and/or accent and other voice features).
  • the application stage in the embodiment of the present invention refers to a specific optimization process when the user needs to perform sound optimization after the foregoing preset stage is completed.
  • the audio data of the user's singing voice is obtained in real time, and then the user's current voice characteristics can be obtained according to the audio data of the singing voice.
  • the ideal frequencies and overtones of the sound the user expects can be known; in this way, using the current sound characteristics as parameters and applying the sound optimization rules, the user's voice can be corrected in real time so that the sound characteristics in the output audio data match the sound the user expects; that is, the sound effect of the corrected audio data compensates for the user's pronunciation defects and matches the user's expected sound.
  • the user's audio data is corrected in real time according to the current frequency correction value; in this way, the user's singing voice emitted by the audio device can be consistent with the user's expectation, thereby compensating for different users' specific pronunciation defects and beautifying each user's voice in a personalized manner.
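The text leaves the real-time correction mechanism unspecified. One naive, illustrative way to apply a frequency correction value to a frame of audio is resampling-based pitch shifting (all signal values below are invented; production systems would add time-stretching, e.g. a phase vocoder, to preserve the frame's duration):

```python
import numpy as np

def pitch_shift(frame, ratio):
    """Naive pitch shift by linear-interpolation resampling: reading the
    frame `ratio` times faster raises its pitch by `ratio` (and shortens
    it -- real-time systems compensate with time-stretching)."""
    n = len(frame)
    pos = np.arange(int((n - 1) / ratio)) * ratio
    lo = pos.astype(int)
    frac = pos - lo
    return frame[lo] * (1 - frac) + frame[lo + 1] * frac

sr = 8000
t = np.arange(2000) / sr
frame = np.sin(2 * np.pi * 200 * t)        # user's voice at 200 Hz (invented)
corrected = pitch_shift(frame, 250 / 200)  # correction value raising it toward 250 Hz
freqs = np.fft.rfftfreq(len(corrected), 1 / sr)
peak_hz = freqs[np.argmax(np.abs(np.fft.rfft(corrected)))]
```

The dominant frequency of the corrected frame lands near the 250 Hz target, matching the "raise or lower the frequency" behaviour the rules describe.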
  • the adjustment of the sound intensity means that when the user cannot produce certain sounds at normal intensity, the intensity of the user's voice can be corrected through the sound optimization rule.
  • the sound optimization rule may also include correction of the user's timbre, adjusting the overtone components in the user's actual voice, and/or the ratio of each overtone to the fundamental, which makes the user's voice more pleasant.
  • the user's vocalization characteristics are measured first, so that the actual vocal characteristics of the user's voice when imitating the preset sound can be obtained according to differences in the user's vocal ability; then, according to the correspondence between the preset sound and the sound characteristics of the user's actual voice, personalized voice optimization rules can be generated for the user; after this preset stage, when the user needs sound optimization, the user's current voice characteristics can be obtained from real-time voice data; with the current voice characteristics as parameters, the user's voice optimization rules can then determine the voice characteristics the user expects, and the user's voice can be corrected in real time in a personalized manner.
  • the user can optimize the sound when singing a song, so that the sound effects of the user's singing include frequencies that the user originally could not sing, so as to meet the user's personalized sound optimization needs.
  • the application scenario of sound optimization is not limited to the user's singing; it may also include sound optimization for the user's speeches and other vocal expression.
  • the sound optimization rules in the embodiments of the present invention can also be used to adjust the sound intensity of the user's actual voice; this provides the user with sound optimization in registers that are laborious to produce, so that the user can more easily achieve the ideal sound effect.
  • the sound optimization rules in the embodiments of the present invention can also be used to adjust the overtone components of the user's actual voice, and/or the ratio of each overtone to the fundamental; the overtone components and the overtone-to-fundamental ratios determine the timbre of the user's voice; therefore, by adjusting them, the user's timbre can be made more pleasing, further improving the user experience of sound optimization.
  • a memory including an instruction set suitable for the processor to execute the steps in the audio data processing method in the embodiment corresponding to FIG. 1.
  • an audio data processing device is also provided, as shown in FIG. 2, which includes a bus, a processor, and a memory; the bus is used to connect the memory and the processor; and the processor is used to execute an instruction set in the memory.
  • the memory includes an instruction set, and the instruction set is suitable for the processor to execute the steps in the audio data processing method in the embodiment corresponding to FIG. 1 and achieve the same technical effect.
  • FIG. 2 is a schematic diagram of the hardware structure of an audio data processing device, used as an electronic device, in an embodiment of the present invention.
  • the device includes one or more processors 610 and a memory 620; one processor 610 is taken as an example.
  • the processor 610 and the memory 620 may be connected through a bus or in other ways. In FIG. 2, the connection through the bus 630 is taken as an example.
  • the memory 620 can be used to store non-transitory software programs, non-transitory computer executable programs, and modules.
  • the processor 610 executes various functional applications and data processing of the electronic device by running non-transitory software programs, instructions, and modules stored in the memory 620, that is, realizing the processing methods of the foregoing method embodiments.
  • the memory 620 may include a program storage area and a data storage area.
  • the program storage area may store an operating system and an application program required by at least one function; the data storage area may store data and the like.
  • the memory 620 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid-state storage devices.
  • the memory 620 may optionally include memories located remotely from the processor 610; these remote memories may be connected to the processing device through a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the one or more modules are stored in the memory 620, and when executed by the one or more processors 610, execute:
  • the user's vocalization characteristics are measured to obtain the sound characteristics of the actual sound when the user imitates a preset sound containing multiple target sound characteristics;
  • the sound characteristics include the frequency and intensity of the fundamental in the user's voice, the components of the overtones, and the ratio of each overtone to the fundamental;
  • the audio data is corrected in real time according to the sound optimization rule.
  • an audio data processing device is also provided, as shown in FIG. 3, which includes a characteristic acquisition module 01 and a correction module 02;
  • the characteristic acquisition module 01 is used to measure the user's vocalization characteristics and acquire the sound characteristics of the actual sound when the user imitates a preset sound containing multiple target sound characteristics; the sound characteristics include the frequency and sound intensity of the fundamental in the user's voice, the components of the overtones, and the ratio of each overtone to the fundamental; the correction module 02 is used to correct the audio data in real time according to the sound optimization rules, using the current sound characteristics as parameters; the sound optimization rules are used to adjust, according to the correspondence between the user's voice characteristics and the target voice characteristics, one or any combination of the characteristics of the user's voice to be consistent with the corresponding target voice characteristics.
  • the sound optimization rules in the embodiments of the present invention can not only be used to correct the user's voice frequency, but also can be used to correct the user's sound intensity and beautify the sound.
  • the sound optimization rules can also be used to adjust, according to the correspondence between the user's voice characteristics and the target sound characteristics, the fundamental intensity of the user's voice to be consistent with the fundamental intensity in the corresponding target sound characteristic; and, according to the same correspondence, to adjust the overtone composition of the user's voice, and/or the ratio of each overtone to the fundamental, to be consistent with the overtone composition, and/or the overtone-to-fundamental ratios, of the corresponding target sound characteristic.
  • the method for generating sound optimization rules in the embodiment of the present invention may specifically include: generating an independent variable according to the corresponding sound characteristics when the user imitates a preset sound including multiple target sound characteristics;
  • the independent variable includes the spectrogram generated after Fourier transform of the actual sound;
  • the target variable is generated according to the target sound characteristics;
  • the target variable includes the spectrogram generated by Fourier-transforming the preset sound; artificial-intelligence training is performed on the independent variable and the target variable, and a sound optimization model for converting the user's actual voice into the target voice the user wishes to produce is generated through deep learning;
  • the artificial intelligence may specifically be a neural network.
  • by constructing an artificial-intelligence sound optimization model, the overall correspondence between the user's actual pronunciation and the target sound the user wants to produce can be determined more accurately and comprehensively, providing a more accurate data basis for subsequent sound correction (including correction of the fundamental and/or overtones and/or speech speed and/or accent and other voice features).
  • the generation step of the sound optimization rules in the embodiments of the present invention can be realized by a dedicated vocal-characteristic measuring device; that is, the measuring device can collect the user's sound data during measurement so as to measure the user's vocalization characteristics, which specifically includes: measuring the user's vocalization characteristics to obtain the sound characteristics of the actual sound when the user imitates a preset sound containing multiple target sound characteristics;
  • the executive body that generates the user's voice optimization rules can be a separate processing device such as a computer, or an accessory processing component of an audio data processing device; the voice optimization rule is used to adjust, according to the correspondence between the user's voice characteristics and the target sound characteristics, the fundamental frequency of the user's voice to be consistent with the fundamental frequency of the corresponding target sound characteristic.
  • the audio data processing device in the embodiments of the present invention can be an independent device, which can be connected in series at any position between the microphone and the sound output device.
  • the audio data processing device may also include an audio transfer-out module connected to the preset sound processing device and an audio transfer-in module connected to the pickup device;
  • the audio transfer-out module is used to transmit the corrected audio data to the preset sound processing device; the audio transfer-out module may be an audio plug adapted to the sound processing device.
  • the audio transfer-in module is used to transmit the user's audio data obtained by the sound pickup device to the characteristic acquisition module.
  • a microphone including an audio adapter plug, a pickup unit and an audio data processing device;
  • the characteristic acquisition module of the audio data processing device is electrically connected to the pickup unit to acquire the user's audio data; the audio adapter plug is adapted to the preset sound processing device and is used to transmit the corrected audio data to the preset sound processing device.
  • the audio data processing device in the embodiments of the present invention can also be an accessory device of a sound pickup device such as a microphone, or an accessory device of a sound processing device such as a speaker or a power amplifier.
  • the sound processing equipment in the embodiments of the present invention includes sound output devices such as speakers and earphones, and may also include other sound processing devices such as power amplifiers.
  • an audio data processing system including the audio data processing device, the optimization rule generation device and the microphone in the above embodiment;
  • the optimization rule generation device in the embodiment of the present invention is used to adjust the pitch frequency of the user's voice to be consistent with the pitch frequency in the corresponding target sound characteristic according to the correspondence between the user's voice characteristic and the target sound characteristic.
  • the embodiments of the present invention may include three separate devices: an audio data processing device, an optimization rule generation device, and a microphone; the audio data processing device may refer to the embodiment corresponding to FIG. 3, and the optimization rule generation device may be realized by an independent computer or other processing equipment.
  • a person of ordinary skill in the art can understand that all or part of the steps in the above method embodiments can be implemented by a program instructing relevant hardware.
  • the foregoing program can be stored in a computer-readable storage medium; when executed, the program performs the steps of the foregoing method embodiments; the foregoing storage medium includes media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present invention relates to a memory, a microphone, and an audio data processing method, apparatus, device, and system. The method comprises: measuring a user's vocalization characteristics to obtain the sound characteristics of the user when imitating a preset sound; generating the user's sound optimization rules according to the correspondence between the user's sound characteristics and the target sound; obtaining the user's current sound characteristics from audio data of the user's real-time voice; using the current sound characteristics as parameters, determining a current sound correction value according to the sound optimization rules; and correcting the sound of the audio data in real time according to the current sound correction value. The present application enables the sound effects reflected in the user's singing to include sound features such as frequencies, timbres, and accents that the user originally could not sing, thereby meeting the user's personalized sound optimization needs. In addition, the sound optimization application scenarios of the present application may also include sound optimization for the user's speeches and other vocal expression.

Description

Memory, Microphone, and Audio Data Processing Method, Apparatus, Device and System

Technical Field
The present invention relates to the field of acoustics, and in particular to a memory, a microphone, and an audio data processing method, apparatus, device and system.
Background Art
With the popularization of karaoke input devices such as microphones, people's expectations for the user experience of acoustic input units keep rising.
In the prior art, to improve the live listening effect of acoustic input units, continuous improvement and optimization has gone into "beautifying" the sound fed in through the microphone. Most commonly, a reverberation effect (reverb/echo) is applied to make a singer's timbre sound fuller; for the function and effect of such reverberation, see the kind commonly used in karaoke rooms.
Through research, the inventor found that improving an acoustic input unit's handling of the user's voice merely by providing a reverberation effect (a reverberator) has at least the following defect:
processing the user's voice with reverberation alone cannot compensate for each individual user's vocal defects in a targeted way, and therefore cannot simultaneously satisfy different users' needs to "beautify" their own vocal effects.
Summary of the Invention
The object of the present invention is to address the problem in the prior art that audio data processing cannot compensate for a user's vocal defects in a targeted way, by providing a memory, a microphone, and an audio data processing method, apparatus, device and system.
To achieve the above object, according to one aspect of the present invention, an audio data processing method is provided, including:
in a preset stage:
measuring the user's vocal characteristics to obtain the sound characteristics of the actual sound produced when the user imitates a preset sound containing multiple target sound characteristics, the sound characteristics including the frequency and intensity of the fundamental in the user's voice, the composition of the overtones, and the ratio of each overtone to the fundamental;
generating a sound optimization rule for the user, the sound optimization rule being used to adjust the fundamental frequency of the sound produced by the user to be consistent with the fundamental frequency in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics;
in an application stage:
obtaining the user's current sound characteristics from audio data of the user's real-time vocalization;
correcting the audio data in real time according to the sound optimization rule, with the current sound characteristics as parameters.
Preferably, in an embodiment of the present invention, the sound optimization rule is further used to:
adjust the fundamental intensity of the sound produced by the user to be consistent with the fundamental intensity in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics.
Preferably, in an embodiment of the present invention, the sound optimization rule is further used to:
adjust the overtone composition of the sound produced by the user, and/or the ratio of each overtone to the fundamental, to be consistent with the overtone composition, and/or the ratio of each overtone to the fundamental, in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics.
Preferably, in an embodiment of the present invention, the method for generating the sound optimization rule includes:
generating independent variables from the sound characteristics produced when the user imitates a preset sound containing multiple target sound characteristics, the independent variables including a spectrogram generated by Fourier-transforming the actual sound;
generating target variables from the target sound characteristics, the target variables including a spectrogram generated by Fourier-transforming the preset sound;
training an artificial intelligence on the independent variables and the target variables, and generating, through deep learning, a sound optimization model for converting the user's actual sound into the target sound the user wishes to produce.
Preferably, in an embodiment of the present invention, the artificial intelligence includes a neural network.
In another aspect of the embodiments of the present invention, a memory is further provided, including an instruction set adapted for a processor to execute the steps of the above audio data processing method.
In another aspect of the embodiments of the present invention, an audio data processing device is further provided, including a bus, a processor and the above memory;
the bus is used to connect the memory and the processor;
the processor is used to execute the instruction set in the memory.
In another aspect of the embodiments of the present invention, an audio data processing apparatus is further provided, including:
a characteristic acquisition module, used to measure the user's vocal characteristics and obtain the sound characteristics of the actual sound produced when the user imitates a preset sound containing multiple target sound characteristics, the sound characteristics including the frequency and intensity of the fundamental in the user's voice, the composition of the overtones, and the ratio of each overtone to the fundamental;
a correction module, used to correct the audio data in real time according to a sound optimization rule, with the current sound characteristics as parameters; the sound optimization rule is used to adjust the fundamental frequency of the sound produced by the user to be consistent with the fundamental frequency in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics.
Preferably, in an embodiment of the present invention, the apparatus further includes an audio output module connected to a preset sound output device, and/or an audio input module connected to a sound pickup device;
the audio output module is used to transmit the corrected audio data to the preset sound processing device;
the audio input module is used to transmit the user's audio data obtained by the sound pickup device to the characteristic acquisition module.
Preferably, in an embodiment of the present invention, the preset sound processing device includes a loudspeaker and/or a power amplifier, and the sound pickup device includes a microphone.
Preferably, in an embodiment of the present invention, the sound optimization rule is further used to:
adjust the fundamental intensity of the sound produced by the user to be consistent with the fundamental intensity in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics.
Preferably, in an embodiment of the present invention, the sound optimization rule is further used to:
adjust the overtone composition of the sound produced by the user, and/or the ratio of each overtone to the fundamental, to be consistent with the overtone composition, and/or the ratio of each overtone to the fundamental, in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics.
Preferably, in an embodiment of the present invention, the method for generating the sound optimization rule includes:
generating independent variables from the sound characteristics produced when the user imitates a preset sound containing multiple target sound characteristics, the independent variables including a spectrogram generated by Fourier-transforming the actual sound;
generating target variables from the target sound characteristics, the target variables including a spectrogram generated by Fourier-transforming the preset sound;
training an artificial intelligence on the independent variables and the target variables, and generating, through deep learning, a sound optimization model for converting the user's actual sound into the target sound the user wishes to produce.
Preferably, in an embodiment of the present invention, the artificial intelligence includes a neural network.
In another aspect of the embodiments of the present invention, a microphone is further provided, including an audio adapter plug, a sound pickup unit and the above audio data processing apparatus;
the characteristic acquisition module is electrically connected to the sound pickup unit and used to obtain the user's audio data;
the audio adapter plug is adapted to the preset sound processing device and used to transmit the corrected audio data to the preset sound processing device.
In another aspect of the embodiments of the present invention, an audio data processing system is further provided, including a microphone, an optimization rule generation apparatus and the above audio data processing apparatus; the optimization rule generation apparatus is used to adjust one or any combination of the sound characteristics of the sound produced by the user to be consistent with the corresponding target sound characteristics, according to the correspondence between the user's sound characteristics and the target sound characteristics.
Beneficial Effects
In summary, in the embodiments of the present invention, the user's vocal characteristics are first measured, so that the sound characteristics of the user's actual vocalization when imitating the preset sound can be obtained according to the differences in the user's vocal ability at different sound frequencies. Then, according to the correspondence between the target sound characteristics of the preset sound and the sound characteristics of the user's actual vocalization, a personalized sound optimization rule can be generated for the user. After the above presetting, when the user needs sound optimization, the user's current sound characteristics are obtained from the audio data of the user's real-time vocalization; then, with the current sound characteristics as parameters, the sound characteristics of the vocalization the user expects can be determined through the user's sound optimization rule, and the user's voice can accordingly be corrected in a personalized way and in real time.
Through the embodiments of the present invention, when singing a song, the user can, through sound optimization, make the sound effects reflected in the user's singing performance include frequencies or other sound characteristics the user could not originally sing, thereby meeting the user's need for personalized sound optimization. It should be noted that, in the embodiments of the present invention, the application scenarios of sound optimization are not limited to the user's singing, and may also include sound optimization for other vocal expression, such as the user's speeches.
Further, the sound optimization rule in the embodiments of the present invention can also be used to adjust the intensity of the user's actual voice, so that optimization and adjustment can be provided in vocal regions where the user strains, enabling the user to produce the desired sound effect more easily.
Further, the sound optimization rule in the embodiments of the present invention can also be used to adjust the composition of the overtones of the user's actual voice and/or the ratio of each overtone to the fundamental. The composition of the overtones and the ratio of each overtone to the fundamental determine the timbre of the user's voice; therefore, by adjusting them, the embodiments of the present invention can make the user's timbre more pleasant, further improving the user experience of sound optimization.
Further, in the embodiments of the present invention, the sound optimization rule may specifically be generated as follows: according to the correspondence between the actual sound produced by the user (its Fourier-transformed spectrogram serving as the input data of the neural network) and the target sound (likewise its Fourier-transformed spectrogram serving as the target output data of the neural network), a general-purpose artificial intelligence model such as a CNN multilayer neural network is trained, and a sound optimization model is generated through deep learning; the sound optimization model is used to convert the user's actual sound into the target sound the user wishes to produce. By constructing an artificial intelligence sound optimization model, the comprehensive correspondence between the user's actual vocalization and the target sound the user wishes to produce can be determined more precisely and extensively, thereby providing a more precise data basis for subsequent sound correction (including correction of various sound features such as fundamental and/or overtones and/or speaking rate and/or accent).
The technical solution of the present invention is described in further detail below through the drawings and embodiments.
Brief Description of the Drawings
The drawings are provided for a further understanding of the present invention and constitute a part of the specification; together with the embodiments of the present invention, they serve to explain the present invention and do not limit it. In the drawings:
Fig. 1 is a schematic flowchart of the audio data processing method in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the hardware structure of the audio data processing device in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the audio data processing apparatus in an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the drawings. It should be understood that the preferred embodiments described here are only used to illustrate and explain the present invention, and are not intended to limit it.
The acoustic input unit in the embodiments of the present invention may be a karaoke input device such as a microphone, or another human-voice pickup unit for audio equipment.
Through research, the inventor found that although the existing reverberation technology can make the user's voice fuller, it cannot make corresponding corrections when the user's vocal ability is deficient in certain frequency regions.
To compensate for a user's vocal defects in a targeted way, an embodiment of the present invention provides an audio data processing method, as shown in Fig. 1, including:
in a preset stage:
S11. Measure the user's vocal characteristics to obtain the sound characteristics of the actual sound produced when the user imitates a preset sound containing multiple target sound characteristics; the sound characteristics include the frequency and intensity of the fundamental in the user's voice, the composition of the overtones, and the ratio of each overtone to the fundamental.
The vocal characteristic measurement in this embodiment includes providing the user with a preset-sound imitation object containing multiple target sound characteristics (such as preset frequencies and overtones), and then collecting the sound characteristics of the sound the user produces during imitation. A target sound characteristic is a specific sound frequency and overtone contained in the preset sound.
For example, a certain song segment may be used as the preset sound, and the standard-pitch frequencies of that segment in different frequency bands may each be determined as target frequencies; by having the user imitate and sing the segment, the sound characteristics of the user's actual vocalization when imitating the multiple target sound characteristics can be obtained.
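As an illustration of this measurement step, the sketch below estimates the fundamental frequency, its strength, and the overtone-to-fundamental ratios from a recorded imitation. Everything here is a hypothetical illustration, not part of the patent: the function names, the 8 kHz sample rate, the brute-force single-frequency DFT probe, and the scan range are all assumptions.

```python
import math

SAMPLE_RATE = 8000  # hypothetical sample rate for the illustration


def synth_tone(f0, harmonics, n=4000):
    """Synthesize a test tone: fundamental f0 plus overtones at 2*f0, 3*f0, ...
    `harmonics` gives the amplitude of each partial, starting with the fundamental."""
    return [
        sum(a * math.sin(2 * math.pi * f0 * (k + 1) * t / SAMPLE_RATE)
            for k, a in enumerate(harmonics))
        for t in range(n)
    ]


def dft_magnitude(x, freq):
    """Magnitude of the DFT of x probed at one frequency, normalized so a
    pure sine of amplitude a yields roughly a/2."""
    re = sum(v * math.cos(2 * math.pi * freq * t / SAMPLE_RATE) for t, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * t / SAMPLE_RATE) for t, v in enumerate(x))
    return math.hypot(re, im) / len(x)


def measure_characteristics(x, f_min=80, f_max=400, step=5):
    """Scan candidate fundamentals, pick the strongest, then report
    (fundamental frequency, its strength, overtone/fundamental ratios)."""
    f0, best = f_min, 0.0
    for f in range(f_min, f_max + 1, step):
        m = dft_magnitude(x, f)
        if m > best:
            f0, best = f, m
    ratios = [dft_magnitude(x, f0 * k) / best for k in (2, 3)]
    return f0, best, ratios
```

For a synthetic "user" singing a 200 Hz tone with overtone amplitudes half and a quarter of the fundamental, `measure_characteristics` recovers the 200 Hz fundamental and overtone ratios near 0.5 and 0.25, which is exactly the kind of per-user profile step S11 collects.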
S12. Generate a sound optimization rule for the user, the sound optimization rule being used to adjust the fundamental frequency of the sound produced by the user to be consistent with the fundamental frequency in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics.
The application scenarios of the present application include remedying problems of the user's own vocal ability; for example, the user cannot produce high notes above a certain frequency or low notes below a certain frequency, or the user's vocalization contains too many particular overtones and therefore sounds unpleasant.
The preset sound for a user contains the sound characteristics, such as frequencies or overtones, corresponding to the user's vocal problems. While the user imitates the preset sound, actual vocalizations are produced that correspond to the sound characteristics the user desires in the preset sound; therefore, by measuring the user's vocal characteristics, the correspondence between the sound characteristics the user desires and the sound characteristics of the user's actual vocalization can be obtained, and a corresponding sound optimization rule can then be generated. The sound optimization rule in the embodiments of the present invention may include: determining, for each frequency band, a different adjustment value according to the specific difference between the user's vocal characteristics in that band and the target frequency, so as to raise or lower the frequency of the user's actual vocalization accordingly. In addition, a corresponding correction value may be determined according to the difference between the overtone characteristics in the user's target sound characteristics and those of the actual sound.
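The per-band adjustment values described above can be pictured with a small sketch. The data structure and helper names are hypothetical (the patent does not prescribe any representation): each band anchors at the pitch the user actually produced during imitation and stores the multiplicative correction that would move it onto the target pitch; at application time the rule returns the correction of the nearest band.

```python
def build_correction_table(measurements):
    """measurements: (target_f0_hz, actual_f0_hz) pairs from the preset stage.
    Each entry anchors a band at the user's actual pitch and stores the
    multiplicative correction that maps it onto the target pitch."""
    return sorted((actual, target / actual) for target, actual in measurements)


def correction_for(table, current_f0):
    """Return the correction ratio of the band whose anchor is closest
    to the user's current fundamental frequency."""
    anchor, ratio = min(table, key=lambda band: abs(band[0] - current_f0))
    return ratio
```

For instance, if the user sang 415 Hz while imitating a 440 Hz target and 230 Hz while imitating 220 Hz, a current note near 418 Hz would be corrected upward by the 440/415 ratio, while a note near 228 Hz would be corrected slightly downward.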
Besides correcting the frequency of the user's voice, the sound optimization rule in the embodiments of the present invention may also be used to correct the intensity of, and beautify, the user's voice. Specifically, the sound optimization rule may further be used to adjust the fundamental intensity of the sound produced by the user to be consistent with the fundamental intensity in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics; and to adjust the overtone composition of the sound produced by the user, and/or the ratio of each overtone to the fundamental, to be consistent with the overtone composition, and/or the ratio of each overtone to the fundamental, in the corresponding target sound characteristic.
To improve the effect of sound optimization, preferably, the method for generating the sound optimization rule in the embodiments of the present invention may specifically include: generating independent variables from the sound characteristics produced when the user imitates a preset sound containing multiple target sound characteristics, the independent variables including a spectrogram generated by Fourier-transforming the actual sound; generating target variables from the target sound characteristics, the target variables including a spectrogram generated by Fourier-transforming the preset sound; and training an artificial intelligence on the independent variables and the target variables, generating through deep learning a sound optimization model for converting the user's actual sound into the target sound the user wishes to produce; the artificial intelligence may specifically be a neural network.
In this way, according to the correspondence between the actual sound produced by the user (its Fourier-transformed spectrogram serving as the input data of the neural network) and the target sound (likewise its Fourier-transformed spectrogram serving as the target output data of the neural network), a general-purpose artificial intelligence model such as a CNN multilayer neural network is trained, and a sound optimization model is generated through deep learning; the sound optimization model is used to convert the user's actual sound into the target sound the user wishes to produce. By constructing an artificial intelligence sound optimization model, the comprehensive correspondence between the user's actual vocalization and the target sound the user wishes to produce can be determined more precisely and extensively, thereby providing a more precise data basis for subsequent sound correction (including correction of various sound features such as fundamental and/or overtones and/or speaking rate and/or accent).
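To make the input/target arrangement concrete, the sketch below pairs actual and target spectrogram frames and fits a mapping between them. A per-frequency-bin least-squares gain stands in for the CNN the embodiment mentions; the real model would be a multilayer network, and all names here are hypothetical.

```python
def fit_per_bin_gains(actual_frames, target_frames):
    """For every frequency bin b, fit the least-squares gain g minimizing
    sum((g * actual[b] - target[b])^2) over all paired spectrogram frames.
    A linear stand-in for training a CNN on (actual, target) spectrogram pairs."""
    n_bins = len(actual_frames[0])
    gains = []
    for b in range(n_bins):
        num = sum(a[b] * t[b] for a, t in zip(actual_frames, target_frames))
        den = sum(a[b] * a[b] for a in actual_frames) or 1.0
        gains.append(num / den)
    return gains


def apply_model(frame, gains):
    """Map one actual-spectrogram frame toward the target space."""
    return [m * g for m, g in zip(frame, gains)]
```

The training arrangement is the point: the user's actual spectrogram frames are the independent variables, the preset sound's frames are the target variables, and the fitted mapping is then applied frame by frame at inference time, just as the learned model would be.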
In an application stage:
S13. Obtain the user's current sound characteristics from audio data of the user's real-time vocalization.
The application stage in the embodiments of the present invention refers to the specific optimization process carried out when the user needs sound optimization after the above preset stage is completed.
Taking singing as an example again: when the user sings with a microphone, the audio data of the user's singing is obtained in real time, and the user's current sound characteristics can then be obtained from that audio data.
S14. Correct the audio data in real time according to the sound optimization rule, with the current sound characteristics as parameters.
Once the user's current sound characteristics have been obtained, the desired frequency, overtones and so on of the sound the user expects to produce can be determined from the sound optimization rule. Thus, with the current sound characteristics as parameters, the user's voice can be corrected in real time according to the sound optimization rule, so that the sound characteristics in the output audio data match the sound the user expects; that is, the corrected audio data corrects the user's vocal defects and matches the user's expectations. For example, through this embodiment of the present invention, before the audio data enters a playback device such as a loudspeaker, the user's audio data is frequency-corrected in real time according to the current frequency correction value, so that the user's singing emitted by the audio equipment stays consistent with the user's expectations, achieving targeted compensation of different users' vocal defects and personalized beautification of users' voices.
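A minimal sketch of this real-time correction step follows, under two loudly stated assumptions: the correction is reduced to a single pitch ratio per audio chunk, and the pitch is shifted by naive linear-interpolation resampling, which also changes the chunk's effective duration. A production system, as the embodiment implies, would correct fundamental, intensity and overtones together, typically with PSOLA or phase-vocoder techniques that preserve timing.

```python
def pitch_shift_chunk(chunk, ratio):
    """Resample one audio chunk by `ratio`: ratio > 1 reads the samples
    faster, raising the pitch. Linear interpolation between neighbors;
    the tail is clamped to the last sample when the read position runs
    past the end of the chunk."""
    n = len(chunk)
    out = []
    for i in range(n):
        pos = i * ratio
        j = int(pos)
        if j + 1 >= n:
            out.append(float(chunk[-1]))
        else:
            frac = pos - j
            out.append(chunk[j] * (1.0 - frac) + chunk[j + 1] * frac)
    return out
```

In the flow of S13/S14, each incoming chunk would first have its current fundamental estimated, the correction ratio looked up from the user's rule, and then the chunk shifted before it reaches the loudspeaker.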
In practical applications, the sound optimization rule can also be used to adjust the user's timbre or the intensity of the fundamental in the voice; adjusting the intensity means that, when the user cannot produce certain sounds at normal intensity, the sound optimization rule can correct the intensity of the user's voice.
In addition, in a further optimization scheme of the embodiments of the present invention, the sound optimization rule may also include correction of the user's timbre, adjusting the composition of the overtones in the user's actual voice and/or the ratio of each overtone to the fundamental, so that the played-back voice of the user is more pleasant.
In summary, in the embodiments of the present invention, the user's vocal characteristics are first measured, so that the sound characteristics of the user's actual vocalization when imitating the preset sound can be obtained according to the differences in the user's vocal ability. Then, according to the correspondence between the preset sound and the sound characteristics of the user's actual vocalization, a personalized sound optimization rule can be generated for the user. After the above presetting, when the user needs sound optimization, the user's current sound characteristics are obtained from the audio data of the user's real-time vocalization; then, with the current sound characteristics as parameters, the sound characteristics of the vocalization the user expects can be determined through the user's sound optimization rule, and the user's voice can accordingly be corrected in a personalized way and in real time.
Through the embodiments of the present invention, when singing a song, the user can, through sound optimization, make the sound effects reflected in the user's singing performance include frequencies the user could not originally sing, thereby meeting the user's need for personalized sound optimization. It should be noted that, in the embodiments of the present invention, the application scenarios of sound optimization are not limited to the user's singing, and may also include sound optimization for other vocal expression, such as the user's speeches.
Further, the sound optimization rule in the embodiments of the present invention can also be used to adjust the intensity of the user's actual voice, so that optimization and adjustment can be provided in vocal regions where the user strains, enabling the user to produce the desired sound effect more easily.
Further, the sound optimization rule in the embodiments of the present invention can also be used to adjust the composition of the overtones of the user's actual voice and/or the ratio of each overtone to the fundamental. The composition of the overtones and the ratio of each overtone to the fundamental determine the timbre of the user's voice; therefore, by adjusting them, the embodiments of the present invention can make the user's timbre more pleasant, further improving the user experience of sound optimization.
In an embodiment of the present invention, a memory is further provided, including an instruction set adapted for a processor to execute the steps of the audio data processing method in the embodiment corresponding to Fig. 1.
In an embodiment of the present invention, an audio data processing device is further provided, as shown in Fig. 2, including a bus, a processor and a memory; the bus is used to connect the memory and the processor; the processor is used to execute the instruction set in the memory. The memory includes an instruction set adapted for the processor to execute the steps of the audio data processing method in the embodiment corresponding to Fig. 1, achieving the same technical effects.
Fig. 2 is a schematic diagram of the hardware structure of the audio processing device for an acoustic unit as an electronic device in an embodiment of the present invention. As shown in Fig. 2, the device includes one or more processors 610 and a memory 620; one processor 610 is taken as an example. The processor 610 and the memory 620 may be connected by a bus or in other ways; Fig. 2 takes connection by a bus 630 as an example.
As a non-transitory computer-readable storage medium, the memory 620 can store non-transitory software programs, non-transitory computer-executable programs and modules. By running the non-transitory software programs, instructions and modules stored in the memory 620, the processor 610 performs the various functional applications and data processing of the electronic device, i.e., implements the processing method of the above method embodiment.
The memory 620 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application required by at least one function, and the data storage area may store data, etc. In addition, the memory 620 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 620 optionally includes memory set remotely relative to the processor 610, and such remote memory may be connected to the processing apparatus through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
The one or more modules are stored in the memory 620 and, when executed by the one or more processors 610, perform:
in a preset stage:
measuring the user's vocal characteristics to obtain the sound characteristics of the actual sound produced when the user imitates a preset sound containing multiple target sound characteristics, the sound characteristics including the frequency and intensity of the fundamental in the user's voice, the composition of the overtones, and the ratio of each overtone to the fundamental;
generating a sound optimization rule for the user, the sound optimization rule being used to adjust the fundamental frequency of the sound produced by the user to be consistent with the fundamental frequency in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics;
in an application stage:
obtaining the user's current sound characteristics from audio data of the user's real-time vocalization;
correcting the audio data in real time according to the sound optimization rule, with the current sound characteristics as parameters.
The above product can execute the method provided by the embodiments of the present invention, and has the corresponding functional modules and beneficial effects for executing the method. For technical details not described in detail in this embodiment, reference may be made to the method provided by the embodiments of the present invention.
In an embodiment of the present invention, an audio data processing apparatus is further provided, as shown in Fig. 3, including a characteristic acquisition module 01 and a correction module 02.
The characteristic acquisition module 01 is used to measure the user's vocal characteristics and obtain the sound characteristics of the actual sound produced when the user imitates a preset sound containing multiple target sound characteristics, the sound characteristics including the frequency and intensity of the fundamental in the user's voice, the composition of the overtones, and the ratio of each overtone to the fundamental. The correction module 02 is used to correct the audio data in real time according to a sound optimization rule, with the current sound characteristics as parameters; the sound optimization rule is used to adjust one or any combination of the sound characteristics of the sound produced by the user to be consistent with the corresponding target sound characteristics, according to the correspondence between the user's sound characteristics and the target sound characteristics.
Besides correcting the frequency of the user's voice, the sound optimization rule in the embodiments of the present invention may also be used to correct the intensity of, and beautify, the user's voice. Specifically, the sound optimization rule may further be used to adjust the fundamental intensity of the sound produced by the user to be consistent with the fundamental intensity in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics; and to adjust the overtone composition of the sound produced by the user, and/or the ratio of each overtone to the fundamental, to be consistent with the overtone composition, and/or the ratio of each overtone to the fundamental, in the corresponding target sound characteristic.
To improve the effect of sound optimization, preferably, the method for generating the sound optimization rule in the embodiments of the present invention may specifically include: generating independent variables from the sound characteristics produced when the user imitates a preset sound containing multiple target sound characteristics, the independent variables including a spectrogram generated by Fourier-transforming the actual sound; generating target variables from the target sound characteristics, the target variables including a spectrogram generated by Fourier-transforming the preset sound; and training an artificial intelligence on the independent variables and the target variables, generating through deep learning a sound optimization model for converting the user's actual sound into the target sound the user wishes to produce; the artificial intelligence may specifically be a neural network.
In this way, according to the correspondence between the actual sound produced by the user (its Fourier-transformed spectrogram serving as the input data of the neural network) and the target sound (likewise its Fourier-transformed spectrogram serving as the target output data of the neural network), a general-purpose artificial intelligence model such as a CNN multilayer neural network is trained, and a sound optimization model is generated through deep learning; the sound optimization model is used to convert the user's actual sound into the target sound the user wishes to produce. By constructing an artificial intelligence sound optimization model, the comprehensive correspondence between the user's actual vocalization and the target sound the user wishes to produce can be determined more precisely and extensively, thereby providing a more precise data basis for subsequent sound correction (including correction of various sound features such as fundamental and/or overtones and/or speaking rate and/or accent).
The step of generating the sound optimization rule in the embodiments of the present invention may be implemented by a dedicated vocal characteristic measurement apparatus; that is, the vocal characteristic measurement apparatus may collect the user's sound data during vocal characteristic measurement, thereby measuring the user's vocal characteristics; specifically: measuring the user's vocal characteristics to obtain the sound characteristics of the actual sound produced when the user imitates a preset sound containing multiple target sound characteristics.
The entity that generates the user's sound optimization rule in the embodiments of the present invention may be implemented by a separate processing device such as a computer, or by an auxiliary processing component attached to the audio data processing apparatus. The sound optimization rule is used to adjust the fundamental frequency of the sound produced by the user to be consistent with the fundamental frequency in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics.
The audio data processing apparatus in the embodiments of the present invention may be an independent device, which can be connected in series at any position between the microphone and the sound output device.
When the audio data processing apparatus is an independent device, it may further include an audio output module connected to the preset sound processing device and an audio input module connected to the sound pickup device;
the audio output module is used to transmit the corrected audio data to the preset sound processing device; the audio output module may be an audio plug adapted to the sound processing device;
the audio input module is used to transmit the user's audio data obtained by the sound pickup device to the characteristic acquisition module.
For the specific working principle and beneficial effects of the audio data processing apparatus in the embodiments of the present invention, reference may be made to the audio data processing method embodiment corresponding to Fig. 1, which will not be repeated here.
In another aspect of the embodiments of the present invention, a microphone is further provided, including an audio adapter plug, a sound pickup unit and an audio data processing apparatus;
the characteristic acquisition module of the audio data processing apparatus is electrically connected to the sound pickup unit and used to obtain the user's audio data; the audio adapter plug is adapted to the preset sound processing device and used to transmit the corrected audio data to the preset sound processing device.
The audio data processing apparatus in the embodiments of the present invention may also serve as an accessory of a sound pickup device such as a microphone, or as an accessory of a sound processing device such as a loudspeaker or power amplifier. The sound processing device in the embodiments of the present invention includes sound output devices such as loudspeakers and earphones, and may also include other sound processing devices such as power amplifiers.
For the specific working principle and beneficial effects of the microphone in the embodiments of the present invention, reference may be made to the audio data processing method embodiment corresponding to Fig. 1, which will not be repeated here.
In another aspect of the embodiments of the present invention, an audio data processing system is further provided, including the audio data processing apparatus, the optimization rule generation apparatus and the microphone of the above embodiments;
the optimization rule generation apparatus in the embodiments of the present invention is used to adjust the fundamental frequency of the sound produced by the user to be consistent with the fundamental frequency in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics.
Specifically, the embodiments of the present invention may include three separate devices: the audio data processing apparatus, the optimization rule generation apparatus and the microphone; among them, the audio data processing apparatus may refer to the embodiment corresponding to Fig. 3, and the optimization rule generation apparatus may be implemented by an independent processing device such as a computer.
For the specific working principle and beneficial effects of the audio data processing system in the embodiments of the present invention, reference may be made to the audio data processing method embodiment corresponding to Fig. 1, as well as to the descriptions in the above audio data processing apparatus embodiment, microphone embodiment or audio data processing device embodiment, which will not be repeated here.
A person of ordinary skill in the art can understand that all or part of the steps of the above method embodiments can be implemented by hardware instructed by a program; the foregoing program can be stored in a computer-readable storage medium, and when the program is executed, the steps of the above method embodiments are performed; the foregoing storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks or optical disks.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, without such modifications or replacements causing the corresponding technical solutions to depart in substance from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (16)

  1. An audio data processing method, characterized by comprising:
    in a preset stage:
    measuring a user's vocal characteristics to obtain the sound characteristics of the actual sound produced when the user imitates a preset sound containing multiple target sound characteristics, the sound characteristics including the frequency and intensity of the fundamental in the user's voice, the composition of the overtones, and the ratio of each overtone to the fundamental;
    generating a sound optimization rule for the user, the sound optimization rule being used to adjust the fundamental frequency of the sound produced by the user to be consistent with the fundamental frequency in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics;
    in an application stage:
    obtaining the user's current sound characteristics from audio data of the user's real-time vocalization;
    correcting the audio data in real time according to the sound optimization rule, with the current sound characteristics as parameters.
  2. The audio data processing method according to claim 1, characterized in that the sound optimization rule is further used to:
    adjust the fundamental intensity of the sound produced by the user to be consistent with the fundamental intensity in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics.
  3. The audio data processing method according to claim 2, characterized in that the sound optimization rule is further used to:
    adjust the overtone composition of the sound produced by the user, and/or the ratio of each overtone to the fundamental, to be consistent with the overtone composition, and/or the ratio of each overtone to the fundamental, in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics.
  4. The audio data processing method according to any one of claims 1 to 3, characterized in that the method for generating the sound optimization rule comprises:
    generating independent variables from the sound characteristics produced when the user imitates a preset sound containing multiple target sound characteristics, the independent variables including a spectrogram generated by Fourier-transforming the actual sound;
    generating target variables from the target sound characteristics, the target variables including a spectrogram generated by Fourier-transforming the preset sound;
    training an artificial intelligence on the independent variables and the target variables, and generating, through deep learning, a sound optimization model for converting the user's actual sound into the target sound the user wishes to produce.
  5. The audio data processing method according to claim 4, characterized in that the artificial intelligence comprises a neural network.
  6. A memory, characterized by comprising an instruction set adapted for a processor to execute the steps of the audio data processing method according to any one of claims 1 to 5.
  7. An audio data processing device, characterized by comprising a bus, a processor and the memory according to claim 6;
    the bus being used to connect the memory and the processor;
    the processor being used to execute the instruction set in the memory.
  8. An audio data processing apparatus, characterized by comprising:
    a characteristic acquisition module, used to measure a user's vocal characteristics and obtain the sound characteristics of the actual sound produced when the user imitates a preset sound containing multiple target sound characteristics, the sound characteristics including the frequency and intensity of the fundamental in the user's voice, the composition of the overtones, and the ratio of each overtone to the fundamental;
    a correction module, used to correct the audio data in real time according to a sound optimization rule, with the current sound characteristics as parameters; the sound optimization rule being used to adjust the fundamental frequency of the sound produced by the user to be consistent with the fundamental frequency in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics.
  9. The audio data processing apparatus according to claim 8, characterized by further comprising an audio output module connected to a preset sound output device, and/or an audio input module connected to a sound pickup device;
    the audio output module being used to transmit the corrected audio data to the preset sound processing device;
    the audio input module being used to transmit the user's audio data obtained by the sound pickup device to the characteristic acquisition module.
  10. The audio data processing apparatus according to claim 9, characterized in that the preset sound processing device comprises a loudspeaker and/or a power amplifier, and the sound pickup device comprises a microphone.
  11. The audio data processing apparatus according to claim 8, characterized in that the sound optimization rule is further used to:
    adjust the fundamental intensity of the sound produced by the user to be consistent with the fundamental intensity in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics.
  12. The audio data processing apparatus according to claim 8, characterized in that the sound optimization rule is further used to:
    adjust the overtone composition of the sound produced by the user, and/or the ratio of each overtone to the fundamental, to be consistent with the overtone composition, and/or the ratio of each overtone to the fundamental, in the corresponding target sound characteristic, according to the correspondence between the user's sound characteristics and the target sound characteristics.
  13. The audio data processing apparatus according to any one of claims 8 to 12, characterized in that the method for generating the sound optimization rule comprises:
    generating independent variables from the sound characteristics produced when the user imitates a preset sound containing multiple target sound characteristics, the independent variables including a spectrogram generated by Fourier-transforming the actual sound;
    generating target variables from the target sound characteristics, the target variables including a spectrogram generated by Fourier-transforming the preset sound;
    training an artificial intelligence on the independent variables and the target variables, and generating, through deep learning, a sound optimization model for converting the user's actual sound into the target sound the user wishes to produce.
  14. The audio data processing apparatus according to claim 13, characterized in that the artificial intelligence comprises a neural network.
  15. A microphone, characterized by comprising an audio adapter plug, a sound pickup unit and the audio data processing apparatus according to claim 8;
    the characteristic acquisition module being electrically connected to the sound pickup unit and used to obtain the user's audio data;
    the audio adapter plug being adapted to a preset sound processing device and used to transmit the corrected audio data to the preset sound processing device.
  16. An audio data processing system, characterized by comprising the audio data processing apparatus according to any one of claims 8 to 14, an optimization rule generation apparatus and a microphone;
    the optimization rule generation apparatus being used to adjust one or any combination of the sound characteristics of the sound produced by the user to be consistent with the corresponding target sound characteristics, according to the correspondence between the user's sound characteristics and the target sound characteristics.
PCT/CN2019/087439 2019-05-17 2019-05-17 存储器、麦克风、音频数据处理方法、装置、设备和系统 WO2020232578A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980096054.3A CN114223032A (zh) 2019-05-17 2019-05-17 存储器、麦克风、音频数据处理方法、装置、设备和系统
PCT/CN2019/087439 WO2020232578A1 (zh) 2019-05-17 2019-05-17 存储器、麦克风、音频数据处理方法、装置、设备和系统

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/087439 WO2020232578A1 (zh) 2019-05-17 2019-05-17 存储器、麦克风、音频数据处理方法、装置、设备和系统

Publications (1)

Publication Number Publication Date
WO2020232578A1 true WO2020232578A1 (zh) 2020-11-26

Family

ID=73459554

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/087439 WO2020232578A1 (zh) 2019-05-17 2019-05-17 存储器、麦克风、音频数据处理方法、装置、设备和系统

Country Status (2)

Country Link
CN (1) CN114223032A (zh)
WO (1) WO2020232578A1 (zh)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1162167A (zh) * 1996-01-18 1997-10-15 雅马哈株式会社 修正演唱声以模仿标准声的共振峰转换装置
JP2000352991A (ja) * 1999-06-14 2000-12-19 Nippon Telegr & Teleph Corp <Ntt> スペクトル補正機能つき音声合成器
CN102881283A (zh) * 2011-07-13 2013-01-16 三星电子(中国)研发中心 用于语音处理的方法与系统
CN103531205A (zh) * 2013-10-09 2014-01-22 常州工学院 基于深层神经网络特征映射的非对称语音转换方法
CN104538011A (zh) * 2014-10-30 2015-04-22 华为技术有限公司 一种音调调节方法、装置及终端设备
CN106997767A (zh) * 2017-03-24 2017-08-01 百度在线网络技术(北京)有限公司 基于人工智能的语音处理方法及装置
CN107886963A (zh) * 2017-11-03 2018-04-06 珠海格力电器股份有限公司 一种语音处理的方法、装置及电子设备
CN109272975A (zh) * 2018-08-14 2019-01-25 无锡冰河计算机科技发展有限公司 演唱伴奏自动调整方法、装置及ktv点唱机


Also Published As

Publication number Publication date
CN114223032A (zh) 2022-03-22


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19929845

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19929845

Country of ref document: EP

Kind code of ref document: A1