WO2019041186A1 - Audio voice changing method, smart device and storage medium - Google Patents

Audio voice changing method, smart device and storage medium

Info

Publication number
WO2019041186A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
voice
parameter
song
user
Prior art date
Application number
PCT/CN2017/099752
Other languages
English (en)
French (fr)
Inventor
陈俊冉
Original Assignee
深圳传音通讯有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳传音通讯有限公司
Priority to PCT/CN2017/099752
Publication of WO2019041186A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003: Changing voice quality, e.g. pitch or formants

Definitions

  • the present invention relates to the field of Internet technologies, and in particular, to an audio voice changing method, a smart device, and a storage medium.
  • the main object of the present invention is to provide an audio voice changing method, a smart device and a storage medium, aiming to solve the technical problem in the prior art that the voice changing function is too simple and not very interactive.
  • the present invention provides an audio voice changing method, the method comprising the following steps:
  • the method further includes:
  • voice change restoration processing is performed on the voice-changed sound information according to the first audio parameter, to obtain the audio information to be changed.
  • the step of determining, in response to the voice change instruction input by the user, the corresponding first voice change identifier according to the voice change instruction, and searching for the corresponding first audio parameter according to the first voice change identifier, specifically includes:
  • correspondingly, the step of performing voice change processing on the audio information to be changed according to the first audio parameter to obtain the voice-changed sound information specifically includes:
  • the step of determining, in response to the tone change instruction input by the user, the corresponding tone change identifier according to the tone change instruction, and searching for the corresponding second audio parameter according to the tone change identifier, specifically includes:
  • the song pitch parameter whose matching degree is greater than or equal to the preset matching threshold is used as the second audio parameter.
  • the method further includes:
  • each song identifier of the song pitch parameter whose matching degree is greater than or equal to the preset matching threshold is displayed, so that the user selects the song identifier to determine the corresponding song pitch parameter.
  • the step of determining, in response to the tone change instruction input by the user, the corresponding tone change identifier according to the tone change instruction, and searching for the corresponding second audio parameter according to the tone change identifier, specifically includes:
  • in response to the current song identifier input by the user, the corresponding target song pitch parameter is searched for according to the current song identifier, where the target song pitch parameter records the pitch parameter of the song corresponding to the song identifier;
  • the target song pitch parameter is identified as the second audio parameter.
  • the method further includes:
  • the method further includes:
  • the present invention also provides a smart device, the smart device comprising: a memory, a processor, and an audio voice changing program stored on the memory and operable on the processor, the audio voice changing program being configured to implement the steps of the audio voice changing method.
  • the present invention also provides a storage medium on which an audio voice changing program is stored; when executed by a processor, the audio voice changing program implements the steps of the audio voice changing method.
  • the user can freely select the audio parameter used to change the audio information to be changed, so that the audio information to be changed and the voice-changed sound information differ considerably to the ear, which better solves the problem that existing voice changing functions are too simple and human-computer interaction is inconvenient.
  • FIG. 1 is a schematic structural diagram of an intelligent device in a hardware operating environment according to an embodiment of the present invention
  • FIG. 2 is a schematic flow chart of a first embodiment of an audio voice changing method according to the present invention.
  • FIG. 3 is a schematic flow chart of a second embodiment of an audio voice changing method according to the present invention.
  • FIG. 4 is a schematic flow chart of a third embodiment of an audio voice changing method according to the present invention.
  • FIG. 5 is a schematic flow chart of a fourth embodiment of an audio voice changing method according to the present invention.
  • FIG. 6 is a schematic flow chart of a fifth embodiment of an audio voice changing method according to the present invention.
  • FIG. 1 is a schematic structural diagram of an intelligent device in a hardware operating environment according to an embodiment of the present invention.
  • the smart device may include a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • the communication bus 1002 is used to implement connection communication between these components.
  • the user interface 1003 can include a display, and optionally can also include a standard wired interface and a wireless interface.
  • the network interface 1004 can optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 may be a high speed RAM memory or a non-volatile memory, such as disk storage.
  • the memory 1005 can also optionally be a storage device independent of the aforementioned processor 1001.
  • the smart device can be a smartphone, a tablet or other electronic device.
  • the smart device can integrate a microphone, and the microphone is used for collecting audio information to be changed.
  • FIG. 1 does not constitute a limitation to a smart device, and may include more or fewer components than those illustrated, or some components may be combined, or different component arrangements.
  • an operating system may be included in the memory 1005 as a computer storage medium.
  • a network communication module may be included in the memory 1005 as a computer storage medium.
  • a user interface module may be included in the memory 1005 as a computer storage medium.
  • an audio voice changing program may be included in the memory 1005 as a computer storage medium.
  • the network interface 1004 is mainly used to connect to a background server and perform data communication with the background server.
  • the user interface 1003 is mainly used to connect to a user terminal, which may be a smart phone or the like, and perform data communication with the user terminal; the smart device calls the audio voice changing program stored in the memory 1005 through the processor 1001 and performs the following operations:
  • processor 1001 can call the audio voice change program stored in the memory 1005, and also performs the following operations:
  • voice change restoration processing is performed on the voice-changed sound information according to the first audio parameter, to obtain the audio information to be changed.
  • processor 1001 can call the audio voice change program stored in the memory 1005, and also performs the following operations:
  • processor 1001 can call the audio voice change program stored in the memory 1005, and also performs the following operations:
  • the song pitch parameter whose matching degree is greater than or equal to the preset matching threshold is used as the second audio parameter.
  • processor 1001 can call the audio voice change program stored in the memory 1005, and also performs the following operations:
  • each song identifier of the song pitch parameter whose matching degree is greater than or equal to the preset matching threshold is displayed, so that the user selects the song identifier to determine the corresponding song pitch parameter.
  • processor 1001 can call the audio voice change program stored in the memory 1005, and also performs the following operations:
  • in response to the current song identifier input by the user, the corresponding target song pitch parameter is searched for according to the current song identifier, where the target song pitch parameter records the pitch parameter of the song corresponding to the song identifier;
  • the target song pitch parameter is identified as the second audio parameter.
  • processor 1001 can call the audio voice change program stored in the memory 1005, and also performs the following operations:
  • processor 1001 can call the audio voice change program stored in the memory 1005, and also performs the following operations:
  • the user can freely select the audio parameter used to change the audio information to be changed, so that the audio information to be changed and the voice-changed sound information differ considerably to the ear, which better solves the problem that existing voice changing functions are too simple and human-computer interaction is inconvenient.
  • FIG. 2 is a schematic flowchart of a first embodiment of an audio voice changing method according to the present invention.
  • the audio voice changing method comprises the following steps:
  • Step S10: acquire the user's audio information to be changed;
  • the smart device is a smart phone
  • the user can record audio through the microphone of the smart phone to obtain the audio information to be changed; for example, the user records his or her own spoken words, "Hello World", through the microphone.
  • the manner of obtaining the user's audio information to be changed is not limited to real-time capture through the microphone of the smart phone itself.
  • text information input by the user may also be obtained and converted into the audio information to be changed.
  • for example, the user types the Chinese characters for "Hello World" on the physical keyboard of the smart phone or on the virtual keyboard displayed on the touch screen.
  • text-to-speech (TTS) technology then converts the characters "Hello World" into Chinese speech saying "Hello World", which is used as the audio information to be changed.
  • Step S20: in response to the voice change instruction input by the user, determine the corresponding first voice change identifier according to the voice change instruction;
  • the user can tap the voice change option on the smart phone so that voice change processing is performed according to the voice change type the user selects. For example, after the "Hello World" audio information to be changed is obtained, several options can be displayed on the smartphone for the user to choose from, including "send", "delete", "re-record" and "voice change".
  • after the user taps the "voice change" option, several voice type options can be offered, such as "man", "woman" and "child".
  • different voice type options correspond to different audio parameters.
  • the smartphone can generate a corresponding voice change instruction, that is, the instruction information produced by the user's operation on the smart phone that causes the smart phone to perform voice change processing on the audio information to be changed.
  • the smartphone can confirm from the voice change instruction that the user selected the "man" option and determine the corresponding first voice change identifier; the first voice change identifier is used to distinguish different types of voice change parameters.
  • Step S30 Search for a corresponding first audio parameter according to the first voice change identifier
  • a corresponding first audio parameter may be searched for according to the first voice change identifier, where the first audio parameter represents features common to male voices. For example, because male vocal cords are thicker than female vocal cords, men speak at lower audio frequencies and their voices sound fuller and deeper. Therefore, the audio frequency in the first audio parameter can be set to a lower value.
  • the embodiment does not limit the type of the first voice change identifier or the first audio parameter; the first voice change identifier may also denote an animal, a cartoon character, etc., and the first audio parameter may then be set to the acoustic parameters of that animal or cartoon character.
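The lookup in step S30 can be pictured as a table keyed by the voice change identifier. The following is a minimal sketch, not the patent's implementation: the `AudioParameter` class, the `VOICE_PARAMETERS` table and the function name are hypothetical, and only the 82 Hz to 392 Hz range for "man" comes from the text (it is the example given for step S40); the other entries are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioParameter:
    """Hypothetical container: the voice frequency range for one voice type."""
    low_hz: float
    high_hz: float


# Hypothetical table. Only the "man" range is taken from the text;
# the "woman" and "child" values are illustrative placeholders.
VOICE_PARAMETERS = {
    "man": AudioParameter(82.0, 392.0),
    "woman": AudioParameter(165.0, 1047.0),   # placeholder, not from the source
    "child": AudioParameter(250.0, 1200.0),   # placeholder, not from the source
}


def find_first_audio_parameter(voice_change_id: str) -> AudioParameter:
    """Return the audio parameter stored for a voice change identifier."""
    try:
        return VOICE_PARAMETERS[voice_change_id]
    except KeyError:
        raise ValueError(f"unknown voice change identifier: {voice_change_id!r}") from None
```

Keeping the identifier separate from the parameter values means new voice types (an animal, a cartoon character) only require a new table entry.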
  • Step S40 Perform a voice-changing process on the to-be-changed audio information according to the first audio parameter to obtain voice-changing voice information.
  • the first audio parameter is the audio parameter corresponding to the first voice change identifier, that is, "man"; for example, the first audio parameter may include a voice frequency parameter of 82 Hz to 392 Hz and a reference range of 64 Hz to 523 Hz.
  • voice change processing can be applied to the "Hello World" audio recorded by the user or produced by text-to-speech: the sound frequency is adjusted to the voice frequency parameter and reference range above, and the processed audio information is saved as the voice-changed sound information.
  • in this way the timbre and pitch of the sound can be changed, so that the voice-changed sound information is perceptibly different from the audio information to be changed.
  • the voice change processing on the audio information to be changed is not limited to changing the frequency parameter of the sound; the playback speed can also be sped up or slowed down.
  • the user can freely select the audio parameter used to change the audio information to be changed, so that the audio information to be changed and the voice-changed sound information differ considerably to the ear, which better solves the problem that existing voice changing functions are too simple and human-computer interaction is inconvenient.
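One simple way to move a recording's pitch into a target range such as 82 Hz to 392 Hz is to resample it. The sketch below is an assumption, since the text does not specify the algorithm: it estimates pitch from zero crossings and resamples by linear interpolation, which changes pitch and playback speed together. A production voice changer would more likely use a duration-preserving pitch shifter. All function names are hypothetical.

```python
def resample(samples, factor):
    """Linear-interpolation resampling.

    Played back at the original sample rate, factor > 1 raises the pitch and
    shortens the clip; factor < 1 lowers the pitch and lengthens it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1.0 - frac) + nxt * frac)
    return out


def estimate_frequency(samples, sample_rate):
    """Rough pitch estimate: count upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(samples)


def shift_into_range(samples, sample_rate, low_hz, high_hz):
    """Resample so the estimated pitch lands inside [low_hz, high_hz]."""
    current = estimate_frequency(samples, sample_rate)
    if current <= 0:
        return list(samples)  # silence or no detectable pitch: leave unchanged
    target = min(max(current, low_hz), high_hz)
    return resample(samples, target / current)
```

Because resampling also stretches or compresses the clip, this sketch covers both adjustments the embodiment mentions (frequency and playback speed) but cannot change one without the other.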
  • FIG. 3 is a schematic flowchart of a second embodiment of an audio voice changing method according to the present invention. Based on the embodiment shown in FIG. 2, a second embodiment of the audio voice changing method of the present invention is proposed.
  • after step S40, the method further includes:
  • Step S50: send the first voice change identifier and the voice-changed sound information to the first target device, so that the first target device searches for the corresponding first audio parameter according to the first voice change identifier and performs voice change restoration processing on the voice-changed sound information according to the first audio parameter to obtain the audio information to be changed.
  • the user can use this scheme to apply voice change processing to what he or she says during communication. Because the audio has been processed, however, the receiver may not be able to understand the user's meaning directly from the voice-changed sound information, since the voice-changed sound information may have some audio loss that makes it hard to follow.
  • if intelligibility matters, after sending the voice-changed sound information the user would have to send the original audio information again; the user would thus send voice information twice, resulting in a poor user experience.
  • in this embodiment, when the voice-changed sound information is sent to the first target device, that is, to the receiver (the first target device may be a smart phone, a tablet or another networkable electronic device), the first voice change identifier is sent to the first target device together with the voice-changed sound information.
  • on receiving the first voice change identifier and the voice-changed sound information, the receiver may determine the corresponding first audio parameter according to the first voice change identifier and perform voice change restoration processing on the voice-changed sound information according to the first audio parameter.
  • the voice change restoration processing is the inverse of the voice change processing: the voice change processing converts the audio information to be changed into voice-changed sound information using the first audio parameter, and the restoration processing converts the voice-changed sound information back into the audio information to be changed using the same parameter.
  • the receiver determines from the first audio parameter which voice change processing was applied.
  • the receiver restores the voice-changed sound information to obtain the audio information to be changed, so that the spoken content is understood correctly while the voice changing function is still used.
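The send-and-restore exchange of step S50 can be sketched as follows, under the assumption that the audio parameter is a single resampling factor known to both devices; the text does not say what the parameter contains. `VoiceMessage`, `PITCH_FACTORS` and the factor 0.75 are hypothetical. The receiver looks up the same factor by identifier and applies its reciprocal.

```python
from dataclasses import dataclass

# Hypothetical table shared by sender and receiver: identifier -> resampling factor.
PITCH_FACTORS = {"man": 0.75}


@dataclass
class VoiceMessage:
    """What the sender transmits: the identifier plus the voice-changed samples."""
    voice_change_id: str
    samples: list


def resample(samples, factor):
    """Linear-interpolation resampling; factor < 1 lowers pitch and lengthens the clip."""
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]
        out.append(samples[j] * (1.0 - frac) + nxt * frac)
    return out


def send(samples, voice_change_id):
    """Sender side: apply the voice change and attach the identifier."""
    factor = PITCH_FACTORS[voice_change_id]
    return VoiceMessage(voice_change_id, resample(samples, factor))


def restore(message):
    """Receiver side: look up the same factor and apply its inverse."""
    factor = PITCH_FACTORS[message.voice_change_id]
    return resample(message.samples, 1.0 / factor)
```

The round trip is only approximately lossless, which is consistent with the embodiment's remark that voice-changed audio may carry some loss.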
  • FIG. 4 is a schematic flowchart of a third embodiment of an audio voice changing method according to the present invention. Based on the embodiment shown in FIG. 2, a third embodiment of the audio voice changing method of the present invention is proposed.
  • step S20 specifically includes:
  • Step S20': in response to the tone change instruction input by the user, determine the corresponding tone change identifier according to the tone change instruction;
  • the user can apply sound processing to songs he or she sings, for example to correct the pitch of the singing: a song has a fixed musical score, but the user does not necessarily sing the correct pitches. This embodiment can therefore perform pitch adjustment processing on the song sung by the user, so that the resulting tone-changed sound information matches the pitches of the original score more closely.
  • the user can choose on the smart phone to retune the song just sung. For example, after singing, the user can tap the "Finishing" option on the smartphone, which generates a corresponding tone change instruction, that is, the instruction information for starting pitch adjustment processing on the audio information sung by the user. The corresponding tone change identifier is determined according to the tone change instruction, and the tone change identifier is used to uniquely identify the audio parameter for the current pitch adjustment processing.
  • Step S30 specifically includes:
  • Step S30': search for a corresponding second audio parameter according to the tone change identifier, where the second audio parameter is used to adjust a pitch parameter in the audio information to be changed to a preset pitch parameter;
  • the corresponding second audio parameter may be searched for according to the tone change identifier. For example, after tapping the "Finishing" option, the user may specify the song that needs correction, that is, the song currently being sung. If the user is singing "March of the Volunteers", then after tapping the "Finishing" option the user can tap the "March of the Volunteers" entry in the song selection list. The tone change instruction may therefore include the operation information of the user tapping "March of the Volunteers".
  • a corresponding tone change identifier is determined according to the tone change instruction; the tone change identifier is identification information that uniquely determines an audio parameter and is not limited to a song name, because different songs can share the same name.
  • here the tone change identifier can be "March of the Volunteers".
  • the corresponding second audio parameter can be looked up according to "March of the Volunteers"; the second audio parameter is the musical score of "March of the Volunteers".
  • Step S40 specifically includes:
  • Step S40': perform pitch adjustment processing on the audio information to be changed according to the second audio parameter to obtain tone-changed sound information.
  • the audio information sung by the user may be retuned according to the score. Specifically, the continuous pitches or lyrics sung by the user are recognized to identify the note being sung at the current moment; the user's pitch at that moment is then compared with the corresponding pitch in the score, and if they differ, the current pitch is corrected to the pitch in the score. The corrected audio information is taken as the tone-changed sound information, so that the pitch of the song sung by the user matches the pitch of the song itself while the user's own voice is preserved.
  • recognition errors can be reduced by displaying the song's lyrics while the user's audio is being captured, because in that case there is no need to determine which note of the score is being sung at the current moment; the user is assumed to be singing on the correct beat.
  • in this embodiment, the song sung by the user can be corrected well, operation is simple for the user, and recognition errors can be reduced.
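The comparison in step S40' can be sketched as a per-frame calculation. This is an illustrative assumption rather than the patent's algorithm: the sung pitch and the score pitch are given as frame-aligned lists in Hz (the embodiment assumes the user sings on the correct beat), and the function returns the ratio by which each frame's pitch would have to be scaled. The function names and the 0.1 semitone tolerance are hypothetical.

```python
import math


def hz_to_midi(freq_hz):
    """Convert a frequency to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69 + 12 * math.log2(freq_hz / 440.0)


def correction_ratios(sung_hz, score_hz, tolerance_semitones=0.1):
    """Per-frame pitch scaling factors that move the sung pitch onto the score.

    A ratio of 1.0 means the frame is already within tolerance and is left alone.
    """
    if len(sung_hz) != len(score_hz):
        raise ValueError("sung and score sequences must be frame-aligned")
    ratios = []
    for sung, target in zip(sung_hz, score_hz):
        deviation = abs(hz_to_midi(sung) - hz_to_midi(target))
        ratios.append(1.0 if deviation <= tolerance_semitones else target / sung)
    return ratios
```

For instance, a frame sung at 450 Hz against a score pitch of 440 Hz is about 0.39 semitones sharp, so it receives the ratio 440/450, while a frame sung exactly at 440 Hz receives 1.0.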
  • FIG. 5 is a schematic flowchart of a fourth embodiment of an audio voice changing method according to the present invention. Based on the embodiment shown in FIG. 4, a fourth embodiment of the audio voice changing method of the present invention is proposed.
  • the steps S20' to S30' specifically include:
  • Step S201: extract the user pitch parameter from the audio information to be changed;
  • the song sung by the user is analyzed to identify the track being sung; once the track is identified, its musical score, that is, the audio parameter required for the pitch adjustment processing, can be determined.
  • the user pitch parameter in the audio information to be changed is extracted; the user pitch parameter refers to the continuous pitch information in the user's singing.
  • Step S202: calculate the matching degree between the user pitch parameter and each song pitch parameter;
  • the matching degree between the user pitch parameter and each song pitch parameter may be calculated; the song pitch parameter refers to the continuous scale information of the musical score of each song track, and by calculating the matching degree, the track the user is currently singing can be determined reliably.
  • Step S203: when the matching degree is greater than or equal to a preset matching threshold, count the number of songs whose song pitch parameter has a matching degree greater than or equal to the preset matching threshold;
  • the preset matching threshold may be 0.7; when the matching degree is greater than or equal to 0.7, the number of songs whose matching degree is greater than or equal to 0.7 may be counted. For example, a count of 3 means that the smart phone found 3 tracks with high similarity to the song sung by the user. When the matching degree is less than 0.7, a message such as "song not recognized" may be displayed to prompt the user to sing again so that new audio information to be changed can be obtained.
  • Step S204: when the number of songs is equal to the preset number, the song pitch parameter whose matching degree is greater than or equal to the preset matching threshold is used as the second audio parameter.
  • the pitches of different songs can be similar to some degree, so the number of songs whose song pitch parameter has a matching degree greater than or equal to the preset matching threshold is not necessarily one. If the preset number is one, then when the counted number of songs is one, the song pitch parameter whose matching degree is greater than or equal to the preset matching threshold may be used as the second audio parameter; that is, the only song determined by the smartphone is taken to be the song sung by the user.
  • when the number of songs is not equal to the preset number, each song identifier of the song pitch parameters whose matching degree is greater than or equal to the preset matching threshold is displayed, so that the user selects a song identifier to determine the corresponding song pitch parameter.
  • the song identifier may be a song title
  • when this embodiment cannot determine the current song precisely, all song titles that meet the requirement can be shown to the user, who then confirms the song he or she intends or is singing.
  • in this embodiment the correct track is determined, and when the track cannot be determined precisely, the tracks that satisfy the condition are displayed for the user to confirm.
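Steps S202 to S204 can be sketched as follows. The text does not define how the matching degree is computed, so the measure used here (the fraction of positions where the sung note is within half a semitone of the score note) is an assumption, as are the function names and the note-number representation. The threshold 0.7 and the preset number 1 are the example values from the embodiment.

```python
def matching_degree(user_notes, song_notes, tolerance=0.5):
    """Fraction of aligned positions where the sung note matches the score note.

    Notes are semitone numbers; this measure is an illustrative assumption.
    """
    n = min(len(user_notes), len(song_notes))
    if n == 0:
        return 0.0
    hits = sum(1 for u, s in zip(user_notes, song_notes) if abs(u - s) <= tolerance)
    return hits / n


def select_song(user_notes, library, threshold=0.7, preset_count=1):
    """Return (status, candidate song identifiers).

    "matched":        exactly preset_count songs reach the threshold (step S204)
    "ask_user":       a different number reach it, so the list is shown to the user
    "not_recognized": no song reaches the threshold, so the user is asked to sing again
    """
    candidates = [
        song_id
        for song_id, notes in library.items()
        if matching_degree(user_notes, notes) >= threshold
    ]
    if not candidates:
        return ("not_recognized", [])
    if len(candidates) == preset_count:
        return ("matched", candidates)
    return ("ask_user", candidates)
```

Returning the candidate list in every case lets the caller either use the single match directly or display the titles for the user to choose from.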
  • FIG. 6 is a schematic flowchart of a fifth embodiment of an audio voice changing method according to the present invention. Based on the embodiment shown in FIG. 4, a fifth embodiment of the audio voice changing method of the present invention is proposed.
  • the steps S20' to S30' specifically include:
  • Step S301: in response to the current song identifier input by the user, search for the corresponding target song pitch parameter according to the current song identifier, where the target song pitch parameter records the pitch parameter of the song corresponding to the song identifier;
  • the user can input the song identifier to directly determine the musical score parameter to which the singing should be retuned.
  • the current song identifier may be a song title; for example, the user inputs "March of the Volunteers", and the pitch information of the score of "March of the Volunteers" may be determined according to the song title.
  • Step S302: identify the target song pitch parameter as the second audio parameter.
  • the pitch information of the score of "March of the Volunteers" can be used as the second audio parameter, so that pitch adjustment processing is performed according to the second audio parameter.
  • when the user sings unaccompanied, then after pitch adjustment processing of the sung audio, the tone-changed sound information can be mixed with the song audio information to obtain a corresponding mixed audio file, which further enhances the performance and makes use more enjoyable. For example, if the song sung by the user is "March of the Volunteers", the corresponding song accompaniment may be looked up by that song title; the accompaniment is the song audio information, and mixing the user's singing with the accompaniment yields vocal audio with accompaniment.
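The mixing step can be sketched as a weighted sum of the two signals. This is a minimal sketch under stated assumptions: both signals are lists of float samples in the range -1.0 to 1.0 at the same sample rate, the function name and the default gains are hypothetical, and the result is clipped to avoid overflow.

```python
def mix(vocal, accompaniment, vocal_gain=0.5, accompaniment_gain=0.5):
    """Mix two sample sequences into one, padding the shorter with silence."""
    length = max(len(vocal), len(accompaniment))
    mixed = []
    for i in range(length):
        v = vocal[i] if i < len(vocal) else 0.0
        a = accompaniment[i] if i < len(accompaniment) else 0.0
        sample = vocal_gain * v + accompaniment_gain * a
        mixed.append(max(-1.0, min(1.0, sample)))  # clip to the valid range
    return mixed
```

With equal gains of 0.5 the sum of two full-scale signals stays within range; higher gains make the vocal or the accompaniment more prominent at the cost of possible clipping.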
  • after acquiring the user's audio information to be changed, the smart device, in response to a new voice change instruction input by the user, determines a corresponding second voice change identifier according to the new voice change instruction and sends the second voice change identifier to the server, so that the server searches for the corresponding third audio parameter according to the second voice change identifier and feeds back the third audio parameter; voice change processing is then performed on the audio information to be changed according to the third audio parameter to obtain new voice-changed sound information.
  • the audio parameters may be saved on the server side; the server stores the correspondence between voice change identifiers and audio parameters.
  • the voice change identifier can be sent to the server, which determines the corresponding audio parameter according to the voice change identifier and sends the audio parameter required by the user back to the smart device in real time.
  • the voice change processing can then be performed according to the third audio parameter, thereby lowering the demands placed on the smart device.
  • the operating efficiency of the smart device may be further improved.
  • after the server finds the third audio parameter, it may process the audio information to be changed directly according to the third audio parameter and, once the voice-changed sound information is obtained, send it back to the smart device; the voice change processing is thus moved to the server side, which reduces the computation on the smart device. In that case, the audio information to be changed is sent to the server together with the second voice change identifier.
  • the corresponding second audio parameter is determined by the user directly inputting the current song identifier.
  • the embodiment can determine the audio parameter more quickly.
  • an embodiment of the present invention further provides a storage medium on which the audio voice changing program is stored; when the audio voice changing program is executed by the processor, the following operations are implemented:
  • voice change restoration processing is performed on the voice-changed sound information according to the first audio parameter, to obtain the audio information to be changed.
  • the song pitch parameter whose matching degree is greater than or equal to the preset matching threshold is used as the second audio parameter.
  • each song identifier of the song pitch parameter whose matching degree is greater than or equal to the preset matching threshold is displayed, so that the user selects the song identifier to determine the corresponding song pitch parameter.
  • in response to the current song identifier input by the user, the corresponding target song pitch parameter is searched for according to the current song identifier, where the target song pitch parameter records the pitch parameter of the song corresponding to the song identifier;
  • the target song pitch parameter is identified as the second audio parameter.
  • the user can freely select the audio parameter used to change the audio information to be changed, so that the audio information to be changed and the voice-changed sound information differ considerably to the ear, which better solves the problem that existing voice changing functions are too simple and human-computer interaction is inconvenient.
  • the methods of the embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods described in the various embodiments of the invention.

Abstract

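The present invention discloses an audio voice-changing method, a smart device, and a storage medium. The invention acquires a user's audio information to be voice-changed; in response to a voice-changing instruction input by the user, determines the corresponding first voice-changing identifier according to the instruction; looks up the corresponding first audio parameter according to the first voice-changing identifier; and performs voice-changing processing on the audio information according to the first audio parameter to obtain voice-changed sound information. The user can thus freely select an audio parameter to change the audio information to be voice-changed, so that the voice-changed sound information differs markedly in sound from the acquired audio information, which better solves the problem that existing voice-changing functions are too simple and human-computer interaction is inconvenient.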

Description

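Audio voice-changing method, smart device and storage medium
Technical Field
The present invention relates to the field of Internet technologies, and in particular to an audio voice-changing method, a smart device, and a storage medium.
Background
With the rapid development of instant messaging technology, demand for real-time communication keeps rising, and users also want their communication to be more varied and entertaining. For example, when users talk by voice through a communication tool, they can change their own voices to make the exchange more fun. However, the voice-changing types available to users are too limited, and users' experience with voice-changing hardware or software is poor. Existing voice-changing hardware and software therefore suffer from the technical problem of overly simple functions and weak interactivity.
The above content is only intended to assist in understanding the technical solution of the present invention and does not constitute an admission that it is prior art.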
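Summary of the Invention
The main object of the present invention is to provide an audio voice-changing method, a smart device, and a storage medium, aiming to solve the technical problem in the prior art that the voice-changing function is too simple and interactivity is weak.
To achieve the above object, the present invention provides an audio voice-changing method, the method comprising the following steps:
acquiring a user's audio information to be voice-changed;
in response to a voice-changing instruction input by the user, determining a corresponding first voice-changing identifier according to the voice-changing instruction;
looking up a corresponding first audio parameter according to the first voice-changing identifier;
performing voice-changing processing on the audio information to be voice-changed according to the first audio parameter, to obtain voice-changed sound information.
Preferably, after the step of performing voice-changing processing on the audio information to be voice-changed according to the first audio parameter to obtain voice-changed sound information, the method further comprises:
sending the first voice-changing identifier and the voice-changed sound information to a first target device, so that the first target device looks up the corresponding first audio parameter according to the first voice-changing identifier and performs voice-change restoration processing on the voice-changed sound information according to the first audio parameter, to obtain the audio information to be voice-changed.
Preferably, the steps of determining, in response to a voice-changing instruction input by the user, a corresponding first voice-changing identifier according to the voice-changing instruction and looking up a corresponding first audio parameter according to the first voice-changing identifier specifically comprise:
in response to a pitch-changing instruction input by the user, determining a corresponding pitch-changing identifier according to the pitch-changing instruction;
looking up a corresponding second audio parameter according to the pitch-changing identifier, the second audio parameter being used to adjust the pitch parameter in the audio information to be voice-changed to a preset pitch parameter;
correspondingly, the step of performing voice-changing processing on the audio information to be voice-changed according to the first audio parameter to obtain voice-changed sound information specifically comprises:
performing pitch-changing processing on the audio information to be voice-changed according to the second audio parameter, to obtain pitch-changed sound information.
Preferably, the steps of determining, in response to a pitch-changing instruction input by the user, a corresponding pitch-changing identifier according to the pitch-changing instruction and looking up a corresponding second audio parameter according to the pitch-changing identifier specifically comprise:
extracting a user pitch parameter from the audio information to be voice-changed;
calculating the matching degree between the user pitch parameter and each song pitch parameter;
when the matching degree is greater than or equal to a preset matching threshold, counting the number of songs whose song pitch parameters have a matching degree greater than or equal to the preset matching threshold;
when the number of songs equals a preset number, taking the song pitch parameter whose matching degree is greater than or equal to the preset matching threshold as the second audio parameter.
Preferably, after the step of counting, when the matching degree is greater than or equal to the preset matching threshold, the number of songs whose song pitch parameters have a matching degree greater than or equal to the preset matching threshold, the method further comprises:
when the number of songs does not equal the preset number, displaying the song identifiers of the song pitch parameters whose matching degree is greater than or equal to the preset matching threshold, so that the user selects a song identifier to determine the corresponding song pitch parameter.
Preferably, the steps of determining, in response to a pitch-changing instruction input by the user, a corresponding pitch-changing identifier according to the pitch-changing instruction and looking up a corresponding second audio parameter according to the pitch-changing identifier specifically comprise:
in response to a current song identifier input by the user, looking up a corresponding target song pitch parameter according to the current song identifier, the target song pitch parameter recording the pitch parameter of the song corresponding to the song identifier;
taking the target song pitch parameter as the second audio parameter.
Preferably, after the step of performing pitch-changing processing on the audio information to be voice-changed according to the second audio parameter to obtain pitch-changed sound information, the method further comprises:
looking up corresponding song audio information according to the current song identifier;
mixing the song audio information with the pitch-changed sound information, to obtain a corresponding mixed audio file.
Preferably, after the step of acquiring the user's audio information to be voice-changed, the method further comprises:
in response to a new voice-changing instruction input by the user, determining a corresponding second voice-changing identifier according to the new voice-changing instruction;
sending the second voice-changing identifier to a server, so that the server looks up a corresponding third audio parameter according to the second voice-changing identifier and returns the third audio parameter;
performing voice-changing processing on the audio information to be voice-changed according to the third audio parameter, to obtain new voice-changed sound information.
In addition, to achieve the above object, the present invention further provides a smart device, the smart device comprising: a memory, a processor, and an audio voice-changing program stored in the memory and executable on the processor, the audio voice-changing program being configured to implement the steps of the audio voice-changing method.
In addition, to achieve the above object, the present invention further provides a storage medium on which an audio voice-changing program is stored, the audio voice-changing program implementing the steps of the audio voice-changing method when executed by a processor.
In the present invention, the user can freely select an audio parameter to change the audio information to be voice-changed, so that the voice-changed sound information differs markedly in sound from the acquired audio information, which better solves the problem that existing voice-changing functions are too simple and human-computer interaction is inconvenient.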
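Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a smart device in the hardware operating environment involved in embodiments of the present invention;
FIG. 2 is a schematic flowchart of a first embodiment of the audio voice-changing method of the present invention;
FIG. 3 is a schematic flowchart of a second embodiment of the audio voice-changing method of the present invention;
FIG. 4 is a schematic flowchart of a third embodiment of the audio voice-changing method of the present invention;
FIG. 5 is a schematic flowchart of a fourth embodiment of the audio voice-changing method of the present invention;
FIG. 6 is a schematic flowchart of a fifth embodiment of the audio voice-changing method of the present invention.
The realization of the object, the functional features, and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.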
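Detailed Description
It should be understood that the specific embodiments described herein are merely intended to explain the present invention and are not intended to limit it.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a smart device in the hardware operating environment involved in embodiments of the present invention.
As shown in FIG. 1, the smart device may include a processor 1001 (for example, a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a display, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, such as a disk memory. The memory 1005 may optionally also be a storage device independent of the aforementioned processor 1001.
The smart device may be a smartphone, a tablet computer, or another electronic device. In addition, the smart device may integrate a microphone, which is used to capture the audio information to be voice-changed.
Those skilled in the art will understand that the structure shown in FIG. 1 does not limit the smart device, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components.
As shown in FIG. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and an audio voice-changing program.
In the smart device shown in FIG. 1, the network interface 1004 is mainly used to connect to a backend server and exchange data with it; the user interface 1003 is mainly used to connect to a user terminal and exchange data with it, where the user terminal may be a smartphone or the like; the smart device calls, through the processor 1001, the audio voice-changing program stored in the memory 1005 and performs the following operations: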
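acquiring a user's audio information to be voice-changed;
in response to a voice-changing instruction input by the user, determining a corresponding first voice-changing identifier according to the voice-changing instruction;
looking up a corresponding first audio parameter according to the first voice-changing identifier;
performing voice-changing processing on the audio information to be voice-changed according to the first audio parameter, to obtain voice-changed sound information.
Further, the processor 1001 may call the audio voice-changing program stored in the memory 1005 and also perform the following operations:
sending the first voice-changing identifier and the voice-changed sound information to a first target device, so that the first target device looks up the corresponding first audio parameter according to the first voice-changing identifier and performs voice-change restoration processing on the voice-changed sound information according to the first audio parameter, to obtain the audio information to be voice-changed.
Further, the processor 1001 may call the audio voice-changing program stored in the memory 1005 and also perform the following operations:
in response to a pitch-changing instruction input by the user, determining a corresponding pitch-changing identifier according to the pitch-changing instruction;
looking up a corresponding second audio parameter according to the pitch-changing identifier, the second audio parameter being used to adjust the pitch parameter in the audio information to be voice-changed to a preset pitch parameter;
correspondingly, the following operation is also performed:
performing pitch-changing processing on the audio information to be voice-changed according to the second audio parameter, to obtain pitch-changed sound information.
Further, the processor 1001 may call the audio voice-changing program stored in the memory 1005 and also perform the following operations:
extracting a user pitch parameter from the audio information to be voice-changed;
calculating the matching degree between the user pitch parameter and each song pitch parameter;
when the matching degree is greater than or equal to a preset matching threshold, counting the number of songs whose song pitch parameters have a matching degree greater than or equal to the preset matching threshold;
when the number of songs equals a preset number, taking the song pitch parameter whose matching degree is greater than or equal to the preset matching threshold as the second audio parameter.
Further, the processor 1001 may call the audio voice-changing program stored in the memory 1005 and also perform the following operations:
when the number of songs does not equal the preset number, displaying the song identifiers of the song pitch parameters whose matching degree is greater than or equal to the preset matching threshold, so that the user selects a song identifier to determine the corresponding song pitch parameter.
Further, the processor 1001 may call the audio voice-changing program stored in the memory 1005 and also perform the following operations:
in response to a current song identifier input by the user, looking up a corresponding target song pitch parameter according to the current song identifier, the target song pitch parameter recording the pitch parameter of the song corresponding to the song identifier;
taking the target song pitch parameter as the second audio parameter.
Further, the processor 1001 may call the audio voice-changing program stored in the memory 1005 and also perform the following operations:
looking up corresponding song audio information according to the current song identifier;
mixing the song audio information with the pitch-changed sound information, to obtain a corresponding mixed audio file.
Further, the processor 1001 may call the audio voice-changing program stored in the memory 1005 and also perform the following operations:
in response to a new voice-changing instruction input by the user, determining a corresponding second voice-changing identifier according to the new voice-changing instruction;
sending the second voice-changing identifier to a server, so that the server looks up a corresponding third audio parameter according to the second voice-changing identifier and returns the third audio parameter;
performing voice-changing processing on the audio information to be voice-changed according to the third audio parameter, to obtain new voice-changed sound information.
In this embodiment, the user can freely select an audio parameter to change the audio information to be voice-changed, so that the voice-changed sound information differs markedly in sound from the acquired audio information, which better solves the problem that existing voice-changing functions are too simple and human-computer interaction is inconvenient.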
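Based on the above hardware structure, embodiments of the audio voice-changing method of the present invention are proposed.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a first embodiment of the audio voice-changing method of the present invention.
In the first embodiment, the audio voice-changing method includes the following steps:
Step S10: acquiring a user's audio information to be voice-changed;
It can be understood that, if the smart device is a smartphone, the user can capture audio through the smartphone's microphone to obtain the audio information to be voice-changed; for example, the user records their own spoken words, "Hello, world" ("世界你好"), through the microphone. Of course, acquiring the audio information to be voice-changed is not limited to real-time capture through the smartphone's own microphone. Before the step of determining the first voice-changing identifier in response to the voice-changing instruction input by the user, text information input by the user may also be acquired and converted into the audio information to be voice-changed. For example, the user types the Chinese text "世界你好" ("Hello, world") on the smartphone's physical keyboard or on a virtual keyboard displayed on the touch screen, and text-to-speech (TTS) technology converts the text into Chinese speech, which yields the audio information to be voice-changed.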
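Step S20: in response to a voice-changing instruction input by the user, determining a corresponding first voice-changing identifier according to the voice-changing instruction;
In a specific implementation, the user can tap a voice-changing option on the smartphone so that the audio information is voice-changed according to the voice-changing type the user selects. For example, after the "Hello, world" audio information is acquired, the smartphone may display several options for the user to choose from, including "Send", "Delete", "Re-record", and "Change voice". After the user taps "Change voice", several voice-changing types may be offered, such as "Man", "Woman", and "Child"; the audio parameters of the different types differ. Once the user selects "Man", the smartphone generates a corresponding voice-changing instruction, that is, the instruction information produced by the user's operation on the smartphone that causes the smartphone to voice-change the audio information. After the user taps "Man", the smartphone can confirm from the voice-changing instruction that the user selected the "Man" option and thereby determine the corresponding first voice-changing identifier, which is used to distinguish different types of voice-changing parameters.
Step S30: looking up a corresponding first audio parameter according to the first voice-changing identifier;
It should be understood that, since the first voice-changing identifier is determined to be "Man", the corresponding first audio parameter can be looked up according to it. The first audio parameter expresses the features common to men's voices. For example, because men's vocal cords are thicker than women's, men speak at a lower audio frequency and their voices sound fuller, so the audio frequency in the first audio parameter can be set to a lower value.
Of course, this embodiment does not limit the types of the first voice-changing identifier and the first audio parameter. The first voice-changing identifier may also be an animal, a cartoon character, or the like, in which case the first audio parameter can be set to the acoustic parameters of the corresponding animal or cartoon character.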
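Steps S20 and S30 amount to a two-stage lookup, which can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `AudioParameter`, `PRESETS`, `determine_identifier`, and `find_audio_parameter`, and the shape of the instruction dictionary, are hypothetical and do not come from the description. Only the 82 Hz to 392 Hz range for "Man" is taken from the text; the "child" entry is a made-up placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioParameter:
    """Hypothetical container for the acoustic settings of one voice type."""
    min_freq_hz: float
    max_freq_hz: float


# Hypothetical preset table keyed by voice-changing identifier.
# Only the "man" range (82 Hz to 392 Hz) comes from the description;
# the "child" range is a made-up placeholder.
PRESETS = {
    "man": AudioParameter(82.0, 392.0),
    "child": AudioParameter(250.0, 650.0),  # placeholder, not from the source
}


def determine_identifier(instruction: dict) -> str:
    """Step S20: map the user's instruction to a voice-changing identifier."""
    identifier = instruction["selected_option"].strip().lower()
    if identifier not in PRESETS:
        raise ValueError(f"unknown voice-changing option: {identifier!r}")
    return identifier


def find_audio_parameter(identifier: str) -> AudioParameter:
    """Step S30: look up the audio parameter for an identifier."""
    return PRESETS[identifier]


instruction = {"selected_option": "Man"}  # produced when the user taps "Man"
parameter = find_audio_parameter(determine_identifier(instruction))
```

Keeping the identifier separate from the parameter values means new voice types (an animal, a cartoon character) only require a new table entry.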
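Step S40: performing voice-changing processing on the audio information to be voice-changed according to the first audio parameter, to obtain voice-changed sound information.
In a specific implementation, after the first audio parameter is confirmed, suppose it is the audio parameter corresponding to the first voice-changing identifier "Man"; for example, the first audio parameter may include a sound frequency parameter of 82 Hz to 392 Hz and a reference register of 64 Hz to 523 Hz. The "Hello, world" audio, whether spoken by the user or obtained through text-to-speech, can then be voice-changed according to the first audio parameter: its sound frequency is adjusted into the above sound frequency range and reference register range, and the processed audio is saved as the voice-changed sound information. Changing the sound frequency changes the timbre and pitch of the sound, so the voice-changed sound information is perceptibly quite different from the original audio information. Of course, voice-changing processing is not limited to changing the frequency parameter of the sound; the playback speed may also be increased or decreased, and so on.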
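The description does not say which signal-processing technique adjusts the frequency. As one simple possibility, the sketch below lowers the pitch by resampling with linear interpolation, which changes pitch and playback speed together. The function names, the 8000 Hz sample rate, the 300 Hz test tone, and the factor 0.5 are all assumptions chosen for illustration.

```python
import math


def resample_pitch(samples, factor):
    """Shift pitch by reading through `samples` at `factor` times the speed.

    Linear interpolation is used between neighbouring samples. A factor
    below 1.0 lowers the pitch and lengthens the clip; a factor above 1.0
    raises the pitch and shortens it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    out = []
    pos = 0.0
    last = len(samples) - 1
    while pos < last:
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        pos += factor
    return out


def estimate_frequency(samples, sample_rate):
    """Rough frequency estimate from the number of upward zero crossings."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0.0 <= b)
    return crossings * sample_rate / len(samples)


SAMPLE_RATE = 8000
# One second of a 300 Hz test tone standing in for the recorded voice.
voice = [math.sin(2 * math.pi * 300 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
lowered = resample_pitch(voice, 0.5)  # roughly one octave down
```

With these inputs the estimated frequency of `lowered` is close to 150 Hz, inside the 82 Hz to 392 Hz range quoted above. Changing pitch without changing duration would need a different technique, such as a phase vocoder.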
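In this embodiment, the user can freely select an audio parameter to change the audio information to be voice-changed, so that the voice-changed sound information differs markedly in sound from the acquired audio information, which better solves the problem that existing voice-changing functions are too simple and human-computer interaction is inconvenient.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of a second embodiment of the audio voice-changing method of the present invention; the second embodiment is proposed on the basis of the embodiment shown in FIG. 2.
In the second embodiment, after step S40, the method further includes:
Step S50: sending the first voice-changing identifier and the voice-changed sound information to a first target device, so that the first target device looks up the corresponding first audio parameter according to the first voice-changing identifier and performs voice-change restoration processing on the voice-changed sound information according to the first audio parameter, to obtain the audio information to be voice-changed.
In a specific implementation, users can apply this solution to voice-change their own speech during a conversation. However, because the audio has been processed, the recipient may be unable to understand what the user means directly from the voice-changed sound information, since it may have suffered some audio loss that makes it hard for a listener to understand. If the user, after sending the voice-changed sound information, wants it to be easy to understand, the original audio information has to be sent again; the user then has to send two voice messages, which makes for a poor user experience.
It can be understood that, to preserve both the fun of the voice-changing function and the simplicity of user operation, when the voice-changed sound information is sent to the first target device, that is, to the recipient, the first voice-changing identifier is sent together with it. The first target device may be a smartphone, a tablet computer, or another networked electronic device. On receiving the first voice-changing identifier and the voice-changed sound information, the recipient can determine the corresponding first audio parameter according to the identifier and perform voice-change restoration processing on the voice-changed sound information according to that parameter. Unlike the voice-changing processing of the first embodiment, voice-change restoration processing is its inverse: voice-changing processing converts the audio information to be voice-changed into voice-changed sound information by means of the first audio parameter, while voice-change restoration processing converts the voice-changed sound information back into the audio information to be voice-changed by means of the same parameter.
Of course, sending the first voice-changing identifier to the recipient at the same time lets the recipient know that the voice change was performed with the first audio parameter, which reduces the computation the recipient needs to identify the voice-changed sound information.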
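The send-and-restore exchange of step S50 can be sketched as below, assuming for illustration that the audio parameter is a single resampling factor and that both devices hold the same identifier table. The names `PITCH_FACTORS`, `sender_change_voice`, and `recipient_restore`, and the message layout, are hypothetical. Restoration here simply applies the reciprocal factor; in general such a round trip is only approximate and loses a few boundary samples.

```python
def resample(samples, factor):
    """Read through `samples` at `factor` times the speed (linear interpolation)."""
    out, pos, last = [], 0.0, len(samples) - 1
    while pos < last:
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        pos += factor
    return out


# Hypothetical table shared by sender and recipient: identifier -> pitch factor.
PITCH_FACTORS = {"man": 0.5}


def sender_change_voice(identifier, original):
    """Sender side: change the voice and package it with the identifier."""
    changed = resample(original, PITCH_FACTORS[identifier])
    return {"identifier": identifier, "audio": changed}


def recipient_restore(message):
    """Recipient side: look up the same parameter and apply the inverse factor."""
    factor = PITCH_FACTORS[message["identifier"]]
    return resample(message["audio"], 1.0 / factor)


original = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5, 0.0]
message = sender_change_voice("man", original)
restored = recipient_restore(message)
```

Because the identifier travels with the audio, the recipient does not have to guess which parameter produced the changed voice.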
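In this embodiment, the recipient restores the voice-changed sound information to obtain the audio information to be voice-changed, so that the voice-changing function can be used while the spoken message is still correctly understood.
Referring to FIG. 4, FIG. 4 is a schematic flowchart of a third embodiment of the audio voice-changing method of the present invention; the third embodiment is proposed on the basis of the embodiment shown in FIG. 2.
In the third embodiment, step S20 specifically includes:
Step S20′: in response to a pitch-changing instruction input by the user, determining a corresponding pitch-changing identifier according to the pitch-changing instruction;
It can be understood that, to meet users' need to perform songs with a smartphone, a song performed by the user can be retouched to some extent, for example by correcting its pitch. A song has a fixed score, but a user does not necessarily sing every note at the correct pitch, so this embodiment can apply pitch-changing processing to the user's singing so that the resulting pitch-changed sound information matches the pitch of the original score more closely. Specifically, after singing a passage or while singing, the user can choose on the smartphone to apply pitch-changing processing to what was sung. For example, while singing, the user can tap the "Pitch correction" option on the smartphone, which generates the corresponding pitch-changing instruction, that is, the instruction information that turns on pitch-changing processing for the user's sung audio. The corresponding pitch-changing identifier is determined according to the pitch-changing instruction; it uniquely identifies the audio parameter currently used for pitch-changing processing.
Step S30 specifically includes:
Step S30′: looking up a corresponding second audio parameter according to the pitch-changing identifier, the second audio parameter being used to adjust the pitch parameter in the audio information to be voice-changed to a preset pitch parameter;
In a specific implementation, the corresponding second audio parameter can be looked up according to the pitch-changing identifier. For example, when the user taps "Pitch correction", the song to be corrected, that is, the song the user is currently singing, can be determined. Suppose the user is about to sing "March of the Volunteers" (义勇军进行曲): after tapping "Pitch correction", the user can tap "March of the Volunteers" in the song selection list, so the pitch-changing instruction may include the operation information of the user tapping "March of the Volunteers", and the corresponding pitch-changing identifier can be determined according to the instruction. The pitch-changing identifier is identification information that uniquely determines an audio parameter; it is not limited to the song title, because different songs can share the same title. In this case, the pitch-changing identifier may be "March of the Volunteers".
It can be understood that the corresponding second audio parameter can be looked up according to "March of the Volunteers"; the second audio parameter is the numbered musical notation (简谱) of "March of the Volunteers".
Step S40 specifically includes:
Step S40′: performing pitch-changing processing on the audio information to be voice-changed according to the second audio parameter, to obtain pitch-changed sound information.
It should be understood that, after the numbered notation of "March of the Volunteers" is obtained, the user's sung audio can be pitch-changed according to it. Specifically, the continuous pitches or the lyrics sung by the user are recognized to determine which note is being sung at the current moment; the user's pitch at that moment is compared with the corresponding pitch in the notation, and if they differ, the pitch at that moment is corrected to the corresponding pitch in the notation. The corrected audio information is taken as the pitch-changed sound information, so the pitch of the user's singing conforms to the pitch of the song itself while the timbre of the user's voice is preserved. Of course, recognition of the continuous pitches or lyrics sung by the user may contain errors. To reduce such errors, the smartphone can play the song aloud and display its lyrics while capturing the user's audio; in that case there is no need to determine which note position in the notation is being sung at the current moment, and the user is simply assumed to be singing on the correct beat.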
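The compare-and-correct step can be sketched as computing, frame by frame, the frequency ratio that would move the sung pitch onto the score. The sketch assumes the sung pitches and the score pitches are already aligned in time (the "correct beat" case described above) and are given in Hz; the function names, the 10-cent tolerance, and the example pitch values are assumptions, not details from the description.

```python
import math


def cents_between(sung_hz, target_hz):
    """Signed distance from the sung pitch to the target pitch in cents."""
    return 1200.0 * math.log2(target_hz / sung_hz)


def correction_ratios(sung_hz, score_hz, tolerance_cents=10.0):
    """Per-frame frequency ratios that move the sung pitch onto the score.

    Both sequences hold one pitch per frame and are assumed to be aligned
    in time. A value of 0.0 marks an unvoiced frame or a rest.
    """
    ratios = []
    for sung, target in zip(sung_hz, score_hz):
        if sung <= 0.0 or target <= 0.0:
            ratios.append(1.0)  # silence or rest: nothing to correct
        elif abs(cents_between(sung, target)) <= tolerance_cents:
            ratios.append(1.0)  # already in tune
        else:
            ratios.append(target / sung)
    return ratios


# Hypothetical frame pitches in Hz: the second note is sung flat.
sung = [220.0, 233.0, 0.0, 262.0]
score = [220.0, 246.94, 0.0, 261.63]
ratios = correction_ratios(sung, score)
```

Each ratio other than 1.0 would then be handed to a pitch shifter for that frame; a shifter that preserves timbre is needed to keep the user's own voice character, which this sketch does not implement.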
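In this embodiment, pitch-changing processing is applied to the song sung by the user, so the pitch of the user's own singing can be corrected well; moreover, the approach provided by this embodiment is simple for the user to operate and can reduce recognition errors.
Referring to FIG. 5, FIG. 5 is a schematic flowchart of a fourth embodiment of the audio voice-changing method of the present invention; the fourth embodiment is proposed on the basis of the embodiment shown in FIG. 4.
In the fourth embodiment, steps S20′ to S30′ specifically include:
Step S201: extracting a user pitch parameter from the audio information to be voice-changed;
It can be understood that, to make it easy to determine the audio parameter needed for pitch-changing processing, there are several ways for the user to determine it. In this embodiment, for example, the piece the user is singing is determined by recognizing the audio of the user's own singing; once the piece is determined, its numbered notation, that is, the audio parameter needed for pitch-changing processing, can be determined.
In a specific implementation, after the audio information to be voice-changed sung by the user is acquired, the user pitch parameter is extracted from it; the user pitch parameter refers to the continuous pitch information of the user's singing.
Step S202: calculating the matching degree between the user pitch parameter and each song pitch parameter;
It should be understood that, to determine the song the user is currently singing, the matching degree between the user pitch parameter and each song pitch parameter can be calculated. A song pitch parameter refers to the continuous scale information corresponding to the numbered notation of a song; calculating the matching degree makes it possible to determine well which piece the user is currently singing.
Step S203: when the matching degree is greater than or equal to a preset matching threshold, counting the number of songs whose song pitch parameters have a matching degree greater than or equal to the preset matching threshold;
In a specific implementation, the preset matching threshold may be 0.7. When the matching degree is greater than or equal to 0.7, the number of songs with a matching degree greater than or equal to 0.7 can be counted; for example, a count of 3 means that 3 songs in the smartphone's song catalog are highly similar to what the user sang. When the matching degree is less than 0.7, the message "Song not recognized" can be shown to prompt the user to sing again so that new audio information to be voice-changed can be acquired.
Step S204: when the number of songs equals a preset number, taking the song pitch parameter whose matching degree is greater than or equal to the preset matching threshold as the second audio parameter.
It should be understood that, because the user's singing may not match the true pitches of the song's notation well, and because many songs have notations with similar pitches, the number of songs whose song pitch parameters have a matching degree greater than or equal to the preset matching threshold is not necessarily one. If the preset number is one and the counted number of songs is one, the song pitch parameter whose matching degree is greater than or equal to the preset matching threshold can be taken as the second audio parameter, which means that the single song currently determined by the smartphone is the song the user sang.
Further, when the number of songs does not equal the preset number, the song identifiers of the song pitch parameters whose matching degree is greater than or equal to the preset matching threshold are displayed, so that the user selects a song identifier to determine the corresponding song pitch parameter.
Of course, when the number of songs is not one, the titles of all songs whose matching degree is greater than the preset matching threshold can be shown to the user; the song identifier may be the song title. Since this embodiment cannot determine the current song any more precisely at this point, showing the user the titles of all songs that meet the requirement lets the user decide which song they want or are singing.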
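The threshold-and-count logic of steps S202 to S204, together with the fallback to a user choice, can be sketched as follows. The description does not define how the matching degree is computed, so `matching_degree` below (the share of positions with an identical note) is purely an assumed stand-in; the catalog contents and the note numbers are likewise invented for illustration. The threshold 0.7 and the preset number of one are the example values from the text.

```python
def matching_degree(user_notes, song_notes):
    """Hypothetical matching degree: share of positions with the same note."""
    longest = max(len(user_notes), len(song_notes))
    if longest == 0:
        return 0.0
    same = sum(1 for u, s in zip(user_notes, song_notes) if u == s)
    return same / longest


def identify_song(user_notes, catalog, threshold=0.7, preset_count=1):
    """Count songs at or above the threshold and decide what to do next."""
    candidates = [
        song_id
        for song_id, notes in catalog.items()
        if matching_degree(user_notes, notes) >= threshold
    ]
    if not candidates:
        return ("unrecognized", [])      # prompt the user to sing again
    if len(candidates) == preset_count:
        return ("matched", candidates)   # use this song's pitch parameter
    return ("choose", candidates)        # show the identifiers to the user


# Invented catalog: song identifier -> note sequence (MIDI-style numbers).
catalog = {
    "song_a": [60, 62, 64, 65, 67],
    "song_b": [60, 62, 64, 65, 69],
    "song_c": [72, 71, 69, 67, 65],
}
user_notes = [60, 62, 64, 65, 67]
result = identify_song(user_notes, catalog)
```

Here two songs reach the 0.7 threshold, so the sketch returns the "choose" outcome with both identifiers, mirroring the case where the titles are shown to the user.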
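In this embodiment, the correct piece is determined by recognizing the audio of the song the user is singing or sang beforehand; and when the piece cannot be determined any more precisely, the several pieces that meet the condition can be displayed for the user to confirm.
Referring to FIG. 6, FIG. 6 is a schematic flowchart of a fifth embodiment of the audio voice-changing method of the present invention; the fifth embodiment is proposed on the basis of the embodiment shown in FIG. 4.
In the fifth embodiment, steps S20′ to S30′ specifically include:
Step S301: in response to a current song identifier input by the user, looking up a corresponding target song pitch parameter according to the current song identifier, the target song pitch parameter recording the pitch parameter of the song corresponding to the song identifier;
It can be understood that, to make it easy to determine the audio parameter needed for pitch-changing processing, there are several ways for the user to determine it. In this embodiment, for example, the user can input a song identifier to directly determine the numbered-notation parameter needed for the pitch change.
In a specific implementation, the current song identifier may be the song title; for example, if the user inputs "March of the Volunteers" (义勇军进行曲), the pitch information of the numbered notation of "March of the Volunteers" can be determined according to that title.
Step S302: taking the target song pitch parameter as the second audio parameter.
It should be understood that the pitch information of the numbered notation of "March of the Volunteers" can be used as the second audio parameter, so that pitch-changing processing is carried out according to this second audio parameter.
Further, after the step of performing pitch-changing processing on the audio information to be voice-changed according to the second audio parameter to obtain pitch-changed sound information, corresponding song audio information is looked up according to the current song identifier, and the song audio information is mixed with the pitch-changed sound information to obtain a corresponding mixed audio file.
In a specific implementation, the user sings without accompaniment. After the sung audio has been pitch-changed, the pitch-changed sound information can be mixed with the song audio information to obtain a corresponding mixed audio file, which further enhances the result and makes the feature more fun to use. For example, if the user sings "March of the Volunteers", the corresponding accompaniment can be looked up according to the song title; this accompaniment is the song audio information. Mixing the user's sung audio with the accompaniment yields a vocal performance with accompaniment.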
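A minimal sketch of the mixing step, assuming both tracks are mono sequences of float samples in the range [-1.0, 1.0] at the same sample rate. The function name `mix`, the gain parameters and their defaults, and the hard-clipping choice are assumptions; the description only states that the two signals are mixed.

```python
from itertools import zip_longest


def mix(vocal, accompaniment, vocal_gain=1.0, accompaniment_gain=0.5):
    """Mix two mono tracks of float samples in the range [-1.0, 1.0].

    The shorter track is padded with silence, and the sum is hard-clipped
    so that it stays inside the valid sample range.
    """
    mixed = []
    for v, a in zip_longest(vocal, accompaniment, fillvalue=0.0):
        sample = v * vocal_gain + a * accompaniment_gain
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed


# Tiny made-up tracks: the pitch-changed vocal and the looked-up accompaniment.
vocal_track = [0.5, -0.5, 1.0]
accompaniment_track = [0.5, 0.5]
mixed_track = mix(vocal_track, accompaniment_track)
```

Writing `mixed_track` out as an audio file (for example with the standard-library `wave` module after converting to 16-bit integers) would give the mixed audio file mentioned above.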
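Further, to reduce local data storage on the smart device, after the step of acquiring the user's audio information to be voice-changed: in response to a new voice-changing instruction input by the user, a corresponding second voice-changing identifier is determined according to the new instruction; the second voice-changing identifier is sent to a server, so that the server looks up a corresponding third audio parameter according to it and returns the third audio parameter; and voice-changing processing is performed on the audio information to be voice-changed according to the third audio parameter, to obtain new voice-changed sound information.
Of course, to reduce the amount of audio-parameter data the smart device itself has to store, and also to simplify how the smart device operates, the audio parameters can be stored on the server side. The server stores the correspondence between voice-changing identifiers and audio parameters, so that once the user has determined the desired voice-changing identifier, it can be sent to the server; the server determines the corresponding audio parameter according to the identifier and sends the audio parameter the user needs back to the smart device in real time.
It can be understood that, once the third audio parameter corresponding to the second voice-changing identifier has been obtained from the server, voice-changing processing can be performed according to it, which reduces what the smart device needs in order to operate. The operating efficiency of the smart device can be improved further: after finding the corresponding third audio parameter, the server can itself perform voice-changing processing on the audio information according to the third audio parameter and, once the voice-changed sound information is obtained, send it back to the smart device. The voice-changing operation is thereby moved to the server side, reducing the computation performed on the smart device. In that case, the audio information to be voice-changed can be sent to the server together with the second voice-changing identifier.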
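The two server-assisted variants described above (the server returns the parameter, or the server also performs the processing) can be contrasted in a small sketch. Everything here is hypothetical: `ParameterServer` is an in-memory stand-in for a real network service, `scale` is a placeholder for actual voice-changing processing, and the "robot" identifier and its value are invented.

```python
class ParameterServer:
    """In-memory stand-in for the server that stores identifier -> parameter."""

    def __init__(self, table, processor):
        self._table = dict(table)
        self._processor = processor

    def lookup(self, identifier):
        """Return the audio parameter for a voice-changing identifier."""
        return self._table[identifier]

    def change_voice(self, identifier, samples):
        """Offloaded variant: the server also runs the processing."""
        return self._processor(samples, self.lookup(identifier))


class SmartDevice:
    """Device that keeps no parameter table of its own."""

    def __init__(self, server, processor):
        self._server = server
        self._processor = processor

    def change_voice_locally(self, identifier, samples):
        parameter = self._server.lookup(identifier)  # server returns the parameter
        return self._processor(samples, parameter)   # device does the processing

    def change_voice_on_server(self, identifier, samples):
        # The audio is sent along with the identifier; the result comes back.
        return self._server.change_voice(identifier, samples)


def scale(samples, parameter):
    """Placeholder processing step; a real one would change pitch or timbre."""
    return [s * parameter for s in samples]


server = ParameterServer({"robot": 0.5}, scale)
device = SmartDevice(server, scale)
local_result = device.change_voice_locally("robot", [1.0, -1.0])
remote_result = device.change_voice_on_server("robot", [1.0, -1.0])
```

Both paths give the same output; the difference is whether the device or the server spends the computation, at the cost of uploading the audio in the second case.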
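In this embodiment, the user directly inputs the current song identifier to determine the corresponding second audio parameter; compared with the fourth embodiment, this embodiment can determine the audio parameter more quickly.
In addition, an embodiment of the present invention further provides a storage medium on which an audio voice-changing program is stored; when the audio voice-changing program is executed by a processor, the following operations are implemented:
acquiring a user's audio information to be voice-changed;
in response to a voice-changing instruction input by the user, determining a corresponding first voice-changing identifier according to the voice-changing instruction;
looking up a corresponding first audio parameter according to the first voice-changing identifier;
performing voice-changing processing on the audio information to be voice-changed according to the first audio parameter, to obtain voice-changed sound information.
Further, when the audio voice-changing program is executed by the processor, the following operations are also implemented:
sending the first voice-changing identifier and the voice-changed sound information to a first target device, so that the first target device looks up the corresponding first audio parameter according to the first voice-changing identifier and performs voice-change restoration processing on the voice-changed sound information according to the first audio parameter, to obtain the audio information to be voice-changed.
Further, when the audio voice-changing program is executed by the processor, the following operations are also implemented:
in response to a pitch-changing instruction input by the user, determining a corresponding pitch-changing identifier according to the pitch-changing instruction;
looking up a corresponding second audio parameter according to the pitch-changing identifier, the second audio parameter being used to adjust the pitch parameter in the audio information to be voice-changed to a preset pitch parameter;
correspondingly, the following operation is also implemented:
performing pitch-changing processing on the audio information to be voice-changed according to the second audio parameter, to obtain pitch-changed sound information.
Further, when the audio voice-changing program is executed by the processor, the following operations are also implemented:
extracting a user pitch parameter from the audio information to be voice-changed;
calculating the matching degree between the user pitch parameter and each song pitch parameter;
when the matching degree is greater than or equal to a preset matching threshold, counting the number of songs whose song pitch parameters have a matching degree greater than or equal to the preset matching threshold;
when the number of songs equals a preset number, taking the song pitch parameter whose matching degree is greater than or equal to the preset matching threshold as the second audio parameter.
Further, when the audio voice-changing program is executed by the processor, the following operations are also implemented:
when the number of songs does not equal the preset number, displaying the song identifiers of the song pitch parameters whose matching degree is greater than or equal to the preset matching threshold, so that the user selects a song identifier to determine the corresponding song pitch parameter.
Further, when the audio voice-changing program is executed by the processor, the following operations are also implemented:
in response to a current song identifier input by the user, looking up a corresponding target song pitch parameter according to the current song identifier, the target song pitch parameter recording the pitch parameter of the song corresponding to the song identifier;
taking the target song pitch parameter as the second audio parameter.
Further, when the audio voice-changing program is executed by the processor, the following operations are also implemented:
looking up corresponding song audio information according to the current song identifier;
mixing the song audio information with the pitch-changed sound information, to obtain a corresponding mixed audio file.
Further, when the audio voice-changing program is executed by the processor, the following operations are also implemented:
in response to a new voice-changing instruction input by the user, determining a corresponding second voice-changing identifier according to the new voice-changing instruction;
sending the second voice-changing identifier to a server, so that the server looks up a corresponding third audio parameter according to the second voice-changing identifier and returns the third audio parameter;
performing voice-changing processing on the audio information to be voice-changed according to the third audio parameter, to obtain new voice-changed sound information.
In this embodiment, the user can freely select an audio parameter to change the audio information to be voice-changed, so that the voice-changed sound information differs markedly in sound from the acquired audio information, which better solves the problem that existing voice-changing functions are too simple and human-computer interaction is inconvenient.
It should be noted that, in this document, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of additional identical elements in the process, method, article, or system that includes the element.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.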

Claims (10)

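  1. An audio voice-changing method, characterized in that the method comprises the following steps:
    acquiring a user's audio information to be voice-changed;
    in response to a voice-changing instruction input by the user, determining a corresponding first voice-changing identifier according to the voice-changing instruction;
    looking up a corresponding first audio parameter according to the first voice-changing identifier;
    performing voice-changing processing on the audio information to be voice-changed according to the first audio parameter, to obtain voice-changed sound information.
  2. The method according to claim 1, characterized in that, after the step of performing voice-changing processing on the audio information to be voice-changed according to the first audio parameter to obtain voice-changed sound information, the method further comprises:
    sending the first voice-changing identifier and the voice-changed sound information to a first target device, so that the first target device looks up the corresponding first audio parameter according to the first voice-changing identifier and performs voice-change restoration processing on the voice-changed sound information according to the first audio parameter, to obtain the audio information to be voice-changed.
  3. The method according to claim 1, characterized in that the steps of determining, in response to a voice-changing instruction input by the user, a corresponding first voice-changing identifier according to the voice-changing instruction and looking up a corresponding first audio parameter according to the first voice-changing identifier specifically comprise:
    in response to a pitch-changing instruction input by the user, determining a corresponding pitch-changing identifier according to the pitch-changing instruction;
    looking up a corresponding second audio parameter according to the pitch-changing identifier, the second audio parameter being used to adjust the pitch parameter in the audio information to be voice-changed to a preset pitch parameter;
    correspondingly, the step of performing voice-changing processing on the audio information to be voice-changed according to the first audio parameter to obtain voice-changed sound information specifically comprises:
    performing pitch-changing processing on the audio information to be voice-changed according to the second audio parameter, to obtain pitch-changed sound information.
  4. The method according to claim 3, characterized in that the steps of determining, in response to a pitch-changing instruction input by the user, a corresponding pitch-changing identifier according to the pitch-changing instruction and looking up a corresponding second audio parameter according to the pitch-changing identifier specifically comprise:
    extracting a user pitch parameter from the audio information to be voice-changed;
    calculating the matching degree between the user pitch parameter and each song pitch parameter;
    when the matching degree is greater than or equal to a preset matching threshold, counting the number of songs whose song pitch parameters have a matching degree greater than or equal to the preset matching threshold;
    when the number of songs equals a preset number, taking the song pitch parameter whose matching degree is greater than or equal to the preset matching threshold as the second audio parameter.
  5. The method according to claim 4, characterized in that, after the step of counting, when the matching degree is greater than or equal to the preset matching threshold, the number of songs whose song pitch parameters have a matching degree greater than or equal to the preset matching threshold, the method further comprises:
    when the number of songs does not equal the preset number, displaying the song identifiers of the song pitch parameters whose matching degree is greater than or equal to the preset matching threshold, so that the user selects a song identifier to determine the corresponding song pitch parameter.
  6. The method according to claim 3, characterized in that the steps of determining, in response to a pitch-changing instruction input by the user, a corresponding pitch-changing identifier according to the pitch-changing instruction and looking up a corresponding second audio parameter according to the pitch-changing identifier specifically comprise:
    in response to a current song identifier input by the user, looking up a corresponding target song pitch parameter according to the current song identifier, the target song pitch parameter recording the pitch parameter of the song corresponding to the song identifier;
    taking the target song pitch parameter as the second audio parameter.
  7. The method according to claim 6, characterized in that, after the step of performing pitch-changing processing on the audio information to be voice-changed according to the second audio parameter to obtain pitch-changed sound information, the method further comprises:
    looking up corresponding song audio information according to the current song identifier;
    mixing the song audio information with the pitch-changed sound information, to obtain a corresponding mixed audio file.
  8. The method according to claim 1, characterized in that, after the step of acquiring the user's audio information to be voice-changed, the method further comprises:
    in response to a new voice-changing instruction input by the user, determining a corresponding second voice-changing identifier according to the new voice-changing instruction;
    sending the second voice-changing identifier to a server, so that the server looks up a corresponding third audio parameter according to the second voice-changing identifier and returns the third audio parameter;
    performing voice-changing processing on the audio information to be voice-changed according to the third audio parameter, to obtain new voice-changed sound information.
  9. A smart device, characterized in that the smart device comprises: a memory, a processor, and an audio voice-changing program stored in the memory and executable on the processor, wherein the audio voice-changing program, when executed by the processor, implements the steps of the audio voice-changing method according to claim 1.
  10. A storage medium, characterized in that an audio voice-changing program is stored on the storage medium, and the audio voice-changing program, when executed by a processor, implements the steps of the audio voice-changing method according to claim 1.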
PCT/CN2017/099752 2017-08-30 2017-08-30 Audio voice-changing method, smart device and storage medium WO2019041186A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/099752 WO2019041186A1 (zh) 2017-08-30 2017-08-30 Audio voice-changing method, smart device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/099752 WO2019041186A1 (zh) 2017-08-30 2017-08-30 Audio voice-changing method, smart device and storage medium

Publications (1)

Publication Number Publication Date
WO2019041186A1 (zh) 2019-03-07

Family

ID=65524682

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/099752 WO2019041186A1 (zh) 2017-08-30 2017-08-30 Audio voice-changing method, smart device and storage medium

Country Status (1)

Country Link
WO (1) WO2019041186A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113409809A (zh) * 2021-07-07 2021-09-17 上海新氦类脑智能科技有限公司 Speech noise reduction method, apparatus and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090074204A1 (en) * 2007-09-19 2009-03-19 Sony Corporation Information processing apparatus, information processing method, and program
CN104200824A (zh) * 2014-08-25 2014-12-10 深圳市中兴移动通信有限公司 Audio recording method and device
CN104299619A (zh) * 2014-09-29 2015-01-21 广东欧珀移动通信有限公司 Audio file processing method and device
CN105632508A (zh) * 2016-01-27 2016-06-01 广东欧珀移动通信有限公司 Audio processing method and audio processing device
CN106506437A (zh) * 2015-09-07 2017-03-15 腾讯科技(深圳)有限公司 Audio data processing method and device
CN106873936A (zh) * 2017-01-20 2017-06-20 努比亚技术有限公司 Electronic device and information processing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17923938

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17923938

Country of ref document: EP

Kind code of ref document: A1