WO2024066790A1 - Audio processing method, apparatus and electronic device - Google Patents

Audio processing method, apparatus and electronic device

Info

Publication number
WO2024066790A1
Authority
WO
WIPO (PCT)
Prior art keywords
control
voice
human voice
vocal
audio
Prior art date
Application number
PCT/CN2023/113612
Other languages
English (en)
French (fr)
Inventor
李乃寒
陈远哲
贾东亚
王维斯
黄雷
佘康绵
王晓婵
李博琛
王如江
洪韵甯
Original Assignee
抖音视界有限公司
脸萌有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 抖音视界有限公司 and 脸萌有限公司
Publication of WO2024066790A1


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00: Details of electrophonic musical instruments
    • G10H1/36: Accompaniment arrangements
    • G10H1/38: Chord
    • G10K: SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K15/00: Acoustics not otherwise provided for
    • G10K15/02: Synthesis of acoustic waves
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use

Definitions

  • the embodiments of the present disclosure relate to the technical field of audio processing, and in particular to an audio processing method, device and electronic device.
  • a cappella is a singing method in which multiple people sing without the use of instrumental accompaniment. For example, the lead singer of an a cappella group can sing the song, and other performers can sing the harmony of different parts.
  • when obtaining the audio of a cappella singing performed by a single person, the user needs to obtain the harmony melodies of the multiple parts of the audio, sing the main melody and the harmonies of the different parts over multiple takes, and then merge them to obtain the a cappella audio.
  • for example, a user can separately sing a song, the harmony of its high part, and the harmony of its middle part, and merge the sung song and the two harmonies through a terminal device to obtain the a cappella version of the song.
  • however, the above method requires multiple singing passes to obtain the a cappella audio, which leads to a high complexity of audio acquisition.
  • the present disclosure provides an audio processing method, device and electronic device, which are used to solve the technical problem of high complexity of audio acquisition in the prior art.
  • the present disclosure provides an audio processing method, the method comprising:
  • a target audio is played, where the target audio includes the first vocal and/or the second vocal of the one or more vocal parts.
  • the present disclosure provides an audio processing device, the audio processing device comprising an acquisition module, a generation module and a playback module, wherein:
  • the acquisition module is used to acquire the first human voice
  • the generating module is used for generating, based on the first human voice, a second human voice of one or more parts associated with the first human voice, wherein the timbre of the second human voice is a preset timbre;
  • the playing module is used to play target audio based on the first human voice and the second human voice of the one or more parts, and the target audio includes the first human voice and/or the second human voice of the one or more parts.
  • an embodiment of the present disclosure provides an electronic device, including: a processor and a memory;
  • the memory stores computer-executable instructions
  • the processor executes the computer-executable instructions stored in the memory, so that the processor performs the audio processing method described in the first aspect and its various possible implementations.
  • an embodiment of the present disclosure provides a computer-readable storage medium in which computer-executable instructions are stored.
  • when a processor executes the computer-executable instructions, the audio processing method described in the first aspect and its various possible implementations is implemented.
  • an embodiment of the present disclosure provides a computer program product, including a computer program, which, when executed by a processor, implements the audio processing method as described in the first aspect and various possible aspects of the first aspect.
  • an embodiment of the present disclosure provides a computer program, which, when executed by a processor, implements the audio processing method as described in the first aspect and various possible aspects of the first aspect.
  • FIG1 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure.
  • FIG2 is a schematic flow chart of an audio processing method provided by an embodiment of the present disclosure.
  • FIG3 is a schematic diagram of a process of obtaining a first human voice provided by an embodiment of the present disclosure
  • FIG4 is a schematic diagram of another process of obtaining a first human voice provided by an embodiment of the present disclosure.
  • FIG5 is a schematic diagram of playing target audio provided by an embodiment of the present disclosure.
  • FIG6 is a schematic diagram of a process of displaying a voice control according to an embodiment of the present disclosure.
  • FIG7 is a schematic diagram of a process of displaying a first part control provided by an embodiment of the present disclosure.
  • FIG8 is a schematic diagram of a method for adjusting the timbre of a second human voice provided by an embodiment of the present disclosure
  • FIG9 is a schematic diagram of a play page provided by an embodiment of the present disclosure.
  • FIG10 is a schematic diagram of a process of displaying a drag animation provided by an embodiment of the present disclosure
  • FIG11A is a schematic diagram of a process of playing target audio provided by an embodiment of the present disclosure.
  • FIG11B is a schematic diagram of another process of playing target audio provided by an embodiment of the present disclosure.
  • FIG12 is a schematic diagram of a method for generating a second human voice provided by an embodiment of the present disclosure.
  • FIG13 is a schematic diagram of the structure of an audio processing device provided by an embodiment of the present disclosure.
  • FIG14 is a schematic diagram of the structure of another audio processing device provided by an embodiment of the present disclosure.
  • FIG15 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present disclosure.
  • Electronic device: a device with a wireless transceiver function.
  • Electronic devices can be deployed on land, including indoors or outdoors, handheld, wearable or vehicle-mounted; they can also be deployed on the water (such as ships, etc.).
  • the electronic device can be a mobile phone, a tablet (Pad), a computer with a wireless transceiver function, a virtual reality (VR) electronic device, an augmented reality (AR) electronic device, a wireless terminal in industrial control, a vehicle-mounted electronic device, a wireless terminal in self-driving, a wireless electronic device in remote medical care, a wireless electronic device in a smart grid, a wireless electronic device in transportation safety, a wireless electronic device in a smart city, a wireless electronic device in a smart home, a wearable electronic device, etc.
  • the electronic device involved in the embodiments of the present disclosure may also be referred to as a terminal, user equipment (UE), access electronic device, vehicle-mounted terminal, industrial control terminal, UE unit, UE station, mobile station, remote station, remote electronic device, mobile device, UE electronic device, wireless communication device, UE agent or UE device, etc.
  • the electronic device may also be fixed or mobile.
  • a cappella singing is a singing style that does not use musical instruments for accompaniment. For example, in the process of a cappella singing, a lead singer can sing a song, and other performers can sing the music of the song in different parts (such as the music of the song in the high part or the music of the song in the low part, etc.). By playing the music of the song in different parts, the effect of vocal accompaniment is achieved, thereby realizing the a cappella singing of the song.
  • each melodic line of an audio corresponds to one part (voice part) of the audio.
  • the parts of an audio may include a soprano part, a mezzo-soprano part, a contralto part, a tenor part, a baritone part, and a bass part. It should be noted that the parts of the audio may also include other parts, which is not limited in the embodiments of the present disclosure.
  • Harmony is a combination of multiple different sounds sounding simultaneously according to certain melodic rules.
  • harmony can include chords and the connection method of multiple chords.
  • chords are the basic material of harmony.
  • Chords are obtained by combining three or more different sounds according to music theory.
  • Chords are the vertical structure of harmony, and the connection method of multiple chords is the horizontal movement of harmony.
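As an illustrative aside (not part of the disclosure), the "three or more different sounds combined according to music theory" definition of a chord can be sketched in code. The interval table and function name below are assumptions for illustration only, using MIDI note numbers:

```python
# Build a triad chord from a root pitch (MIDI note number).
# Interval patterns follow common music theory: a major triad is
# root + major third (4 semitones) + perfect fifth (7 semitones).
TRIAD_INTERVALS = {
    "major": (0, 4, 7),
    "minor": (0, 3, 7),
    "diminished": (0, 3, 6),
}

def triad(root: int, quality: str = "major") -> list[int]:
    """Return the MIDI note numbers of a triad built on `root`."""
    return [root + step for step in TRIAD_INTERVALS[quality]]

print(triad(60))           # C major triad: C4, E4, G4 -> [60, 64, 67]
print(triad(57, "minor"))  # A minor triad: A3, C4, E4 -> [57, 60, 64]
```

Connecting several such chords over time would correspond to the "horizontal movement of harmony" described above.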
  • Timbre refers to the different waveform characteristics associated with different sounds. For example, different objects have different vibration characteristics: musical instruments such as pianos and violins vibrate differently, so their timbres differ. Likewise, the sounds produced by different people have different timbres.
  • when obtaining the audio of a cappella singing performed by a single person, the user needs to sing the main melody and the harmonies of the different parts multiple times, and then merge them to obtain the a cappella audio.
  • for example, the user can respectively sing the main melody of a song and the harmonies of its low, middle and high parts, and merge the main melody and the three harmonies through an electronic device to obtain the a cappella song.
  • in this process, the user needs to obtain the accurate harmony melody of each part and be able to skillfully sing the harmony corresponding to each melody, then sing multiple times to obtain the a cappella audio, resulting in a high complexity of audio acquisition.
  • in the embodiments of the present disclosure, the electronic device can display an audio acquisition page that includes a recording control and/or an audio import control. In response to a touch operation on the recording control or the audio import control, a first human voice is acquired, and based on the first human voice, a second human voice of one or more parts associated with the first human voice is generated. A playback page of the first human voice is displayed; the playback page includes one or more part controls and a playback control. In response to a touch operation on the one or more part controls, those part controls are displayed in a first display mode, and in response to a touch operation on the playback control, a target audio is played, where the target audio includes the first human voice and/or the second human voice of the one or more parts.
  • in this way, the part control can be highlighted to improve the display effect, and because the electronic device can generate the human voices of multiple parts associated with the first human voice, a cappella singing no longer requires the user to sing the harmonies multiple times, which improves the user experience and reduces the complexity of audio acquisition.
  • Fig. 1 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure.
  • the display page of the electronic device is a music playback page.
  • the playback page includes a lead vocal control (a main tune song control of the music), a part control A, a part control B, a part control C, and a playback control.
  • the electronic device plays the target audio, wherein the target audio includes the main tune of the music, the second vocal associated with the part control A, and the second vocal associated with the part control B.
  • the electronic device can play the target audio including the first vocal and the second vocal associated with the part control, and the target audio of a cappella singing can be obtained without the user having to sing multiple harmony songs, thereby improving the user experience and reducing the complexity of audio acquisition.
  • FIG2 is a flow chart of an audio processing method provided by an embodiment of the present disclosure. Referring to FIG2 , the method may include:
  • the execution subject of the embodiment of the present disclosure is an electronic device, or an audio processing device arranged in the electronic device.
  • the audio processing device can be implemented by software, or by a combination of software and hardware, which is not limited in the embodiment of the present disclosure.
  • the first human voice may be a song.
  • the first human voice may be a song sung by the user in real time, or may be a song stored in a database of the electronic device.
  • the first human voice may be an a cappella song.
  • the first human voice may be a song sung by the user.
  • the electronic device may obtain the first human voice based on the following feasible implementation method: displaying an audio acquisition page, wherein the audio acquisition page may include a recording control and/or an audio import control, and obtaining the first human voice in response to a touch operation on the recording control or the audio import control.
  • the electronic device may display an audio acquisition page, wherein the audio acquisition page may include a recording control, and the user enables the electronic device to obtain the first human voice by performing a touch operation on the recording control.
  • the electronic device may display an audio acquisition page, wherein the audio acquisition page may include an audio import control, and the user enables the electronic device to obtain the first human voice by performing a touch operation on the audio import control.
  • the audio acquisition page may also include a recording control and an audio import control, which is not limited in the embodiments of the present disclosure.
  • the electronic device may display an audio acquisition page in response to a user's operation on an audio processing application.
  • an audio processing application may be installed in an electronic device, and when a user clicks on an icon of the audio processing application on a screen of the electronic device, the electronic device may display an audio acquisition page.
  • a user may enter a URL associated with the audio processing application in a browser of the electronic device so that the electronic device displays the audio acquisition page.
  • the electronic device may also display the audio acquisition page in other ways, and the embodiments of the present disclosure are not limited thereto.
  • obtaining the first human voice may be implemented in the following two feasible ways:
  • a possible implementation method :
  • an audio import page is displayed.
  • the audio import page includes an audio file associated with the first voice.
  • the electronic device can display an audio import page including multiple audio files, each audio file is associated with a first voice.
  • the first human voice is obtained.
  • for example, the audio import page includes audio file A and audio file B. In response to a touch operation on audio file A, the electronic device may determine the audio associated with audio file A as the first human voice; in response to a touch operation on audio file B, the electronic device may determine the audio associated with audio file B as the first human voice.
  • the audio file may be an audio file pre-recorded by the user.
  • the user may record multiple a cappella songs through an electronic device and save the song files to the electronic device.
  • the audio import page may include the audio files of the multiple a cappella songs pre-recorded by the user.
  • FIG3 is a schematic diagram of a process of obtaining a first human voice provided by an embodiment of the present disclosure.
  • the display page of the electronic device includes page 301 and page 302.
  • Page 301 is an audio acquisition page
  • the audio acquisition page includes an audio import control.
  • in response to a touch operation on the audio import control, the electronic device jumps from page 301 to page 302.
  • Page 302 is an audio import page.
  • Page 302 includes files of audio A, audio B, audio C, and audio D.
  • the electronic device determines audio A as the first human voice.
  • the voice of the user is recorded.
  • the electronic device can receive the voice of the user and record the voice.
  • the recording control has two states: a recording state and a stop-recording state.
  • when the state of the recording control is the recording state, the electronic device records the voice of the user; when the state is the stop-recording state, the electronic device stops recording the voice of the user.
  • the form of the recording control differs with its state: when the recording control is in the recording state, it is displayed in a first form; when it is in the stop-recording state, it is displayed in a second form.
  • when the state of the recording control switches from the recording state to the stop-recording state, the electronic device stops recording the human voice emitted by the user, thereby obtaining the first human voice.
  • for example, when the user clicks the recording control, the electronic device starts to collect the song sung by the user; when the user clicks the recording control again, the electronic device stops collecting and determines the collected song as the first human voice.
  • FIG4 is another schematic diagram of a process for obtaining a first human voice provided by an embodiment of the present disclosure.
  • the display page of the electronic device includes page 401.
  • Page 401 is an audio acquisition page, and the audio acquisition page includes a recording control.
  • when the user clicks the recording control, the shape of the recording control changes and the electronic device records the voice of the user; when the user clicks the recording control again, the shape changes again and the electronic device obtains the first human voice.
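The record/stop flow above amounts to a two-state toggle. The following is a minimal sketch under assumed names (`RecordingControl`, `click`, `feed` are hypothetical, not the disclosure's implementation):

```python
# Minimal sketch of the recording control's two states. Each click
# toggles between the recording state and the stop-recording state;
# stopping finalizes the collected frames as the first human voice.
class RecordingControl:
    def __init__(self):
        self.recording = False   # starts in the stop-recording state
        self.samples = []        # audio frames collected while recording
        self.first_voice = None  # set when recording stops

    def click(self):
        if not self.recording:   # first click: enter recording state
            self.recording = True
            self.samples = []
        else:                    # second click: stop and finalize
            self.recording = False
            self.first_voice = list(self.samples)

    def feed(self, frame):
        if self.recording:       # frames are only kept while recording
            self.samples.append(frame)

ctrl = RecordingControl()
ctrl.click()                     # start recording (form changes)
ctrl.feed(0.1); ctrl.feed(0.2)   # user sings
ctrl.click()                     # stop recording (form changes again)
print(ctrl.first_voice)          # -> [0.1, 0.2]
```

The change of the control's "form" in the description would correspond to rendering a different icon for each state of `recording`.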
  • the playback page may include a tuning control.
  • after the electronic device acquires the first human voice, in response to a touch operation on the tuning control, the electronic device can adjust the first human voice to make the user's singing more harmonious. This reduces the requirements on the user's singing skills, and thus the complexity of audio acquisition.
  • S202 Based on the first human voice, generate a second human voice of one or more parts associated with the first human voice.
  • the timbre of the second voice is a preset timbre.
  • the timbre of the second voice can be alto, soprano, bass, tenor, etc.
  • the timbre of the second voice can also be other timbre, which is not limited in the embodiment of the present disclosure.
  • the first human voice can be associated with a second human voice in at least one voice part.
  • the voice parts associated with the first human voice can include a low voice part, a middle low voice part, a middle voice part, a middle high voice part and a high voice part.
  • the second human voice associated with the first human voice can include a second human voice in a high voice part of the first human voice and a second human voice in a low voice part of the first human voice.
  • the low voice part, the middle voice part and the high voice part in the embodiments of the present disclosure are only examples of illustrative pitches, and are not intended to limit the pitches.
  • the pitches of the low voice part, the middle voice part and the high voice part can be pre-set pitches, which are not limited in the embodiments of the present disclosure.
  • the electronic device may generate a second vocal of one or more parts associated with the first vocal based on the following feasible implementation: based on the first vocal, determine the harmony melody associated with the part, and generate the second vocal of the part based on the first vocal and the harmony melody.
  • the electronic device may obtain the accompaniment chords of the first vocal, reorganize the accompaniment chords based on the part to obtain the harmony melody of the part, and tune the first vocal to the harmony melody to obtain the second vocal corresponding to the part.
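One plausible sketch of this step, assuming a MIDI-like pitch representation and made-up per-part semitone offsets (the disclosure does not specify any of these), is to shift each lead-vocal pitch by the part's interval and snap it to the nearest tone of the accompaniment chord:

```python
# Sketch: derive a harmony melody for a given part by shifting each
# lead-vocal pitch by an assumed part interval and snapping the result
# to the nearest tone of the accompanying chord.
PART_OFFSETS = {"low": -12, "middle": -5, "high": 7}  # assumed semitone shifts

def snap_to_chord(pitch: int, chord: list[int]) -> int:
    """Return the chord tone (in any nearby octave) closest to `pitch`."""
    candidates = [tone + 12 * octave for tone in chord for octave in range(-2, 3)]
    return min(candidates, key=lambda c: abs(c - pitch))

def harmony_melody(lead: list[int], chords: list[list[int]], part: str) -> list[int]:
    """Map each lead pitch, shifted by the part offset, onto its chord."""
    offset = PART_OFFSETS[part]
    return [snap_to_chord(p + offset, chord) for p, chord in zip(lead, chords)]

lead = [60, 62, 64]              # lead-vocal pitches (MIDI note numbers)
chords = [[60, 64, 67]] * 3      # C-major accompaniment chord per note
print(harmony_melody(lead, chords, "high"))  # -> [67, 67, 72]
```

The final "tune the first vocal to the harmony melody" step would then pitch-shift the recorded voice to these target pitches, which is outside the scope of this sketch.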
  • S203 Play the target audio based on the first vocal and one or more parts of the second vocal.
  • the target audio includes the first human voice and/or the second human voice of one or more parts.
  • the target audio may include only the first human voice, or the target audio may include the first human voice and the second human voice of multiple parts, which is not limited in the embodiments of the present disclosure.
  • the audio text (lyrics) of the first voice and the second voice are the same.
  • for example, if the first voice is a song, the second voice is also a song, and the lyrics of the first voice are the same as the lyrics of the second voice.
  • the electronic device may play the target audio based on the following feasible implementation: displaying a playback page of the first voice.
  • the playback page includes one or more voice controls and a playback control.
  • the playback control is used to play the target audio. For example, when the user clicks the playback control, the electronic device may play the target audio. When the target audio is playing, if the user clicks the playback control, the electronic device may stop playing the target audio.
  • the voice part control is used to control the second human voice.
  • the voice part control can control whether the target audio includes the second human voice corresponding to the voice part control.
  • the playback page includes a high voice part control. If the user clicks the high voice part control, the target audio played by the electronic device includes the second human voice in the high voice. If the user turns off the high voice part control, the target audio played by the electronic device does not include the second human voice in the high voice.
  • the playback page may include at least one voice part control, and each voice part control has a corresponding second human voice. When different voice part controls are selected, the second human voice included in the target audio played by the electronic device is also different.
  • for example, if the playback page of the first human voice includes a low voice part control and a high voice part control, then when the low voice part control is selected, the target audio played by the electronic device includes the second human voice of the low part, and when the high voice part control is selected, the target audio includes the second human voice of the high part.
  • the audio track of the first human voice and the audio track of the second human voice are in the same time period.
  • the process of playing the target audio is described below in conjunction with FIG. 5 .
  • FIG5 is a schematic diagram of a target audio playback method provided by an embodiment of the present disclosure.
  • a first audio track including a first human voice and a second audio track including a second human voice.
  • when the electronic device plays the target audio, the play pointer in the first audio track moves to the right, and the play pointer in the second audio track also moves to the right; the play pointer position on the second audio track is the same as the play pointer position on the first audio track. In this way, the user can add the second voice at any position of the first voice's playback, which improves the flexibility of audio processing.
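The "target audio includes the first human voice and/or the selected second human voices" behavior can be sketched as summing time-aligned tracks. The function name and the simple additive mix are illustrative assumptions, not the disclosure's method:

```python
import numpy as np

# Sketch: the target audio is the sum of the first human voice and the
# second human voices of the currently enabled parts. All tracks are
# assumed to be time-aligned and of equal length, as described above.
def mix_target_audio(first_voice: np.ndarray,
                     second_voices: dict[str, np.ndarray],
                     enabled_parts: set[str]) -> np.ndarray:
    target = first_voice.astype(float).copy()
    for part, track in second_voices.items():
        if part in enabled_parts:
            target += track          # tracks share the same time base
    return target

first = np.array([0.2, 0.4, 0.2])
seconds = {"low":  np.array([0.1, 0.1, 0.1]),
           "high": np.array([0.3, 0.0, 0.3])}
# Only the low part control is "highlighted", so only the low part mixes in.
print(mix_target_audio(first, seconds, {"low"}))
```

A real mixer would also normalize or limit the sum to avoid clipping; that is omitted here for brevity.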
  • one or more part controls are displayed based on a first display mode.
  • the part control may include a first image.
  • the part control may include an image frame control, in which a static image or a dynamic image may be displayed, thereby improving the display effect of the part control.
  • the first display mode is used to highlight the first image.
  • the first display mode may be a highlight display or an enlarged display, etc., which is not limited in the embodiments of the present disclosure.
  • the electronic device controls the first image corresponding to the part control to be highlighted.
  • FIG6 is a schematic diagram of a process of displaying a part control provided by an embodiment of the present disclosure.
  • the display page of the electronic device is a playback page of the first vocal.
  • the playback page includes a lead vocal control, a part control A, a part control B, a part control C, and a playback control.
  • before the touch operation, part control A, part control B, and part control C are all displayed at low brightness, and the target audio played by the electronic device includes the first vocal.
  • in response to a touch operation on part control B, the display state of part control B switches from low-brightness display to high-brightness display, and the target audio played by the electronic device includes the first human voice and the second human voice associated with part control B.
  • in this way, adjusting the display mode of the part control improves the flexibility of interaction and the page display effect.
  • a target audio is played, the target audio including the first human voice and a second human voice associated with one or more voice controls.
  • the target audio played by the electronic device may include the first human voice, the second human voice in the low voice, and the second human voice in the high voice.
  • the electronic device may also display the first part control based on the second display mode in response to a touch operation on the first part control displayed in the first display mode, and the target audio does not include the second human voice associated with the first part control.
  • the first part control may be a highlighted part control.
  • the second display mode is used to cancel the highlighting of the first image.
  • the second display mode may be a low-brightness display or a reduced display, etc., which is not limited in the embodiments of the present disclosure.
  • the display brightness of the second display mode is lower than the display brightness of the first display mode.
  • the first display mode may be a highlighted display
  • the second display mode may be a dimmed display.
  • the target audio played by the electronic device may include the first human voice, the second human voice in the low voice, and the second human voice in the high voice.
  • the electronic device controls the high voice control to be dimmed, and the target audio played by the electronic device may include the first human voice and the second human voice in the low voice, but does not include the second human voice in the high voice.
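The toggle behavior of a part control, flipping both its display mode and its part's inclusion in the target audio, can be sketched as follows; class and attribute names are hypothetical:

```python
# Sketch: clicking a part control toggles between the first display mode
# (highlighted, part included in the target audio) and the second display
# mode (dimmed, part excluded from the target audio).
class PartControl:
    def __init__(self, name: str):
        self.name = name
        self.highlighted = False     # starts dimmed (second display mode)

    def click(self, enabled_parts: set[str]) -> None:
        self.highlighted = not self.highlighted
        if self.highlighted:
            enabled_parts.add(self.name)      # part joins the target audio
        else:
            enabled_parts.discard(self.name)  # part leaves the target audio

enabled = set()
high = PartControl("high")
high.click(enabled)   # highlight -> target audio gains the high part
high.click(enabled)   # dim       -> target audio drops the high part again
print(enabled)        # -> set()
```

The `enabled_parts` set here plays the same role as the set of highlighted part controls in the description: whatever is in it gets mixed into the target audio.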
  • FIG7 is a schematic diagram of a process of displaying a first voice part control provided by an embodiment of the present disclosure.
  • an electronic device is included.
  • the display page of the electronic device is a playback page of the first vocal.
  • the playback page includes a lead vocal control, a voice part control A, a voice part control B, a voice part control C, and a playback control.
  • the target audio being played by the electronic device includes the first human voice and the second human voice associated with part control B; part control A and part control C are displayed at low brightness, and part control B is displayed at high brightness.
  • in response to a touch operation on part control B, the display state of part control B switches from high brightness to low brightness; the second vocal associated with part control B is no longer included in the target audio, while the first vocal is still included.
  • the disclosed embodiment provides an audio processing method in which an electronic device can display an audio acquisition page that includes a recording control and/or an audio import control. In response to a touch operation on the recording control or the audio import control, a first human voice is acquired, and based on the first human voice, a second human voice of one or more parts associated with the first human voice is generated. A playback page of the first human voice is displayed; the playback page includes one or more part controls and a playback control. In response to a touch operation on one or more part controls, those part controls are displayed in a first display mode, and in response to a touch operation on the playback control, a target audio is played, where the target audio includes the first human voice and the second human voice associated with the one or more part controls.
  • in this way, the part control can be highlighted to improve the display effect, and because the electronic device can generate the human voices of multiple parts associated with the first human voice, a cappella singing no longer requires the user to sing the harmonies multiple times, which improves the user experience and reduces the complexity of audio acquisition.
  • the playback page further includes a timbre control
  • the above-mentioned audio processing method further includes a method for adjusting the timbre of the second human voice.
  • the method for adjusting the timbre of the second human voice is described below in conjunction with FIG. 8 .
  • FIG8 is a schematic diagram of a method for adjusting the timbre of a second human voice provided by an embodiment of the present disclosure. Referring to FIG8 , the method flow includes:
  • in response to a drag operation of the target timbre control onto the second part control, the timbre of the second human voice associated with the second part control is adjusted to the timbre corresponding to the target timbre control.
  • the play page may include at least one timbre control, and each timbre control has a corresponding timbre.
  • the play page may include a timbre control A and a timbre control B, and the timbre associated with timbre control A may be timbre A, and the timbre associated with timbre control B may be timbre B.
  • the timbre associated with the timbre control may be the timbre of any user, or may be a virtual timbre, and the embodiments of the present disclosure do not limit this.
  • the electronic device may pre-set the timbre associated with timbre control A to be the timbre of user A, and the timbre associated with timbre control B to be a virtual timbre, etc.
  • FIG. 9 is a schematic diagram of a playback page provided by an embodiment of the present disclosure. Referring to FIG. 9, an electronic device is included, and the display page of the electronic device is the playback page of the first vocal.
  • the playback page includes a lead vocal control (the control for the main melody of the song), a part control A, a part control B, a part control C, a timbre control A, a timbre control B, a timbre control C and a playback control.
  • each part control corresponds to a second vocal of a part associated with the first vocal (such as a high voice, a middle voice and a low voice), and each timbre control corresponds to a timbre (such as a soprano, a mezzo-soprano and a bass).
  • the dragging operation may be a dragging operation of dragging the target timbre control to the second part control.
  • the target timbre control may be a timbre control touched by the user.
  • the play page includes timbre control A and timbre control B. If the user clicks on timbre control A, the target timbre control is timbre control A.
  • the play page includes a low voice control and a high voice control. If the user drags the target timbre control to the low voice control, the electronic device determines that the second part control is the low voice control. It should be noted that the user can drag a timbre control to a part control, and the user can also drag a timbre control to multiple part controls, which is not limited in the embodiments of the present disclosure.
  • the electronic device can replace the timbre of the second voice corresponding to the second part control with the timbre corresponding to the target timbre control.
  • for example, the target audio played by the electronic device includes the first vocal and a second vocal with a soprano timbre in the high part, and the timbre corresponding to the target timbre control is tenor. If the user drags the target timbre control to the high part control, the electronic device replaces the timbre of the second vocal in the high part with tenor, and the target audio played by the electronic device then includes the first vocal and a second vocal with a tenor timbre in the high part.
  • the playback page of the first vocal further includes a pan control associated with the part control, and in response to a sliding operation on the pan control, the pan of the second vocal associated with the part control is adjusted.
  • the pan may be a channel of harmony. For example, if the pan is adjusted to the left, the channel of the harmony is shifted to the left, and if the pan is adjusted to the right, the channel of the harmony is shifted to the right.
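The pan adjustment above can be read as a left/right gain mapping. The snippet below is a minimal sketch, assuming a constant-power pan law and a hypothetical `apply_pan` helper; the embodiment does not specify how the channel shift is implemented:

```python
import math

def apply_pan(samples, pan):
    """Pan a mono harmony signal into stereo.

    pan ranges from -1.0 (fully left) through 0.0 (center) to 1.0
    (fully right). A constant-power law keeps perceived loudness
    roughly even as the harmony is shifted between channels.
    """
    angle = (pan + 1.0) * math.pi / 4.0   # map [-1, 1] onto [0, pi/2]
    left_gain = math.cos(angle)
    right_gain = math.sin(angle)
    return [(s * left_gain, s * right_gain) for s in samples]
```

Sliding the pan control toward the left would then correspond to decreasing `pan`, which raises the left-channel gain and lowers the right-channel gain.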
  • the playback page may also include a volume control, a sound effect control, etc., which is not limited in the embodiments of the present disclosure.
  • the drag animation includes the animation of the timbre control.
  • the animation of the timbre control may be a second image
  • the timbre control may include an image control
  • the image control may include a dynamic image or a static image. The disclosed embodiment does not limit this.
  • the electronic device may display the second image associated with the timbre control.
  • the electronic device may display a drag animation associated with the drag operation based on the following feasible implementation: in response to the drag operation, display the animation of the timbre control, and control the animation of the timbre control to move along the drag track associated with the drag operation.
  • the electronic device may display an animation of a second image associated with the timbre control, and when the user drags the timbre control, the second image may move along the track dragged by the user.
  • the timbre control may also include a static image when not clicked, and when the user clicks on the timbre control, the electronic device may generate a second image identical to the static image, and control the movement of the second image based on the drag operation.
  • the electronic device can select a second vocal of any timbre for a part control, or may select no timbre for it.
  • in the latter case, the timbre of the second vocal is the preset timbre. This is not limited in the embodiments of the present disclosure.
  • FIG. 10 is a schematic diagram of a process of displaying a drag animation provided by an embodiment of the present disclosure.
  • an electronic device is included.
  • the display page of the electronic device is a playback page of the first vocal.
  • the playback page includes a lead vocal control (the control for the main melody of the song), a high voice control, a middle voice control, a low voice control, a female voice control, a virtual sound control, a male voice control, and a playback control.
  • the electronic device can display the smiley face image associated with the male voice control.
  • the electronic device can display a drag animation, wherein the drag animation is an animation of the smiley face image moving to the middle voice control as the user operates.
  • the electronic device adjusts the image displayed in the middle voice control to the image corresponding to the male voice control.
  • FIG. 11A is a schematic diagram of a process of playing a target audio provided by an embodiment of the present disclosure.
  • the display page of the electronic device is a playback page of the first vocal.
  • the playback page includes a lead vocal control, a high voice control, a middle voice control, a low voice control, a female voice control, a virtual sound control, a male voice control, and a playback control.
  • the high voice control, the middle voice control, and the low voice control are all displayed in low light, and the target audio played by the electronic device includes the first vocal.
  • the display state of the middle voice control switches from low-brightness display to high-brightness display, and the target audio played by the electronic device includes the first voice and the second voice in the middle voice.
  • the electronic device can display a smiley face image associated with the male voice control.
  • the smiley face image moves to the middle voice control with the user's operation, and the electronic device adjusts the image displayed by the middle voice control to the image corresponding to the male voice control.
  • the target audio played by the electronic device includes the first voice and the second voice in the middle voice, and the timbre of the second voice is male.
  • FIG. 11B is a schematic diagram of another process of playing target audio provided by an embodiment of the present disclosure.
  • the display page of the electronic device is a playback page of the first vocal.
  • the playback page includes lead vocal controls, high voice controls, middle voice controls, low voice controls, female voice controls, virtual sound controls, male voice controls, and playback controls.
  • the high voice controls, middle voice controls, and low voice controls are all displayed in low light, and the target audio played by the electronic device includes the first vocal.
  • the display status of the middle voice control and the low voice control is switched from low-light display to high-light display, and the target audio played by the electronic device may include the first human voice, the second human voice in the middle part, and the second human voice in the low part.
  • the electronic device may display a smiling face image associated with the male voice control.
  • the target audio played by the electronic device may include the first human voice, the second human voice in the middle part and the second human voice in the low part, and the timbre of the second human voice in the middle part and of the second human voice in the low part is a male voice.
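The highlight/low-light behavior described for FIG. 11A and FIG. 11B amounts to a per-part on/off state that determines which second vocals join the target audio. A minimal sketch, with hypothetical `PartControl` and `target_audio_parts` names:

```python
class PartControl:
    """A part control that toggles between low-light (off) and
    high-light (on) display on each touch operation."""
    def __init__(self, name):
        self.name = name
        self.highlighted = False  # initially displayed in low light

    def touch(self):
        self.highlighted = not self.highlighted

def target_audio_parts(controls):
    # The target audio always contains the first vocal, plus the
    # second vocal of every highlighted part control.
    return ["first_vocal"] + [c.name for c in controls if c.highlighted]
```

Touching the middle and low voice controls would add those two second vocals to the mix, matching the FIG. 11B description.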
  • the disclosed embodiment provides a method for changing the timbre of a harmony: in response to a drag operation of dragging a target timbre control among the one or more timbre controls onto a second part control among the one or more part controls, the timbre of the second human voice associated with the second part control is adjusted to the timbre corresponding to the target timbre control, and a drag animation associated with the drag operation is displayed.
  • in this way, the user can arbitrarily adjust the timbre of the second human voice included in the target audio by dragging a timbre control, thereby improving the flexibility of audio processing; the human voice of each part can be set to the same timbre or a different timbre, thereby reducing the complexity of audio acquisition and improving its efficiency.
  • the following describes a method for generating a second human voice of a third part for any one of the third parts in the above audio processing method in combination with FIG. 12 .
  • FIG. 12 is a schematic diagram of a method for generating a second human voice provided by an embodiment of the present disclosure. Referring to FIG. 12, the method flow includes:
  • the harmony melody may be a melody associated with the first vocal part.
  • when the target range differs, the melody corresponding to the song also changes, so the harmony melodies of different parts differ as well.
  • the harmony melody of the low part of the first vocal is melody A
  • the harmony melody of the high part of the first vocal is melody B, wherein melody A and melody B may be different melodies.
  • the electronic device may determine the harmony melody associated with the third part based on the following feasible implementation: obtaining the accompaniment chords associated with the first voice. For example, if the first voice is audio uploaded by the user through the audio import page, the electronic device may obtain the accompaniment of the first voice and determine the accompaniment chords in the accompaniment through a chord detection algorithm; if the first voice is audio of the user's impromptu humming, the electronic device may obtain the melody of the audio through a melody detection algorithm, and then match corresponding chords to the melody through a chord matching algorithm to obtain the accompaniment chords of the first voice.
  • the accompaniment chords are adjusted to obtain the harmony melody associated with the second vocal of the third part. For example, if the third part is a low part, the electronic device can lower the chord position in the accompaniment chords to obtain a harmony melody in the low part; if the third part is a high part, the electronic device can raise the chord position in the accompaniment chords to obtain a harmony melody in the high part.
  • the electronic device can adjust the chord position in the accompaniment chord according to the correspondence between the chord position and the voice part.
  • the chord position of the harmony melody of the low voice part is position A
  • the chord position of the harmony melody of the high voice part is position B.
  • the chord position of the accompaniment chord is adjusted based on position A.
  • the chord position of the accompaniment chord is adjusted based on position B.
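The position adjustment above can be read as transposing the accompaniment chord tones by a per-part offset. The sketch below assumes MIDI note numbers and octave offsets; both the `PART_OFFSETS` values and the helper name are illustrative, since the embodiment only states that low parts lower and high parts raise the chord position:

```python
# Assumed per-part transposition, in semitones (one octave each way).
PART_OFFSETS = {"low": -12, "middle": 0, "high": 12}

def harmony_melody_from_chords(chord_notes, part):
    """Shift accompaniment chord tones (MIDI note numbers) to the
    chord position of the given voice part."""
    offset = PART_OFFSETS[part]
    return [note + offset for note in chord_notes]
```

For a C major accompaniment chord (60, 64, 67), the low-part harmony melody would sit an octave lower (48, 52, 55) under these assumed offsets.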
  • S1202: Generate a second vocal of the third part based on the first vocal and the harmony melody.
  • the electronic device may generate the second human voice of the third part based on the following feasible implementation: if the difference between the pitch associated with the harmony melody and the first pitch of the first human voice is less than or equal to the first threshold, the first human voice is tuned to the harmony melody to obtain the second human voice of the third part. For example, if the pitch of the song sung by the user is similar to the pitch of the harmony melody of the high part, the electronic device may tune the song word by word to the harmony melody to obtain the second human voice of the third part.
  • the pitch associated with the harmony melody is shifted to the position of the first pitch, and the first vocal is tuned to the harmony melody to obtain the second vocal of the third part.
  • the electronic device can shift the harmony melody in the high part to the position of the first pitch (or a similar position), and then tune the song to the harmony melody word by word to obtain the second vocal of the third part.
  • the harmony melodies of multiple parts can be shifted to the user's pitch position and then tuned, thereby improving the audio processing effect; the electronic device can record the pitch-shift value of each harmony melody, so that when the timbre of the second vocal is replaced, the harmony melody can be restored to its original pitch position based on the pitch-shift value.
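The two branches above (tune directly when the pitch difference is within the first threshold, otherwise shift the harmony melody to the singer's pitch first and record the shift) can be sketched as follows; the threshold value and the `align_to_harmony` name are assumptions:

```python
def align_to_harmony(first_pitch, harmony_pitch, threshold=4.0):
    """Return the tuning target and the recorded pitch-shift value.

    Pitches are in semitones. If the harmony melody is within
    `threshold` semitones of the first vocal's pitch, tune directly;
    otherwise shift the harmony melody to the first vocal's pitch
    position and keep the shift value so the melody can later be
    restored to its original position (e.g. when the timbre of the
    second vocal is replaced).
    """
    diff = harmony_pitch - first_pitch
    if abs(diff) <= threshold:
        return {"shift": 0.0, "target": harmony_pitch}
    return {"shift": -diff, "target": harmony_pitch - diff}
```

Subtracting the recorded `shift` from the tuning `target` recovers the harmony melody's original pitch position.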
  • the electronic device may tune the first human voice to a harmony melody to obtain a second human voice in the third part based on the following feasible implementation method: tune the first human voice to a harmony melody to obtain a human voice to be processed.
  • the timbre of the human voice to be processed is the same as the timbre of the first human voice. For example, if the timbre of the first human voice is timbre A, the timbre of the human voice to be processed is timbre A, and if the timbre of the first human voice is timbre B, the timbre of the human voice to be processed is timbre B.
  • a preset timbre is obtained, and the timbre of the human voice to be processed is replaced with the preset timbre to obtain the second human voice of the third part.
  • the preset timbre can be the timbre associated with the timbre control.
  • the preset timbre can be timbre A
  • the preset timbre associated with the timbre control is timbre B
  • the preset timbre can be timbre B.
  • the preset timbre can also be other timbres, and the embodiments of the present disclosure are not limited to this.
  • the timbre of the second human voice can be a timbre that conforms to the third part.
  • the timbre of the second human voice in the third part generated by the electronic device can be a soprano.
  • the timbre of the second human voice in the third part generated by the electronic device can be a contralto. This can improve the playback effect of the second human voice, thereby improving the accuracy of audio acquisition and reducing the complexity of audio acquisition.
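The two-step generation described above (pitch-correct the first vocal onto the harmony melody while keeping its timbre, then replace the timbre of the voice-to-be-processed with the preset timbre) can be summarized in a short sketch; `tune` and `replace_timbre` are hypothetical stand-ins for the pitch-correction and voice-conversion steps the embodiment leaves unspecified:

```python
def tune(voice, harmony_melody):
    # Pitch-correct the first vocal onto the harmony melody; the
    # resulting voice-to-be-processed keeps the original timbre.
    return {"melody": harmony_melody, "timbre": voice["timbre"]}

def replace_timbre(voice, preset_timbre):
    # Swap the timbre of the voice-to-be-processed for the preset one.
    return {"melody": voice["melody"], "timbre": preset_timbre}

def generate_second_voice(first_voice, harmony_melody, preset_timbre):
    pending = tune(first_voice, harmony_melody)
    return replace_timbre(pending, preset_timbre)
```

For a high third part, `preset_timbre` might be a soprano, matching the example above.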
  • the disclosed embodiment provides a method for obtaining a harmony: obtaining an accompaniment chord associated with the first vocal, adjusting the accompaniment chord based on the third part to obtain the harmony melody associated with the second vocal of the third part, and generating the second vocal of the third part based on the first vocal and the harmony melody.
  • the electronic device can adjust the accompaniment chord corresponding to the first vocal based on the third part to obtain a harmony melody associated with the third part, and then adjust the first vocal word by word to the harmony melody to obtain a second vocal associated with the third part, thereby improving the playback effect of the second vocal and reducing the complexity of audio acquisition.
  • FIG. 13 is a schematic diagram of the structure of an audio processing device provided by an embodiment of the present disclosure.
  • the audio processing device 130 includes an acquisition module 131, a generation module 132 and a playback module 133, wherein:
  • the acquisition module 131 is used to acquire a first human voice
  • the generating module 132 is used to generate, based on the first human voice, a second human voice of one or more parts associated with the first human voice, wherein the timbre of the second human voice is a preset timbre;
  • the playing module 133 is used to play the target audio based on the first human voice and the second human voice of the one or more parts, and the target audio includes the first human voice and/or the second human voice of the one or more parts.
  • the acquisition module 131 is specifically used for:
  • the audio acquisition page includes a recording control and/or an audio import control
  • the first human voice is acquired.
  • the acquisition module 131 is specifically used for:
  • the first human voice is acquired.
  • the acquisition module 131 is specifically used for:
  • the recording of the human voice uttered by the user is stopped to obtain the first human voice.
  • the playback module 133 is specifically used for:
  • the target audio is played, wherein the target audio includes the first vocal and a second vocal associated with the one or more part controls.
  • the playback module 133 is further configured to:
  • in response to a touch operation on a first part control displayed in the first display mode, the first part control is displayed based on a second display mode, the target audio does not include the second human voice associated with the first part control, and the display brightness of the second display mode is lower than the display brightness of the first display mode.
  • the audio processing device provided in the embodiment of the present disclosure may be used to execute the technical solution of the above method embodiment, and its implementation principle and technical effect are similar, which will not be described in detail in this embodiment.
  • FIG. 14 is a schematic diagram of the structure of another audio processing device provided by an embodiment of the present disclosure. Based on FIG. 13 and referring to FIG. 14, the audio processing device 130 further includes a response module 134, and the response module 134 is used to:
  • a drag animation associated with the drag operation is displayed, wherein the drag animation includes an animation of the timbre control.
  • the response module 134 is specifically used to:
  • the animation of the timbre control is controlled to move along the drag track associated with the drag operation.
  • the response module 134 is specifically used to:
  • the pan of the second vocal associated with the voice control is adjusted.
  • the generating module 132 is specifically used for:
  • a second vocal of the third part is generated based on the first vocal and the harmony melody.
  • the generating module 132 is specifically used for:
  • the accompaniment chords are adjusted to obtain a harmonic melody associated with the second vocal of the third part.
  • the generating module 132 is specifically used for:
  • the first vocal is tuned to the harmony melody to obtain the second vocal of the third part;
  • the pitch associated with the harmony melody is translated to the position of the first pitch, and the first vocal is tuned to the harmony melody to obtain the second vocal of the third part.
  • the generating module 132 is specifically used for:
  • the first human voice is pitch-corrected to the harmony melody to obtain a human voice to be processed, wherein the timbre of the human voice to be processed is the same as the timbre of the first human voice;
  • a preset timbre is obtained, and the timbre of the human voice to be processed is replaced with the preset timbre to obtain the second human voice of the third part.
  • the voice parts include a low voice part, a middle low voice part, a middle voice part, a middle high voice part and a high voice part.
  • the audio processing device provided in the embodiment of the present disclosure may be used to execute the technical solution of the above method embodiment, and its implementation principle and technical effect are similar, which will not be described in detail in this embodiment.
  • the embodiments of the present disclosure further provide a computer-readable storage medium, in which computer-executable instructions are stored.
  • a processor executes the computer-executable instructions
  • the processor executes the methods described in the above-mentioned method embodiments.
  • the embodiments of the present disclosure also provide a computer program, which, when executed by a processor, implements the methods described in the above-mentioned various method embodiments.
  • the embodiments of the present disclosure further provide a computer program product, including a computer program, which implements the methods described in the above-mentioned various method embodiments when executed by a processor.
  • the present disclosure provides an audio processing method, device and electronic device, wherein the electronic device obtains a first human voice, and based on the first human voice, generates a second human voice of one or more parts associated with the first human voice, wherein the timbre of the second human voice is a preset timbre, and plays a target audio based on the first human voice and the second human voice of one or more parts, wherein the target audio includes the first human voice and/or the second human voice of one or more parts.
  • the electronic device when the user finishes singing the first human voice, the electronic device can generate a harmony (second human voice) of multiple parts associated with the first human voice, and then obtain audio of a cappella singing, without the user having to sing multiple harmony songs, thereby improving the user experience and reducing the complexity of audio acquisition.
  • FIG. 15 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present disclosure.
  • the electronic device 1500 may be a terminal device or a server.
  • the terminal device may include but is not limited to mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (PADs), portable multimedia players (PMPs), vehicle-mounted terminals (such as vehicle-mounted navigation terminals), etc., and fixed terminals such as digital TVs, desktop computers, etc.
  • the electronic device shown in FIG. 15 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
  • the electronic device 1500 may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 1501, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1502 or a program loaded from a storage device into a random access memory (RAM) 1503.
  • the processing device 1501, ROM 1502 and RAM 1503 are connected to each other via a bus 1504.
  • An input/output (I/O) interface 1505 is also connected to the bus 1504.
  • the following devices may be connected to the I/O interface 1505: input devices 1506 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 1507 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 1508 including, for example, a magnetic tape, a hard disk, etc.; and communication devices 1509.
  • the communication device 1509 may allow the electronic device 1500 to communicate with other devices wirelessly or by wire to exchange data.
  • although FIG. 15 shows an electronic device 1500 having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program can be downloaded and installed from the network through the communication device 1509, or installed from the storage device 1508, or installed from the ROM 1502.
  • when the computer program is executed by the processing device 1501, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
  • computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, which carries a computer-readable program code. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • Computer-readable signal media may also be any computer-readable medium other than computer-readable storage media, which may send, propagate, or transmit programs for use by or in conjunction with an instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wires, optical cables, RF (Radio Frequency), etc., or any suitable combination of the above.
  • the computer-readable medium may be included in the electronic device, or may exist independently without being incorporated into the electronic device.
  • the computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to execute the method shown in the above embodiments.
  • Computer program code for performing operations of the present disclosure may be written in one or more programming languages, or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, C++, and conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • each block in the flowchart or block diagrams may represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions for implementing the specified logical function.
  • the functions marked in the blocks may also occur in an order different from that marked in the accompanying drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or hardware.
  • the name of a unit does not limit the unit itself in some cases.
  • the first acquisition unit may also be described as a "unit for acquiring at least two Internet Protocol addresses".
  • exemplary types of hardware logic components include: Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), Application Specific Standard Product (ASSP), System On Chip (SOC), Complex Programmable Logic Device (CPLD), etc.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, device, or equipment.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or equipment, or any suitable combination of the foregoing.
  • a more specific example of a machine-readable storage medium may include an electrical connection based on one or more lines, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a prompt message is sent to the user to clearly prompt the user that the operation requested to be performed will require obtaining and using the user's personal information.
  • the user can autonomously choose whether to provide personal information to software or hardware, such as an electronic device, application, server or storage medium, that performs the operations of the technical solution of the present disclosure.
  • in response to receiving an active request from the user, the prompt information may be sent to the user in the form of a pop-up window, in which the prompt information may be presented in text form.
  • the pop-up window may also carry a selection control for the user to choose "agree” or “disagree” to provide personal information to the electronic device.
  • the data involved in this technical solution shall comply with the requirements of the relevant laws and regulations.
  • the data may include information, parameters and messages, such as flow switching indication information.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An audio processing method and apparatus, and an electronic device. The method includes: acquiring a first human voice (S201); generating, based on the first human voice, a second human voice in one or more voice parts associated with the first human voice, wherein the timbre of the second human voice is a preset timbre (S202); and playing target audio based on the first human voice and the second human voice in the one or more voice parts, wherein the target audio includes the first human voice and/or the second human voice in the one or more voice parts (S203).

Description

Audio processing method and apparatus, and electronic device
Cross-Reference to Related Application
This application claims priority to Chinese Patent Application No. 202211177547.X, filed with the China National Intellectual Property Administration on September 26, 2022 and entitled "Audio Processing Method and Apparatus, and Electronic Device", the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present disclosure relate to the field of audio processing technologies, and in particular, to an audio processing method and apparatus, and an electronic device.
Background
A cappella singing is a form of performance in which multiple people sing without instrumental accompaniment. For example, the lead singer of an a cappella group may sing a song while the other performers sing the harmonies of different voice parts.
At present, to obtain a cappella audio through solo singing, a user needs to obtain the harmony melodies of multiple voice parts of the audio and sing multiple times to obtain the main melody of the audio and the harmonies of the different voice parts, thereby obtaining the a cappella audio. For example, the user may separately sing a song, the harmony of the song's upper voice part, and the harmony of the song's middle voice part, and merge the sung song and the two harmonies through a terminal device to obtain the a cappella song. However, the above method requires multiple singing passes to obtain the a cappella audio, resulting in high complexity of audio acquisition.
Summary
The present disclosure provides an audio processing method and apparatus, and an electronic device, to solve the technical problem in the prior art that the complexity of audio acquisition is high.
In a first aspect, the present disclosure provides an audio processing method, including:
acquiring a first human voice;
generating, based on the first human voice, a second human voice in one or more voice parts associated with the first human voice, wherein the timbre of the second human voice is a preset timbre; and
playing target audio based on the first human voice and the second human voice in the one or more voice parts, wherein the target audio includes the first human voice and/or the second human voice in the one or more voice parts.
In a second aspect, the present disclosure provides an audio processing apparatus, including an acquisition module, a generation module, and a playback module, wherein:
the acquisition module is configured to acquire a first human voice;
the generation module is configured to generate, based on the first human voice, a second human voice in one or more voice parts associated with the first human voice, wherein the timbre of the second human voice is a preset timbre; and
the playback module is configured to play target audio based on the first human voice and the second human voice in the one or more voice parts, wherein the target audio includes the first human voice and/or the second human voice in the one or more voice parts.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including a processor and a memory, wherein:
the memory stores computer-executable instructions; and
the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the audio processing method according to the first aspect and its various possible implementations.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the audio processing method according to the first aspect and its various possible implementations.
In a fifth aspect, an embodiment of the present disclosure provides a computer program product, including a computer program that, when executed by a processor, implements the audio processing method according to the first aspect and its various possible implementations.
In a sixth aspect, an embodiment of the present disclosure provides a computer program that, when executed by a processor, implements the audio processing method according to the first aspect and its various possible implementations.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present disclosure or the prior art more clearly, the accompanying drawings required for the description of the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below are only some embodiments of the present disclosure, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application scenario according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of an audio processing method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a process of acquiring a first human voice according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of another process of acquiring a first human voice according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of playing target audio according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a process of displaying voice-part controls according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a process of displaying a first voice-part control according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a method for adjusting the timbre of a second human voice according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a playback page according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of a process of displaying a drag animation according to an embodiment of the present disclosure;
FIG. 11A is a schematic diagram of a process of playing target audio according to an embodiment of the present disclosure;
FIG. 11B is a schematic diagram of another process of playing target audio according to an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of a method for generating a second human voice according to an embodiment of the present disclosure;
FIG. 13 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present disclosure;
FIG. 14 is a schematic structural diagram of another audio processing apparatus according to an embodiment of the present disclosure; and
FIG. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments are described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
For ease of understanding, the concepts involved in the embodiments of the present disclosure are explained below.
Electronic device: a device having wireless transceiving functions. An electronic device may be deployed on land, including indoors or outdoors, handheld, wearable, or vehicle-mounted, or on water (e.g., on a ship). The electronic device may be a mobile phone, a tablet computer (Portable Android Device, PAD), a computer with wireless transceiving functions, a virtual reality (VR) electronic device, an augmented reality (AR) electronic device, a wireless terminal in industrial control, a vehicle-mounted electronic device, a wireless terminal in self-driving, a wireless electronic device in remote medical, a wireless electronic device in a smart grid, a wireless electronic device in transportation safety, a wireless electronic device in a smart city, a wireless electronic device in a smart home, a wearable electronic device, or the like. The electronic device involved in the embodiments of the present disclosure may also be referred to as a terminal, user equipment (UE), an access electronic device, a vehicle-mounted terminal, an industrial control terminal, a UE unit, a UE station, a mobile station, a remote station, a remote electronic device, a mobile device, a UE electronic device, a wireless communication device, a UE agent, a UE apparatus, or the like. The electronic device may be fixed or mobile.
A cappella singing: a way of singing without instrumental accompaniment. For example, during a cappella singing, one lead singer may sing a song while the other performers sing the music of the song in different voice parts (e.g., the music of the song in the upper part or in the lower part); by performing the song's music in different voice parts, the effect of vocal accompaniment is achieved, and thus the a cappella performance of the song is realized.
Voice part: when audio includes multiple melodic lines performed simultaneously, each melodic line is a voice part of the audio. For example, in a four-part harmony performance, each part is a voice part of the audio. For example, the voice parts of a piece of audio may include soprano, mezzo-soprano, contralto, tenor, baritone, and bass. It should be noted that the audio may also include other voice parts, which is not limited in the embodiments of the present disclosure.
Harmony: a combination of sounds formed by multiple different notes sounding simultaneously according to certain harmonic rules. Optionally, harmony may include chords and the ways multiple chords are connected. For example, chords are the basic material of harmony; a chord is formed by combining three or more different notes according to music theory. A chord is the vertical structure of harmony, and the connection of multiple chords is the horizontal motion of harmony.
Timbre: the different characteristics of the waveforms associated with different sounds. For example, different objects vibrate with different characteristics; instruments with different structures, such as the piano and the violin, vibrate differently, so their timbres also differ. Likewise, the voices of different people have different timbres.
In the related art, when a cappella audio is obtained through solo singing, the user needs to sing multiple times to obtain the main melody of the audio and the harmonies of different voice parts, and thereby obtain the a cappella audio. For example, the user may separately sing the main melody of a song and the harmonies of its low, middle, and high parts, and merge them through an electronic device to obtain the a cappella song. However, in the above method, the user needs to obtain accurate harmony melodies for multiple voice parts and be able to sing the harmony corresponding to each harmony melody proficiently, and can only obtain the a cappella audio after multiple singing passes, resulting in high complexity of audio acquisition.
To solve the technical problems in the related art, embodiments of the present disclosure provide an audio processing method. An electronic device may display an audio acquisition page including a recording control and/or an audio import control; acquire a first human voice in response to a touch operation on the recording control or the audio import control; generate, based on the first human voice, a second human voice in one or more voice parts associated with the first human voice; display a playback page of the first human voice including one or more voice-part controls and a playback control; display the one or more voice-part controls in a first display mode in response to a touch operation on them; and play target audio in response to a touch operation on the playback control, where the target audio includes the first human voice and/or the second human voices associated with the one or more voice-part controls. In this way, when the user taps a voice-part control, the control can be highlighted, improving the display effect; and because the electronic device can generate human voices in multiple voice parts associated with the first human voice, a cappella singing does not require the user to sing the harmonies multiple times, improving the user experience and reducing the complexity of audio acquisition.
The application scenario of the embodiments of the present disclosure is introduced below with reference to FIG. 1.
FIG. 1 is a schematic diagram of an application scenario according to an embodiment of the present disclosure. Referring to FIG. 1, an electronic device is included. The display page of the electronic device is a playback page of music, which includes a lead-vocal control (a control for the main melody of the music), voice-part control A, voice-part control B, voice-part control C, and a playback control. When the user taps the playback control, voice-part control A, and voice-part control B, the electronic device plays target audio that includes the main melody of the music, the second human voice associated with voice-part control A, and the second human voice associated with voice-part control B. In this way, when the user taps a voice-part control, the electronic device can play target audio including the first human voice and the second human voice associated with that control; the a cappella target audio can be obtained without the user singing harmonies multiple times, improving the user experience and reducing the complexity of audio acquisition.
The technical solutions of the present disclosure, and how they solve the above technical problems, are described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present disclosure are described below with reference to the accompanying drawings.
FIG. 2 is a schematic flowchart of an audio processing method according to an embodiment of the present disclosure. Referring to FIG. 2, the method may include:
S201: Acquire a first human voice.
The execution subject of the embodiments of the present disclosure is an electronic device, or an audio processing apparatus provided in the electronic device. The audio processing apparatus may be implemented by software, or by a combination of software and hardware, which is not limited in the embodiments of the present disclosure.
Optionally, the first human voice may be a song. For example, the first human voice may be a song sung by the user in real time, or a song stored in a database of the electronic device. Optionally, the first human voice may be an unaccompanied song. For example, the first human voice may be a song sung a cappella by the user.
Optionally, the electronic device may acquire the first human voice in the following feasible implementation: displaying an audio acquisition page, where the audio acquisition page may include a recording control and/or an audio import control, and acquiring the first human voice in response to a touch operation on the recording control or the audio import control. For example, the electronic device may display an audio acquisition page that includes a recording control, and the user performs a touch operation on the recording control to cause the electronic device to acquire the first human voice. As another example, the audio acquisition page may include an audio import control, and the user performs a touch operation on the audio import control to cause the electronic device to acquire the first human voice. It should be noted that the audio acquisition page may also include both a recording control and an audio import control, which is not limited in the embodiments of the present disclosure.
Optionally, the electronic device may display the audio acquisition page in response to a user operation on an audio processing application. For example, an audio processing application may be installed in the electronic device, and when the user taps the icon of the audio processing application on the screen of the electronic device, the electronic device may display the audio acquisition page. Optionally, the user may enter a web address associated with the audio processing application in a browser of the electronic device to cause the electronic device to display the audio acquisition page. It should be noted that the electronic device may also display the audio acquisition page in other ways, which is not limited in the embodiments of the present disclosure.
Optionally, acquiring the first human voice in response to a touch operation on the recording control or the audio import control has the following two feasible implementations:
One feasible implementation:
An audio import page is displayed in response to a touch operation on the audio import control. Optionally, the audio import page includes an audio file associated with the first human voice. For example, when the user taps the audio import control on the audio acquisition page, the electronic device may display an audio import page including multiple audio files, each associated with a first human voice.
Optionally, the first human voice is acquired in response to a touch operation on the audio file. For example, the audio upload page includes audio file A and audio file B; when the user taps audio file A, the electronic device may determine the audio associated with audio file A as the first human voice, and when the user taps audio file B, the electronic device may determine the audio associated with audio file B as the first human voice.
Optionally, the audio file may be pre-recorded by the user. For example, the user may record multiple a cappella songs through the electronic device and save the song files to the electronic device; when the electronic device displays the audio import page, the page may include the audio files of the multiple a cappella songs pre-recorded by the user.
The process of acquiring the first human voice in this implementation is described below with reference to FIG. 3.
FIG. 3 is a schematic diagram of a process of acquiring a first human voice according to an embodiment of the present disclosure. Referring to FIG. 3, an electronic device is included. The display pages of the electronic device include page 301 and page 302. Page 301 is the audio acquisition page, which includes an audio import control; when the user taps the audio import control, the electronic device jumps from page 301 to page 302.
Referring to page 302, page 302 is the audio import page, which includes the files of audio A, audio B, audio C, and audio D. When the user taps the file of audio A, the electronic device determines audio A as the first human voice.
Another feasible implementation:
A human voice uttered by the user is recorded in response to a touch operation on the recording control. For example, when the user taps the recording control, the electronic device may receive the voice uttered by the user and record it. Optionally, the recording control has a recording state and a stopped state. For example, when the electronic device is recording the user's voice, the recording control is in the recording state; when the electronic device stops recording, the recording control is in the stopped state.
Optionally, the recording control takes different forms in different states. For example, the recording control takes a first form in the recording state and a second form in the stopped state, so that the user can accurately determine from the form of the recording control whether the electronic device is recording the user's voice, thereby improving the accuracy of audio acquisition and the user experience.
Recording of the user's voice is stopped in response to a touch operation on the recording control, and the first human voice is obtained. For example, when the recording control is in the recording state and the user performs a touch operation on it, the electronic device stops recording the user's voice. For example, when the user taps the recording control, the electronic device starts capturing the song sung by the user; when the user finishes singing and taps the recording control again, the electronic device stops capturing and determines the captured song as the first human voice.
The process of acquiring the first human voice in this implementation is described below with reference to FIG. 4.
FIG. 4 is a schematic diagram of another process of acquiring a first human voice according to an embodiment of the present disclosure. Referring to FIG. 4, an electronic device is included. The display page of the electronic device includes page 401, which is the audio acquisition page and includes a recording control. When the user taps the recording control, its form changes and the electronic device may record the voice uttered by the user; when the user taps the recording control again, its form changes again and the electronic device obtains the first human voice.
Optionally, the playback page may include a pitch-correction control. After the electronic device acquires the first human voice, in response to a touch operation on the pitch-correction control, the electronic device may correct the pitch of the first human voice, making the user's singing more harmonious and lowering the requirement on the user's singing skill, thereby reducing the complexity of audio acquisition.
S202: Generate, based on the first human voice, a second human voice in one or more voice parts associated with the first human voice.
Optionally, the timbre of the second human voice is a preset timbre. For example, the timbre of the second human voice may be contralto, soprano, bass, or tenor; the second human voice may also have another timbre, which is not limited in the embodiments of the present disclosure.
Optionally, the first human voice may be associated with a second human voice in at least one voice part. Optionally, the voice parts associated with the first human voice may include a low part, a mid-low part, a middle part, a mid-high part, and a high part. For example, the second human voices associated with the first human voice may include a second human voice of the first human voice in the high part and a second human voice of the first human voice in the low part. It should be noted that the low, middle, and high parts in the embodiments of the present disclosure merely illustrate pitch ranges rather than limiting them, and the pitch ranges of the low, middle, and high parts may be preset, which is not limited in the embodiments of the present disclosure.
Optionally, the electronic device may generate the second human voice in one or more voice parts associated with the first human voice in the following feasible implementation: determining, based on the first human voice, a harmony melody associated with a voice part, and generating the second human voice of that voice part based on the first human voice and the harmony melody. For example, the electronic device may obtain the accompaniment chords of the first human voice, rearrange the accompaniment chords according to the voice part to obtain the harmony melody of that part, and tune the first human voice to the harmony melody to obtain the second human voice corresponding to that part.
S203: Play target audio based on the first human voice and the second human voice in the one or more voice parts.
Optionally, the target audio includes the first human voice and/or the second human voice in the one or more voice parts. For example, the target audio may include only the first human voice, or the first human voice together with second human voices in multiple voice parts, which is not limited in the embodiments of the present disclosure.
Optionally, the first human voice and the second human voice have the same audio text. For example, when the first human voice is a song, the second human voice is also a song, and the lyrics of the first human voice are the same as those of the second human voice.
Optionally, the electronic device may play the target audio in the following feasible implementation: displaying a playback page of the first human voice. Optionally, the playback page includes one or more voice-part controls and a playback control. Optionally, the playback control is used to play the target audio. For example, when the user taps the playback control, the electronic device may play the target audio; while the target audio is playing, if the user taps the playback control, the electronic device may stop playing it.
Optionally, a voice-part control is used to control the second human voice. For example, a voice-part control may control whether the target audio includes the second human voice corresponding to that control. For example, the playback page includes a high-part control: if the user taps the high-part control, the target audio played by the electronic device includes the second human voice of the high part; if the user turns the high-part control off, the target audio played by the electronic device does not include the second human voice of the high part.
It should be noted that, in the embodiments of the present disclosure, the playback page may include at least one voice-part control, and each voice-part control has a corresponding second human voice; touching different voice-part controls changes which second human voices are included in the target audio played by the electronic device. For example, the playback page of the first human voice includes a low-part control and a high-part control: when the user taps the low-part control, the target audio played by the electronic device includes the second human voice of the low part, and when the user taps the high-part control, the target audio includes the second human voice of the high part.
Optionally, the audio track of the first human voice and the audio track of the second human voice are in the same time span. The process of playing the target audio is described below with reference to FIG. 5.
FIG. 5 is a schematic diagram of playing target audio according to an embodiment of the present disclosure. Referring to FIG. 5, a first track of the first human voice and a second track of the second human voice are included. When the user taps the playback control, the electronic device may play the target audio; the playback pointer in the first track moves to the right, and the playback pointer in the second track moves to the right, where the position of the playback pointer on the second track is the same as the position of the playback pointer on the first track. In this way, the user can add the second human voice at any position during playback of the first human voice, improving the flexibility of audio processing.
Optionally, the one or more voice-part controls are displayed in a first display mode in response to a touch operation on the one or more voice-part controls. Optionally, a voice-part control may include a first image. For example, a voice-part control may include an image-frame control that can display a static or dynamic image, thereby improving the display effect of the voice-part control.
Optionally, the first display mode is used to highlight the first image. For example, the first display mode may be highlighted display or enlarged display, which is not limited in the embodiments of the present disclosure. For example, when the user performs a touch operation on a voice-part control, the electronic device highlights the first image corresponding to that control.
The process of displaying voice-part controls in the first display mode is described below with reference to FIG. 6.
FIG. 6 is a schematic diagram of a process of displaying voice-part controls according to an embodiment of the present disclosure. Referring to FIG. 6, an electronic device is included. The display page of the electronic device is the playback page of the first human voice, which includes a lead-vocal control, voice-part control A, voice-part control B, voice-part control C, and a playback control. Voice-part controls A, B, and C are all dimmed, and the target audio played by the electronic device includes the first human voice.
Referring to FIG. 6, when the user taps voice-part control B, its display state switches from dimmed to highlighted, and the target audio played by the electronic device includes the first human voice and the second human voice associated with voice-part control B. In this way, adjusting the display mode of a voice-part control in response to the user's touch operation can improve interaction flexibility and the display effect of the page.
Optionally, the target audio is played in response to a touch operation on the playback control, where the target audio includes the first human voice and the second human voices associated with the one or more voice-part controls. For example, if the user taps the low-part control and the high-part control on the playback page, then when the user taps the playback control, the target audio played by the electronic device may include the first human voice, the second human voice of the low part, and the second human voice of the high part.
Optionally, while playing the target audio, the electronic device may further display, in response to a touch operation on a first voice-part control displayed in the first display mode, the first voice-part control in a second display mode, where the target audio does not include the second human voice associated with the first voice-part control. For example, the first voice-part control may be a highlighted voice-part control. Optionally, the second display mode is used to cancel the highlighting of the first image. For example, the second display mode may be dimmed display or reduced display, which is not limited in the embodiments of the present disclosure.
Optionally, the display brightness of the second display mode is lower than that of the first display mode. For example, the first display mode may be highlighted display and the second display mode may be dimmed display. For example, the user taps the low-part and high-part controls on the playback page, both controls are highlighted, and the target audio played by the electronic device may include the first human voice, the second human voice of the low part, and the second human voice of the high part; when the user taps the high-part control again, the electronic device dims the high-part control, and the target audio may include the first human voice and the second human voice of the low part but not the second human voice of the high part.
The process of displaying the first voice-part control in the second display mode is described below with reference to FIG. 7.
FIG. 7 is a schematic diagram of a process of displaying a first voice-part control according to an embodiment of the present disclosure. Referring to FIG. 7, an electronic device is included. The display page of the electronic device is the playback page of the first human voice, which includes a lead-vocal control, voice-part control A, voice-part control B, voice-part control C, and a playback control.
Referring to FIG. 7, the target audio being played by the electronic device includes the first human voice and the second human voice associated with voice-part control B; voice-part controls A and C are dimmed, and voice-part control B is highlighted. When the user taps voice-part control B, its display state switches from highlighted to dimmed, and the target audio played by the electronic device no longer includes the second human voice associated with voice-part control B; the target audio includes the first human voice.
An embodiment of the present disclosure provides an audio processing method. An electronic device may display an audio acquisition page including a recording control and/or an audio import control; acquire a first human voice in response to a touch operation on the recording control or the audio import control; generate, based on the first human voice, a second human voice in one or more voice parts associated with the first human voice; display a playback page of the first human voice including one or more voice-part controls and a playback control; display the one or more voice-part controls in a first display mode in response to a touch operation on them; and play target audio in response to a touch operation on the playback control, where the target audio includes the first human voice and the second human voices associated with the one or more voice-part controls. In this way, when the user taps a voice-part control, the control can be highlighted, improving the display effect; and because the electronic device can generate human voices in multiple voice parts associated with the first human voice, a cappella singing does not require the user to sing the harmonies multiple times, improving the user experience and reducing the complexity of audio acquisition.
On the basis of the embodiment shown in FIG. 2, the playback page further includes timbre controls, and the above audio processing method further includes a method for adjusting the timbre of the second human voice, which is described below with reference to FIG. 8.
FIG. 8 is a schematic diagram of a method for adjusting the timbre of a second human voice according to an embodiment of the present disclosure. Referring to FIG. 8, the method flow includes:
S801: In response to a drag operation of dragging a target timbre control among the one or more timbre controls onto a second voice-part control among the one or more voice-part controls, adjust the timbre of the second human voice associated with the second voice-part control to the timbre corresponding to the target timbre control.
Optionally, the playback page may include at least one timbre control, each with a corresponding timbre. For example, the playback page may include timbre control A and timbre control B, where the timbre associated with timbre control A may be timbre A and the timbre associated with timbre control B may be timbre B. It should be noted that the timbre associated with a timbre control may be the timbre of any user or a virtual timbre, which is not limited in the embodiments of the present disclosure. For example, the electronic device may preset the timbre associated with timbre control A to be the timbre of user A, and the timbre associated with timbre control B to be a virtual timbre.
The voice-part controls and timbre controls on the playback page are described below with reference to FIG. 9.
FIG. 9 is a schematic diagram of a playback page according to an embodiment of the present disclosure. Referring to FIG. 9, an electronic device is included. The display page of the electronic device is the playback page of the first human voice, which includes a lead-vocal control (a control for the main melody of the music), voice-part control A, voice-part control B, voice-part control C, timbre control A, timbre control B, timbre control C, and a playback control. Each voice-part control corresponds to a second human voice in a voice part associated with the first human voice (e.g., the high-part, middle-part, and low-part voices), and each timbre control corresponds to a timbre (e.g., soprano, mezzo-soprano, and bass).
Optionally, the drag operation may be an operation of dragging the target timbre control onto the second voice-part control, where the target timbre control may be the timbre control touched by the user. For example, if the playback page includes timbre controls A and B and the user taps timbre control A, the target timbre control is timbre control A. For example, if the playback page includes a low-part control and a high-part control and the user drags the target timbre control onto the low-part control, the electronic device determines the second voice-part control to be the low-part control. It should be noted that the user may drag one timbre control onto one voice-part control or onto multiple voice-part controls, which is not limited in the embodiments of the present disclosure.
Optionally, when the user drags the target timbre control onto the second voice-part control, the electronic device may replace the timbre of the second human voice corresponding to the second voice-part control with the timbre corresponding to the target timbre control. For example, the target audio played by the electronic device includes the first human voice and a high-part second human voice in a soprano timbre, and the timbre corresponding to the target timbre control is tenor; if the user drags the target timbre control onto the high-part control, the electronic device replaces the timbre of the high-part second human voice with tenor, and the target audio played by the electronic device then includes the first human voice and the high-part second human voice in a tenor timbre.
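The control behavior described above, tapping a voice-part control to enable its second voice and dropping a timbre control onto it to replace that part's timbre, can be modeled as a small piece of UI state. The class below is a hedged sketch, not the patent's implementation; the part names and the default timbre are assumptions made for the example.

```python
class PlaybackPageState:
    """Toy model of the playback page: which voice parts are enabled,
    and which timbre each part's second voice uses."""

    def __init__(self, parts, default_timbre="soprano"):
        self.enabled = set()  # parts whose controls are highlighted
        self.timbres = {p: default_timbre for p in parts}

    def toggle_part(self, part):
        """Tapping a part control adds/removes its second voice from the mix."""
        if part in self.enabled:
            self.enabled.remove(part)
        else:
            self.enabled.add(part)

    def drop_timbre(self, part, timbre):
        """Dropping a timbre control onto a part control replaces its timbre."""
        self.timbres[part] = timbre

    def target_mix(self):
        """Parts (with their timbres) mixed into the target audio,
        alongside the first human voice."""
        return {p: self.timbres[p] for p in sorted(self.enabled)}
```

Dragging one timbre control onto several part controls, as the text allows, is simply several `drop_timbre` calls with the same timbre.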
Optionally, the playback page of the first human voice further includes pan controls associated with the voice-part controls; in response to a slide operation on a pan control, the pan of the second human voice associated with the corresponding voice-part control is adjusted. Optionally, the pan may be the stereo channel of the harmony. For example, adjusting the pan to the left shifts the harmony's channel to the left, and adjusting it to the right shifts the channel to the right.
It should be noted that the playback page may further include volume controls, sound-effect controls, and the like, which is not limited in the embodiments of the present disclosure.
S802: Display, based on the drag operation, a drag animation associated with the drag operation.
Optionally, the drag animation includes an animation of the timbre control. For example, the animation of the timbre control may be a second image; the timbre control may include an image control that contains a dynamic or static image, which is not limited in the embodiments of the present disclosure. When the user taps the timbre control, the electronic device may display the second image associated with the timbre control.
Optionally, the electronic device may display the drag animation associated with the drag operation in the following feasible implementation: displaying the animation of the timbre control in response to the drag operation, and controlling the animation of the timbre control to follow the drag trajectory associated with the drag operation. For example, when the user taps the timbre control, the electronic device may display an animation of the second image associated with the control; when the user drags the control, the second image may move along the user's drag trajectory. It should be noted that the timbre control may also include a static image before being tapped; when the user taps the timbre control, the electronic device may generate a second image identical to that static image and control the second image to move based on the drag operation.
It should be noted that, when displaying the playback page, the electronic device may select a second human voice of any timbre for a voice-part control, or select no timbre and determine the timbre of the second human voice when the user moves a timbre control onto the voice-part control, which is not limited in the embodiments of the present disclosure.
The process of displaying the drag animation is described below with reference to FIG. 10.
FIG. 10 is a schematic diagram of a process of displaying a drag animation according to an embodiment of the present disclosure. Referring to FIG. 10, an electronic device is included. The display page of the electronic device is the playback page of the first human voice, which includes a lead-vocal control (a control for the main melody of the music), a high-part control, a middle-part control, a low-part control, a female-voice control, a virtual-voice control, a male-voice control, and a playback control.
Referring to FIG. 10, when the user taps the male-voice control, the electronic device may display a smiley-face image associated with the male-voice control. When the user drags the smiley-face image onto the middle-part control, the electronic device may display a drag animation in which the smiley-face image moves to the middle-part control following the user's operation. When the smiley-face image is dropped onto the middle-part control, the electronic device changes the image displayed by the middle-part control to the image corresponding to the male-voice control.
The process of playing the target audio is described below with reference to FIG. 11A and FIG. 11B.
FIG. 11A is a schematic diagram of a process of playing target audio according to an embodiment of the present disclosure. Referring to FIG. 11A, an electronic device is included. The display page of the electronic device is the playback page of the first human voice, which includes a lead-vocal control, a high-part control, a middle-part control, a low-part control, a female-voice control, a virtual-voice control, a male-voice control, and a playback control. The high-part, middle-part, and low-part controls are all dimmed, and the target audio played by the electronic device includes the first human voice.
Referring to FIG. 11A, when the user taps the middle-part control, its display state switches from dimmed to highlighted, and the target audio played by the electronic device includes the first human voice and the second human voice of the middle part. When the user taps the male-voice control, the electronic device may display the smiley-face image associated with the male-voice control. When the user drags the smiley-face image onto the middle-part control, the image moves to the middle-part control following the user's operation, and the electronic device changes the image displayed by the middle-part control to the image corresponding to the male-voice control. The target audio played by the electronic device includes the first human voice and the second human voice of the middle part, and the timbre of that second human voice is a male voice.
FIG. 11B is a schematic diagram of another process of playing target audio according to an embodiment of the present disclosure. Referring to FIG. 11B, an electronic device is included. The display page of the electronic device is the playback page of the first human voice, which includes a lead-vocal control, a high-part control, a middle-part control, a low-part control, a female-voice control, a virtual-voice control, a male-voice control, and a playback control. The high-part, middle-part, and low-part controls are all dimmed, and the target audio played by the electronic device includes the first human voice.
Referring to FIG. 11B, when the user taps the middle-part control and the low-part control, their display states switch from dimmed to highlighted, and the target audio played by the electronic device may include the first human voice, the second human voice of the middle part, and the second human voice of the low part. When the user taps the male-voice control, the electronic device may display the smiley-face image associated with the male-voice control.
Referring to FIG. 11B, when the user drags the smiley-face image onto the middle-part control, the image moves to the middle-part control following the user's operation; when the user drags the smiley-face image onto the low-part control, the image moves to the low-part control following the user's operation. The electronic device changes the images displayed by the middle-part and low-part controls to the image corresponding to the male-voice control, and the target audio played by the electronic device may include the first human voice, the second human voice of the middle part, and the second human voice of the low part, where the timbres of the middle-part and low-part second human voices are a male voice.
An embodiment of the present disclosure provides a method for changing the timbre of a second human voice. In response to a drag operation of dragging a target timbre control among the one or more timbre controls onto a second voice-part control among the one or more voice-part controls, the timbre of the second human voice associated with the second voice-part control is adjusted to the timbre corresponding to the timbre control, and a drag animation associated with the drag operation is displayed based on the drag operation. In this way, the user can freely adjust the timbre of the second human voices included in the target audio through drag operations on the timbre controls, improving the flexibility of audio processing; moreover, the human voice of each voice part can be set to the same timbre or different timbres, reducing the complexity of audio acquisition and improving its efficiency.
On the basis of any one of the above embodiments, a method for generating, for any third voice part among the multiple voice parts, the second human voice of the third voice part in the above audio processing method is described below with reference to FIG. 12.
FIG. 12 is a schematic diagram of a method for generating a second human voice according to an embodiment of the present disclosure. Referring to FIG. 12, the method flow includes:
S1201: Determine, based on the first human voice, a harmony melody associated with the third voice part.
Optionally, the harmony melody may be a melody associated with a voice part of the first human voice. For example, in the process of music composition, the melody corresponding to a song changes with the target register, and the harmony melodies of different voice parts also differ. For example, the harmony melody of the low-part voice of the first human voice is melody A and the harmony melody of the high-part voice is melody B, where melody A and melody B may be different melodies.
Optionally, the electronic device may determine the harmony melody associated with the third voice part in the following feasible implementation: obtaining the accompaniment chords associated with the first human voice. For example, if the first human voice is audio uploaded by the user through the audio import page, the electronic device may obtain the accompaniment of the first human voice and determine the accompaniment chords in it through a chord detection algorithm; if the first human voice is audio hummed impromptu by the user, the electronic device may obtain the melody of the audio through a melody detection algorithm and then match corresponding chords to that melody through a chord matching algorithm, obtaining the accompaniment chords of the first human voice.
The accompaniment chords are adjusted based on the third voice part to obtain the harmony melody associated with the second human voice of the third voice part. For example, if the third voice part is the low part, the electronic device may lower the chord positions in the accompaniment chords to obtain the harmony melody of the low part; if the third voice part is the high part, the electronic device may raise the chord positions in the accompaniment chords to obtain the harmony melody of the high part.
It should be noted that the electronic device may adjust the chord positions in the accompaniment chords through a correspondence between chord positions and voice parts. For example, the chord position of the low-part harmony melody is position A and the chord position of the high-part harmony melody is position B; if the third voice part is the low part, the chord positions of the accompaniment chords are adjusted based on position A, and if the third voice part is the high part, they are adjusted based on position B.
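As a rough sketch of the chord-position adjustment just described (an illustration under assumed conventions, not the patent's algorithm): if each chord is represented as a list of MIDI note numbers, a voice part can map to a fixed transposition offset applied to every chord tone. The part names and offsets below are invented for the example.

```python
# Illustrative correspondence between voice parts and chord positions,
# expressed as semitone offsets; these values are assumptions.
PART_OFFSETS = {"low": -12, "mid_low": -7, "mid": 0, "mid_high": 5, "high": 12}

def harmony_melody(chord_midi_notes, part):
    """Shift the accompaniment chord tones to the register of the
    requested voice part, yielding that part's harmony melody."""
    offset = PART_OFFSETS[part]
    return [[note + offset for note in chord] for chord in chord_midi_notes]
```

For instance, a C major triad (MIDI 60, 64, 67) shifted for the high part under these assumed offsets lands an octave up.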
S1202: Generate the second human voice of the third voice part based on the first human voice and the harmony melody.
Optionally, the electronic device may generate the second human voice of the third voice part in the following feasible implementation: if the pitch difference between the pitch associated with the harmony melody and the first pitch of the first human voice is less than or equal to a first threshold, tuning the first human voice to the harmony melody to obtain the second human voice of the third voice part. For example, if the pitch of the song sung a cappella by the user is close to the pitch of the high-part harmony melody, the electronic device may tune the song to the harmony melody word by word to obtain the second human voice of the third voice part.
Optionally, if the pitch difference between the pitch associated with the harmony melody and the first pitch is greater than the first threshold, the pitch associated with the harmony melody is shifted to the position of the first pitch, and the first human voice is tuned to the harmony melody to obtain the second human voice of the third voice part. For example, if the pitch of the song sung a cappella by the user differs greatly from the pitch of the high-part harmony melody, the electronic device may shift the high-part harmony melody to the position of the first pitch (or a nearby position) and then tune the song to the harmony melody word by word to obtain the second human voice of the third voice part. In this way, when the pitches of multiple voice parts differ greatly, the harmony melodies of the multiple voice parts can be shifted to the user's pitch position before tuning, improving the audio processing effect; moreover, the electronic device may record the pitch-shift value of each harmony melody, so that when the timbre of a second human voice is replaced, the harmony melody can be restored to its original pitch position based on the pitch-shift value.
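The threshold logic above can be sketched as follows. This is a hedged simplification, not the patent's tuning algorithm: pitches are per-note semitone values, the threshold value is an assumption, and "tuning" is reduced to snapping each sung note to the (possibly shifted) harmony melody while returning the shift so the melody can later be restored to its original register.

```python
def tune_to_harmony(voice_pitches, harmony_pitches, threshold=4.0):
    """Tune the first voice to the harmony melody, note by note.

    If the average gap (in semitones) between the harmony melody and the
    sung pitches exceeds `threshold`, the whole harmony melody is first
    shifted toward the singer's register; the applied shift is returned
    so the melody can be restored later when the timbre is replaced.
    """
    gap = sum(h - v for h, v in zip(harmony_pitches, voice_pitches)) / len(voice_pitches)
    shift = round(gap) if abs(gap) > threshold else 0
    target = [h - shift for h in harmony_pitches]
    # Simplified "tuning": snap each sung pitch to the target melody note.
    tuned = list(target)
    return tuned, shift
```

With a melody an octave above the singer, the sketch shifts the melody down 12 semitones before snapping; with a nearby melody, no shift is applied.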
Optionally, the electronic device may tune the first human voice to the harmony melody to obtain the second human voice of the third voice part in the following feasible implementation: tuning the first human voice to the harmony melody to obtain a to-be-processed human voice. Optionally, the timbre of the to-be-processed human voice is the same as that of the first human voice. For example, if the timbre of the first human voice is timbre A, the timbre of the to-be-processed human voice is timbre A; if the timbre of the first human voice is timbre B, the timbre of the to-be-processed human voice is timbre B.
Optionally, a preset timbre is obtained, and the timbre of the to-be-processed human voice is replaced with the preset timbre to obtain the second human voice of the third voice part. Optionally, the preset timbre may be a timbre associated with a timbre control. For example, if the timbre associated with the timbre control is timbre A, the preset timbre may be timbre A; if the timbre associated with the timbre control is timbre B, the preset timbre may be timbre B. It should be noted that the preset timbre may also be another timbre, which is not limited in the embodiments of the present disclosure. In this way, the timbre of the second human voice can be a timbre that suits the third voice part. For example, if the third voice part is the high part, the timbre of the second human voice generated by the electronic device may be soprano; if the third voice part is the low part, the timbre may be contralto. This can improve the playback effect of the second human voice, thereby improving the accuracy of audio acquisition and reducing its complexity.
An embodiment of the present disclosure provides a method for obtaining the harmony: obtaining the accompaniment chords associated with the first human voice, adjusting the accompaniment chords based on the third voice part to obtain the harmony melody associated with the second human voice of the third voice part, and generating the second human voice of the third voice part based on the first human voice and the harmony melody. In this way, the electronic device can adjust the accompaniment chords corresponding to the first human voice based on the third voice part to obtain the harmony melody associated with the third voice part, and then tune the first human voice word by word to the harmony melody to obtain the second human voice associated with the third voice part, improving the playback effect of the second human voice and reducing the complexity of audio acquisition.
FIG. 13 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present disclosure. Referring to FIG. 13, the audio processing apparatus 130 includes an acquisition module 131, a generation module 132, and a playback module 133, wherein:
the acquisition module 131 is configured to acquire a first human voice;
the generation module 132 is configured to generate, based on the first human voice, a second human voice in one or more voice parts associated with the first human voice, wherein the timbre of the second human voice is a preset timbre; and
the playback module 133 is configured to play target audio based on the first human voice and the second human voice in the one or more voice parts, wherein the target audio includes the first human voice and/or the second human voice in the one or more voice parts.
According to one or more embodiments of the present disclosure, the acquisition module 131 is specifically configured to:
display an audio acquisition page, wherein the audio acquisition page includes a recording control and/or an audio import control; and
acquire the first human voice in response to a touch operation on the recording control or the audio import control.
According to one or more embodiments of the present disclosure, the acquisition module 131 is specifically configured to:
display an audio import page in response to a touch operation on the audio import control, wherein the audio import page includes an audio file associated with the first human voice; and
acquire the first human voice in response to a touch operation on the audio file.
According to one or more embodiments of the present disclosure, the acquisition module 131 is specifically configured to:
record a human voice uttered by a user in response to a touch operation on the recording control; and
stop recording the human voice uttered by the user in response to a touch operation on the recording control, to obtain the first human voice.
According to one or more embodiments of the present disclosure, the playback module 133 is specifically configured to:
display a playback page of the first human voice, wherein the playback page includes one or more voice-part controls and a playback control;
display the one or more voice-part controls in a first display mode in response to a touch operation on the one or more voice-part controls; and
play the target audio in response to a touch operation on the playback control, wherein the target audio includes the first human voice and the second human voices associated with the one or more voice-part controls.
According to one or more embodiments of the present disclosure, the playback module 133 is further configured to:
display, in response to a touch operation on a first voice-part control displayed in the first display mode, the first voice-part control in a second display mode, wherein the target audio does not include the second human voice associated with the first voice-part control, and the display brightness of the second display mode is lower than that of the first display mode.
The audio processing apparatus provided in this embodiment of the present disclosure may be used to execute the technical solutions of the above method embodiments; its implementation principles and technical effects are similar and are not repeated here.
FIG. 14 is a schematic structural diagram of another audio processing apparatus according to an embodiment of the present disclosure. On the basis of FIG. 13, referring to FIG. 14, the audio processing apparatus 130 further includes a response module 134, wherein the response module 134 is configured to:
adjust, in response to a drag operation of dragging a target timbre control among the one or more timbre controls onto a second voice-part control among the one or more voice-part controls, the timbre of the second human voice associated with the second voice-part control to the timbre corresponding to the timbre control; and
display, based on the drag operation, a drag animation associated with the drag operation, wherein the drag animation includes an animation of the timbre control.
According to one or more embodiments of the present disclosure, the response module 134 is specifically configured to:
display the animation of the timbre control in response to the drag operation; and
control the animation of the timbre control to follow the drag trajectory associated with the drag operation.
According to one or more embodiments of the present disclosure, the response module 134 is specifically configured to:
adjust, in response to a slide operation on the pan control, the pan of the second human voice associated with the voice-part control.
According to one or more embodiments of the present disclosure, the generation module 132 is specifically configured to:
determine, based on the first human voice, the harmony melody associated with the third voice part; and
generate the second human voice of the third voice part based on the first human voice and the harmony melody.
According to one or more embodiments of the present disclosure, the generation module 132 is specifically configured to:
obtain the accompaniment chords associated with the first human voice; and
adjust the accompaniment chords based on the third voice part to obtain the harmony melody associated with the second human voice of the third voice part.
According to one or more embodiments of the present disclosure, the generation module 132 is specifically configured to:
if the pitch difference between the pitch associated with the harmony melody and the first pitch of the first human voice is less than or equal to a first threshold, tune the first human voice to the harmony melody to obtain the second human voice of the third voice part; and
if the pitch difference between the pitch associated with the harmony melody and the first pitch is greater than the first threshold, shift the pitch associated with the harmony melody to the position of the first pitch, and tune the first human voice to the harmony melody to obtain the second human voice of the third voice part.
According to one or more embodiments of the present disclosure, the generation module 132 is specifically configured to:
tune the first human voice to the harmony melody to obtain a to-be-processed human voice, wherein the timbre of the to-be-processed human voice is the same as that of the first human voice; and
obtain a preset timbre, and replace the timbre of the to-be-processed human voice with the preset timbre to obtain the second human voice of the third voice part.
According to one or more embodiments of the present disclosure, the voice parts include a low part, a mid-low part, a middle part, a mid-high part, and a high part.
The audio processing apparatus provided in this embodiment of the present disclosure may be used to execute the technical solutions of the above method embodiments; its implementation principles and technical effects are similar and are not repeated here.
An embodiment of the present disclosure further provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, cause the processor to perform the methods described in the above method embodiments.
An embodiment of the present disclosure further provides a computer program that, when executed by a processor, implements the methods described in the above method embodiments.
An embodiment of the present disclosure further provides a computer program product, including a computer program that, when executed by a processor, implements the methods described in the above method embodiments.
The present disclosure provides an audio processing method and apparatus, and an electronic device. The electronic device acquires a first human voice; generates, based on the first human voice, a second human voice in one or more voice parts associated with the first human voice, where the timbre of the second human voice is a preset timbre; and plays target audio based on the first human voice and the second human voice in the one or more voice parts, where the target audio includes the first human voice and/or the second human voice in the one or more voice parts. In the above method, after the user finishes singing the first human voice, the electronic device can generate the harmonies (second human voices) of multiple voice parts associated with the first human voice, thereby obtaining a cappella audio without the user singing harmonies multiple times, improving the user experience and reducing the complexity of audio acquisition.
FIG. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. Referring to FIG. 15, a schematic structural diagram of an electronic device 1500 suitable for implementing the embodiments of the present disclosure is shown; the electronic device 1500 may be a terminal device or a server. The terminal device may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a personal digital assistant (PDA), a tablet computer (PAD), a portable media player (PMP), and a vehicle-mounted terminal (e.g., a vehicle-mounted navigation terminal), as well as fixed terminals such as a digital TV and a desktop computer. The electronic device shown in FIG. 15 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 15, the electronic device 1500 may include a processing apparatus 1501 (e.g., a central processing unit or a graphics processing unit), which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1502 or a program loaded from a storage apparatus 1508 into a random access memory (RAM) 1503. The RAM 1503 also stores various programs and data required for the operation of the electronic device 1500. The processing apparatus 1501, the ROM 1502, and the RAM 1503 are connected to each other through a bus 1504. An input/output (I/O) interface 1505 is also connected to the bus 1504.
Generally, the following apparatuses may be connected to the I/O interface 1505: an input apparatus 1506 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; an output apparatus 1507 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage apparatus 1508 including, for example, a magnetic tape and a hard disk; and a communication apparatus 1509. The communication apparatus 1509 may allow the electronic device 1500 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 15 shows the electronic device 1500 with various apparatuses, it should be understood that not all of the illustrated apparatuses are required to be implemented or provided; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 1509, installed from the storage apparatus 1508, or installed from the ROM 1502. When the computer program is executed by the processing apparatus 1501, the above functions defined in the methods of the embodiments of the present disclosure are executed.
It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, the computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to a wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or may exist alone without being assembled into the electronic device.
The above computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to execute the methods shown in the above embodiments.
Computer program code for executing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a part of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks therein, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a unit does not in some cases constitute a limitation on the unit itself; for example, the first acquisition unit may also be described as "a unit for acquiring at least two Internet Protocol addresses".
The functions described above herein may be executed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include: field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and the like.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
It should be noted that the modifiers "a"/"an" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive; a person skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of the messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
It can be understood that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user shall be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the type, scope of use, and usage scenarios of the personal information involved in the present disclosure, and the user's authorization shall be obtained.
For example, in response to receiving an active request from the user, prompt information is sent to the user to clearly inform the user that the operation requested to be performed will require obtaining and using the user's personal information, so that the user can autonomously choose, according to the prompt information, whether to provide personal information to the software or hardware, such as an electronic device, application, server, or storage medium, that performs the operations of the technical solutions of the present disclosure.
As an optional but non-limiting implementation, in response to receiving an active request from the user, the prompt information may be sent to the user in the form of, for example, a pop-up window, in which the prompt information may be presented as text. In addition, the pop-up window may also carry a selection control for the user to choose to "agree" or "disagree" to provide personal information to the electronic device.
It can be understood that the above process of notifying the user and obtaining the user's authorization is only illustrative and does not limit the implementations of the present disclosure; other ways that satisfy relevant laws and regulations may also be applied to the implementations of the present disclosure.
It can be understood that the data involved in this technical solution (including but not limited to the data itself and the acquisition or use of the data) shall comply with the requirements of the corresponding laws, regulations, and relevant provisions. The data may include information, parameters, messages, and the like, such as flow switching indication information.
The above description is only a description of the preferred embodiments of the present disclosure and the technical principles applied. A person skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features with similar functions disclosed in (but not limited to) the present disclosure.
In addition, although the operations are depicted in a specific order, this should not be understood as requiring these operations to be executed in the specific order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or method logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. On the contrary, the specific features and actions described above are merely example forms of implementing the claims.

Claims (19)

  1. An audio processing method, comprising:
    acquiring a first human voice;
    generating, based on the first human voice, a second human voice in one or more voice parts associated with the first human voice, wherein a timbre of the second human voice is a preset timbre; and
    playing target audio based on the first human voice and the second human voice in the one or more voice parts, wherein the target audio comprises the first human voice and/or the second human voice in the one or more voice parts.
  2. The method according to claim 1, wherein the acquiring a first human voice comprises:
    displaying an audio acquisition page, wherein the audio acquisition page comprises a recording control and/or an audio import control; and
    acquiring the first human voice in response to a touch operation on the recording control or the audio import control.
  3. The method according to claim 2, wherein the acquiring the first human voice in response to a touch operation on the audio import control comprises:
    displaying an audio import page in response to the touch operation on the audio import control, wherein the audio import page comprises an audio file associated with the first human voice; and
    acquiring the first human voice in response to a touch operation on the audio file.
  4. The method according to claim 2, wherein the acquiring the first human voice in response to a touch operation on the recording control comprises:
    recording a human voice uttered by a user in response to the touch operation on the recording control; and
    stopping recording the human voice uttered by the user in response to a touch operation on the recording control, to obtain the first human voice.
  5. The method according to any one of claims 1 to 4, wherein the playing target audio based on the first human voice and the second human voice in the one or more voice parts comprises:
    displaying a playback page of the first human voice, wherein the playback page comprises one or more voice-part controls and a playback control;
    displaying the one or more voice-part controls in a first display mode in response to a touch operation on the one or more voice-part controls; and
    playing the target audio in response to a touch operation on the playback control, wherein the target audio comprises the first human voice and second human voices associated with the one or more voice-part controls.
  6. The method according to claim 5, wherein, when the target audio is played, the method further comprises:
    displaying, in response to a touch operation on a first voice-part control displayed in the first display mode, the first voice-part control in a second display mode, wherein the target audio does not comprise a second human voice associated with the first voice-part control, and a display brightness of the second display mode is lower than a display brightness of the first display mode.
  7. The method according to claim 5, wherein the playback page comprises one or more timbre controls, and the method further comprises:
    adjusting, in response to a drag operation of dragging a target timbre control among the one or more timbre controls onto a second voice-part control among the one or more voice-part controls, a timbre of a second human voice associated with the second voice-part control to a timbre corresponding to the target timbre control; and
    displaying, based on the drag operation, a drag animation associated with the drag operation, wherein the drag animation comprises an animation of the timbre control.
  8. The method according to claim 7, wherein the displaying, based on the drag operation, a drag animation associated with the drag operation comprises:
    displaying the animation of the timbre control in response to the drag operation; and
    controlling the animation of the timbre control to follow a drag trajectory associated with the drag operation.
  9. The method according to claim 5, wherein the playback page comprises a pan control associated with a voice-part control, and the method further comprises:
    adjusting, in response to a slide operation on the pan control, a pan of the second human voice associated with the voice-part control.
  10. The method according to any one of claims 1 to 9, wherein, for any third voice part among the multiple voice parts, the generating, based on the first human voice, a second human voice of the third voice part associated with the first human voice comprises:
    determining, based on the first human voice, a harmony melody associated with the third voice part; and
    generating the second human voice of the third voice part based on the first human voice and the harmony melody.
  11. The method according to claim 10, wherein the determining, based on the first human voice, a harmony melody associated with the second human voice of the third voice part comprises:
    obtaining accompaniment chords associated with the first human voice; and
    adjusting the accompaniment chords based on the third voice part, to obtain the harmony melody associated with the second human voice of the third voice part.
  12. The method according to claim 10 or 11, wherein the generating the second human voice of the third voice part based on the first human voice and the harmony melody comprises:
    if a pitch difference between a pitch associated with the harmony melody and a first pitch of the first human voice is less than or equal to a first threshold, tuning the first human voice to the harmony melody to obtain the second human voice of the third voice part; and
    if the pitch difference between the pitch associated with the harmony melody and the first pitch is greater than the first threshold, shifting the pitch associated with the harmony melody to the position of the first pitch, and tuning the first human voice to the harmony melody to obtain the second human voice of the third voice part.
  13. The method according to claim 12, wherein the tuning the first human voice to the harmony melody to obtain the second human voice of the third voice part comprises:
    tuning the first human voice to the harmony melody to obtain a to-be-processed human voice, wherein a timbre of the to-be-processed human voice is the same as the timbre of the first human voice; and
    obtaining a preset timbre, and replacing the timbre of the to-be-processed human voice with the preset timbre, to obtain the second human voice of the third voice part.
  14. The method according to any one of claims 1 to 13, wherein the voice parts comprise a low part, a mid-low part, a middle part, a mid-high part, and a high part.
  15. An audio processing apparatus, comprising an acquisition module, a generation module, and a playback module, wherein:
    the acquisition module is configured to acquire a first human voice;
    the generation module is configured to generate, based on the first human voice, a second human voice in one or more voice parts associated with the first human voice, wherein a timbre of the second human voice is a preset timbre; and
    the playback module is configured to play target audio based on the first human voice and the second human voice in the one or more voice parts, wherein the target audio comprises the first human voice and/or the second human voice in the one or more voice parts.
  16. An electronic device, comprising a processor and a memory, wherein:
    the memory stores computer-executable instructions; and
    the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the audio processing method according to any one of claims 1 to 14.
  17. A computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the audio processing method according to any one of claims 1 to 14.
  18. A computer program product, comprising a computer program that, when executed by a processor, implements the audio processing method according to any one of claims 1 to 14.
  19. A computer program that, when executed by a processor, implements the audio processing method according to any one of claims 1 to 14.
PCT/CN2023/113612 2022-09-26 2023-08-17 Audio processing method and apparatus, and electronic device WO2024066790A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211177547.X 2022-09-26
CN202211177547.XA CN117809686A (zh) 2022-09-26 2022-09-26 Audio processing method and apparatus, and electronic device

Publications (1)

Publication Number Publication Date
WO2024066790A1 true WO2024066790A1 (zh) 2024-04-04

Family

ID=90424054

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/113612 WO2024066790A1 (zh) 2022-09-26 2023-08-17 Audio processing method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN117809686A (zh)
WO (1) WO2024066790A1 (zh)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08286689A (ja) * 1995-02-13 1996-11-01 Yamaha Corp Audio signal processing device
CN1155138A (zh) * 1995-09-21 1997-07-23 Brother Industries, Ltd. Karaoke apparatus
CN1173006A (zh) * 1996-08-06 1998-02-11 Yamaha Corporation Musical apparatus for independently producing multiple chorus voice parts through a single channel
WO2020177190A1 (zh) * 2019-03-01 2020-09-10 Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. Processing method, apparatus and device
CN112331222A (zh) * 2020-09-23 2021-02-05 Beijing Jietong Huasheng Technology Co., Ltd. Method, system, device and storage medium for converting song timbre
CN112652318A (zh) * 2020-12-21 2021-04-13 Beijing Jietong Huasheng Technology Co., Ltd. Timbre conversion method, apparatus and electronic device
CN113035164A (zh) * 2021-02-24 2021-06-25 Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. Singing voice generation method and apparatus, electronic device and storage medium


Also Published As

Publication number Publication date
CN117809686A (zh) 2024-04-02

Similar Documents

Publication Publication Date Title
US9214143B2 (en) Association of a note event characteristic
US9224378B2 (en) Systems and methods thereof for determining a virtual momentum based on user input
US20170330542A1 (en) Separate isolated and resonance samples for a virtual instrument
WO2020259130A1 (zh) Method and apparatus for processing selected clips, electronic device, and readable medium
CN110324718 (zh) Audio and video generation method and apparatus, electronic device, and readable medium
RU2729165C1 (ru) Dynamic modification of audio content
CN105706161 (zh) Automatic audio harmonization based on pitch distribution
CN110211556 (zh) Music file processing method, apparatus, terminal, and storage medium
WO2024104181A1 (zh) Method and apparatus for determining audio, electronic device, and storage medium
WO2024078293A1 (zh) Image processing method and apparatus, electronic device, and storage medium
WO2024067157A1 (zh) Method and apparatus for generating a special-effect video, electronic device, and storage medium
WO2024066790A1 (zh) Audio processing method and apparatus, and electronic device
WO2022143530A1 (zh) Audio processing method and apparatus, computer device, and storage medium
JP2013213907A (ja) Evaluation device
US20140282004A1 (en) System and Methods for Recording and Managing Audio Recordings
CN109300459 (zh) Song chorus method and apparatus
CN115346503 (zh) Song composition method and apparatus, storage medium, and electronic device
KR101020557B1 (ko) Apparatus and method for generating musical scores for user-created music content
JP5510207B2 (ja) Musical tone editing device and program
JP5969421B2 (ja) Instrument sound output device and instrument sound output program
WO2024012257A1 (zh) Audio processing method and apparatus, and electronic device
WO2024082802A1 (zh) Audio processing method and apparatus, and terminal device
CN112825245 (zh) Real-time pitch correction method and apparatus, and electronic device
WO2023160713A1 (zh) Music generation method, apparatus, device, storage medium, and program
CN116403549 (zh) Music generation method and apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23870035

Country of ref document: EP

Kind code of ref document: A1