WO2022227589A1 - Audio processing method and apparatus - Google Patents

Audio processing method and apparatus

Info

Publication number
WO2022227589A1
WO2022227589A1 PCT/CN2021/136890 CN2021136890W WO2022227589A1 WO 2022227589 A1 WO2022227589 A1 WO 2022227589A1 CN 2021136890 W CN2021136890 W CN 2021136890W WO 2022227589 A1 WO2022227589 A1 WO 2022227589A1
Authority
WO
WIPO (PCT)
Prior art keywords
pitch
audio
character
adjustment
lyrics
Prior art date
Application number
PCT/CN2021/136890
Other languages
English (en)
French (fr)
Inventor
张昆
马小坤
Original Assignee
北京达佳互联信息技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京达佳互联信息技术有限公司 filed Critical 北京达佳互联信息技术有限公司
Publication of WO2022227589A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70: Information retrieval of video data
    • G06F 16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783: Retrieval characterised by using metadata automatically derived from the content
    • G06F 16/7844: Retrieval characterised by using metadata automatically derived from the content, using original textual content or text extracted from visual content or transcript of audio data
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70: Information retrieval of video data
    • G06F 16/73: Querying
    • G06F 16/738: Presentation of query results
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048: Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0484: Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range

Definitions

  • the present disclosure relates to the technical field of audio processing, and in particular, to an audio processing method and device.
  • the present disclosure provides an audio processing method and device, which can simplify the operation of the audio processing process and improve the audio processing efficiency.
  • the technical solutions of the present disclosure are as follows:
  • an audio processing method, comprising: in a pitch adjustment interface of a first audio, displaying a character in the lyrics of the first audio and a pitch adjustment control of the character, the pitch adjustment control being used to adjust the pitch of the audio segment corresponding to the character in the first audio; in response to an adjustment operation on the pitch adjustment control of the character, determining the target pitch of the audio segment; and based on the adjusted audio segment, generating a second audio.
  • the present disclosure can adjust the pitch of the audio segment corresponding to the character while intuitively displaying the lyrics, without requiring professionals to read the score for adjustment; the learning cost is low, the operation is convenient and quick, and the efficiency of audio processing is improved.
  • an audio processing apparatus, comprising: a display unit configured to display, in a pitch adjustment interface of the first audio, a character in the lyrics of the first audio and a pitch adjustment control of the character, the pitch adjustment control being used to adjust the pitch of the audio segment corresponding to the character in the first audio; a determining unit configured to determine the target pitch of the audio segment in response to an adjustment operation on the pitch adjustment control of the character; and a generating unit configured to generate a second audio based on the adjusted audio segment.
  • a terminal comprising: one or more processors; a memory for storing program codes executable by the processors; wherein the processor is configured to execute the program code to implement the above audio processing method.
  • a computer-readable storage medium when program codes in the computer-readable storage medium are executed by a processor of a terminal, the terminal can execute the above audio processing method.
  • a computer program product including a computer program, which implements the above audio processing method when the computer program is executed by a processor.
  • FIG. 1 is a schematic diagram of an implementation environment of an audio processing method according to an exemplary embodiment
  • FIG. 2 is a flowchart of an audio processing method according to an exemplary embodiment
  • FIG. 3 is a flowchart of another audio processing method according to an exemplary embodiment
  • FIG. 4 is a schematic interface diagram of a pitch adjustment interface according to an exemplary embodiment
  • FIG. 5 is a block diagram of an audio processing apparatus according to an exemplary embodiment
  • FIG. 6 is a block diagram of a terminal 600 according to an exemplary embodiment.
  • the data (such as audio) involved in this disclosure is data authorized by the user or fully authorized by the parties.
  • FIG. 1 is a schematic diagram of an implementation environment of an audio processing method provided by an embodiment of the present disclosure. As shown in FIG. 1 , the implementation environment includes: a terminal 101 and a server 102 .
  • the terminal 101 is at least one of a smart phone, a smart watch, a desktop computer, a laptop computer, a virtual reality terminal, an augmented reality terminal, a wireless terminal, and a laptop portable computer.
  • the terminal 101 has a communication function and can be connected to the server 102 directly or indirectly through wired or wireless communication.
  • the terminal 101 generally refers to one of multiple terminals, and the embodiment of the present disclosure only takes the terminal 101 as an example. Those skilled in the art will appreciate that, in more possible implementations, the number of the above-mentioned terminals may be greater or smaller.
  • the terminal 101 has various types of applications installed and running, for example, audio applications (such as karaoke applications, audio playback applications, audio clip applications, etc.).
  • the terminal 101 acquires audio information of multiple audios (such as the audio name, the audio author, the audio creation time, etc.) from the server 102, and then displays the acquired audio information, so that the user can select the audio of interest based on the audio information.
  • in response to the user's triggering operation on any piece of audio information, the terminal 101 sends an audio acquisition request to the server 102, the audio acquisition request carrying the audio identifier corresponding to the audio information, and receives the first audio returned by the server 102 based on the audio identifier.
  • the user can then adjust the pitch of the audio segments included in the first audio through the terminal 101 to obtain a second audio, so that the user can sing based on the self-adjusted second audio.
  • the user can also upload the second audio to the server 102 through the terminal 101, so that other users can obtain the second audio from the server 102, thereby sharing the audio obtained after the pitch adjustment with other users.
  • the server 102 is an independent physical server; or the server 102 is a server cluster or a distributed file system composed of multiple physical servers; or the server 102 is a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
  • the server 102 and the terminal 101 are directly or indirectly connected through wired or wireless communication, which is not limited in this embodiment of the present disclosure.
  • the server 102 is associated with an audio database for storing a plurality of audios and audio information of the plurality of audios.
  • in response to receiving an audio information acquisition request from the terminal 101, the server 102 acquires the audio information of multiple audios from the audio database, and then sends the acquired audio information to the terminal 101, so that the terminal 101 can display the received audio information.
  • the server 102 receives the audio acquisition request from the terminal 101 , further acquires the audio corresponding to the audio identifier carried in the audio acquisition request from the audio database, and returns the acquired audio to the terminal 101 .
  • the server 102 can also receive the audio uploaded by the terminal 101 and store the received audio in the audio database, so that when other terminals request the audio uploaded by the terminal 101, the corresponding audio can be acquired from the audio database and sent to those terminals.
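  • As a rough illustration of the request flow described above, the following Python sketch shows how a client might request an audio entry by its identifier and receive the first audio together with its lyrics and score files. The endpoint path, the response field names, and the use of the requests library are illustrative assumptions and are not specified by the disclosure.

```python
import requests  # assumed HTTP client; the disclosure does not specify a transport

SERVER_URL = "https://example.com/api"  # hypothetical server address

def fetch_audio(audio_id: str) -> dict:
    """Send an audio acquisition request carrying the audio identifier and return
    the first audio plus its lyrics and score files (field names are assumed)."""
    response = requests.get(f"{SERVER_URL}/audio", params={"audio_id": audio_id}, timeout=10)
    response.raise_for_status()
    payload = response.json()
    return {
        "audio": payload.get("audio"),         # encoded first audio
        "lyrics_file": payload.get("lyrics"),  # characters with time tags
        "score_file": payload.get("score"),    # per-segment pitches with time tags
    }

# Example (hypothetical identifier): entry = fetch_audio("demo-audio-001")
```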
  • the number of the foregoing servers 102 may be greater or smaller, which is not limited in this embodiment of the present disclosure.
  • the server 102 also includes other functional servers in order to provide more comprehensive and diverse services.
  • FIG. 2 is a flowchart of an audio processing method according to an exemplary embodiment. As shown in FIG. 2, the audio processing method is executed by a terminal and includes the following steps:
  • step 201 on the pitch adjustment interface of the first audio, the terminal displays a character in the lyrics of the first audio and a pitch adjustment control of the character, and the pitch adjustment control is used to adjust the pitch of the audio clip corresponding to the character in the first audio.
  • step 202 the terminal determines the target pitch of the audio segment in response to the adjustment operation on the pitch adjustment control of the character.
  • step 203 the terminal generates a second audio based on the adjusted audio segment.
  • FIG. 3 is a flow chart of another audio processing method according to an exemplary embodiment.
  • step 301 the terminal acquires the lyrics of the first audio, where the lyrics include multiple characters and a time tag corresponding to each character.
  • the first audio is audio prepared in advance by a professional.
  • the first audio and the lyrics file and score file corresponding to the first audio are all stored in the audio database associated with the server.
  • the terminal sends an audio acquisition request to the server to acquire the first audio and the lyrics file and score file corresponding to the first audio.
  • the audio acquisition request carries the audio identification of the first audio.
  • in response to the audio acquisition request of the terminal, the server acquires, from the audio database, the first audio corresponding to the audio identification as well as the lyrics file and score file corresponding to the first audio, and then sends the acquired first audio and the corresponding lyrics file and score file to the terminal.
  • the terminal obtains the lyrics of the first audio from the lyrics file of the first audio.
  • the acquired lyrics of the first audio include a plurality of characters, and each character is set with a corresponding time tag.
  • the lyrics of the first audio may also include one character, which is not limited in this embodiment of the present application.
  • step 302 the terminal obtains the pitch corresponding to the time tag corresponding to the character from the score file of the first audio.
  • the character is any character in the lyrics, and the embodiment of the present application only takes the character as an example for description, and the processing process for other characters in the lyrics is the same, which is not repeated in the embodiment of the present application.
  • the terminal obtains, from the score file of the first audio, the pitch corresponding to each audio segment in the first audio; each audio segment is an audio segment corresponding to one character, and each audio segment is set with a corresponding time label. The time labels are then combined with the time label corresponding to each character obtained in step 301 to determine the pitch corresponding to each character, so as to obtain the pitch corresponding to the character.
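  • The matching of characters to pitches described in steps 301 and 302 can be pictured with the small Python sketch below. The file layouts (lists of (time tag, character) and (time tag, pitch) pairs) and the MIDI-style pitch numbers are assumptions made only for illustration; the disclosure merely requires that each character and each audio segment carry a time tag.

```python
# Hypothetical lyrics entries: (time_tag_ms, character), as parsed from the lyrics file.
lyrics = [(0, "我"), (400, "把"), (800, "月"), (1200, "亮"), (1600, "送")]

# Hypothetical score entries: (time_tag_ms, pitch), e.g. MIDI note numbers per audio segment.
score = [(0, 60), (400, 62), (800, 64), (1200, 65), (1600, 67)]

def match_pitch_to_characters(lyrics, score):
    """Pair each character with the pitch whose time tag matches the character's time tag."""
    pitch_by_time = {time_tag: pitch for time_tag, pitch in score}
    return [(char, pitch_by_time.get(time_tag)) for time_tag, char in lyrics]

print(match_pitch_to_characters(lyrics, score))
# [('我', 60), ('把', 62), ('月', 64), ('亮', 65), ('送', 67)]
```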
  • step 303 the terminal displays the characters in the lyrics in the character display area of the pitch adjustment interface of the first audio.
  • the pitch adjustment interface includes a character display area for displaying characters in the lyrics.
  • FIG. 4 is a schematic interface diagram of a pitch adjustment interface according to an exemplary embodiment.
  • area 401 is the character display area.
  • because the size of the character display area is limited, it may not be possible to display all the characters of the lyrics in the character display area at the same time.
  • therefore, in some embodiments, when the characters are displayed, the character display area of the pitch adjustment interface first displays some of the characters in the lyrics, and a sliding function is provided in the character display area, so that the user can perform a sliding operation on the characters displayed in the character display area.
  • in response to the sliding operation on the displayed characters, the terminal displays the remaining characters of the lyrics in the character display area.
  • in this way, some characters are displayed first and a sliding function is set in the area where the characters are displayed; the user can slide the displayed characters so that the remaining characters are displayed in the character display area, and the pitch of the audio segments corresponding to the remaining characters can then be adjusted, thereby processing the whole audio.
  • the audio adjustment interface further includes a lyrics display area, and the lyrics display area is used to display the lyrics corresponding to the characters displayed in the character display area.
  • the terminal displays a target lyric line of the lyrics in the lyrics display area, the target lyric line being any line of the lyrics, and displays part of the characters of the target lyric line in the character display area of the pitch adjustment interface.
  • the terminal displays the remaining characters of the target lyrics in the character display area in response to the sliding operation on the displayed characters.
  • the area 402 is the lyrics display area.
  • the lyric line displayed in the lyrics display area is the line to which the characters displayed in the character display area belong; that is, the five characters "我", "把", "月", "亮" and "送" displayed in the character display area all come from the lyric line "我把月亮送给你" ("I'll give you the moon") displayed in the lyrics display area.
  • if the user performs a sliding operation on the characters displayed in the character display area, the terminal can, in response to the user's sliding operation, display the remaining two characters "给" and "你" of "我把月亮送给你"; or, in response to the user's sliding operation, the terminal displays the remaining five characters "月", "亮", "送", "给" and "你" of "我把月亮送给你", so as to ensure that the characters displayed in the character display area each time reach the maximum number of characters that the character display area can display, thereby improving the display effect.
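  • A minimal sketch of the windowing behaviour described above, assuming the character display area can hold five characters: only that many characters are shown at a time, and a sliding operation moves the window over the remaining characters of the current lyric line.

```python
LINE = list("我把月亮送给你")  # current lyric line
MAX_VISIBLE = 5               # assumed capacity of the character display area

def visible_characters(line, offset, max_visible=MAX_VISIBLE):
    """Return the slice of the lyric line currently shown in the character display area,
    clamping the offset so the area always shows the maximum number of characters."""
    offset = max(0, min(offset, len(line) - max_visible))
    return line[offset:offset + max_visible]

print(visible_characters(LINE, 0))  # ['我', '把', '月', '亮', '送']
print(visible_characters(LINE, 9))  # ['月', '亮', '送', '给', '你']  (offset clamped)
```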
  • the pitch adjustment interface further includes a first lyric switching control and a second lyric switching control; the first lyric switching control is used to switch to the previous line of lyrics, and the second lyric switching control is used to switch to the next line of lyrics.
  • a first lyric switching control and a second lyric switching control are provided in the lyrics display area of the pitch adjustment interface.
  • the first lyric switching control and the second lyric switching control are provided in the pitch adjustment interface so that the user can switch the currently displayed lyric line through these two controls, so as to process other lyric lines and thus process all the lyrics in the first audio.
  • in the pitch adjustment interface shown in FIG. 4, the first lyric switching control 411 and the second lyric switching control 412 are provided in the area 402 serving as the lyrics display area.
  • the terminal responds to the user's triggering operation on the first switching control to switch the lyrics currently displayed on the pitch adjustment interface to the previous lyrics. For example, in response to the user's triggering operation on the first switching control, the terminal switches the target lyric currently displayed in the lyric display area of the pitch adjustment interface to the previous lyric of the target lyric.
  • the characters displayed in the character display area will also be correspondingly switched to some characters in the previous lyric.
  • the terminal responds to the user's triggering of the second switching control operation to switch the lyrics currently displayed on the pitch adjustment interface to the next lyrics. For example, in response to the user's triggering operation on the second switching control, the terminal switches the target lyric currently displayed in the lyric display area of the pitch adjustment interface to the next lyric of the target lyric.
  • the characters displayed in the character display area will also be correspondingly switched to the characters in the next lyric.
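  • The effect of the two lyric switching controls can be sketched as moving an index over the list of lyric lines and resetting the character window, as below. The data layout is an assumption, and the second lyric line is a made-up placeholder rather than text from the disclosure.

```python
LYRIC_LINES = ["我把月亮送给你", "照亮你回家的路"]  # second line is a made-up placeholder

class LyricPager:
    """Tracks which lyric line is currently shown; switching resets the character window."""

    def __init__(self, lines):
        self.lines = lines
        self.index = 0
        self.char_offset = 0  # offset of the character display window

    def previous_line(self):
        if self.index > 0:
            self.index -= 1
            self.char_offset = 0
        return self.lines[self.index]

    def next_line(self):
        if self.index < len(self.lines) - 1:
            self.index += 1
            self.char_offset = 0
        return self.lines[self.index]

pager = LyricPager(LYRIC_LINES)
print(pager.next_line())      # 照亮你回家的路
print(pager.previous_line())  # 我把月亮送给你
```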
  • step 304 the terminal displays the pitch adjustment control of the character in the area corresponding to the character based on the pitch of the audio clip; the display position of the pitch adjustment control corresponds to the pitch of the audio clip, and the pitch adjustment control is used to adjust the pitch of the audio segment corresponding to the character in the first audio.
  • the pitch adjustment interface includes a control display area for displaying a pitch adjustment control for the displayed character based on the pitch of the audio segment corresponding to the displayed character.
  • the control display area is further divided into a plurality of sub-areas, each sub-area is used to display one pitch adjustment control, and each sub-area corresponds to one character.
  • the area 403 is the control display area, and the controls displayed in the control display area correspond one-to-one to the characters displayed in the character display area.
  • the lyrics are displayed in the lyrics display area, the characters are displayed in the character display area, and the pitch adjustment controls are displayed in the areas corresponding to the characters, which improves the display effect and is convenient for users to use, thereby improving the user experience.
  • since only the five characters "我", "把", "月", "亮" and "送" are displayed in the character display area, the pitch adjustment controls displayed in the control display area only include the five pitch adjustment controls corresponding to these five characters, that is, the pitch adjustment control 413 of the character "我", the pitch adjustment control 414 of the character "把", the pitch adjustment control 415 of the character "月", the pitch adjustment control 416 of the character "亮", and the pitch adjustment control 417 of the character "送".
  • if, in step 303, only some of the characters in the lyrics are displayed in the character display area of the pitch adjustment interface, then correspondingly the pitch adjustment controls displayed in the control display area are only the pitch adjustment controls of that part of the characters.
  • if the user performs a sliding operation on the displayed characters so that the characters displayed in the character display area are updated, the display positions of the pitch adjustment controls displayed in the pitch adjustment interface are also updated accordingly, to the positions corresponding to the pitches of the audio segments corresponding to the newly displayed characters.
  • if the user triggers the first lyric switching control or the second lyric switching control, the display positions of the pitch adjustment controls in the pitch adjustment interface are also updated with the switching of the lyric line; the updated display positions of the pitch adjustment controls correspond to the pitches of the audio clips corresponding to the characters in the lyric line switched to.
  • the terminal displays a bar graph corresponding to the character below the pitch adjustment control of the character, and the height of the bar graph corresponds to the pitch of the audio segment.
  • a corresponding bar graph is displayed below each pitch adjustment control, and the height of each bar graph is determined by the pitch of the audio segment corresponding to the respective character.
  • the display form in the audio processing process can be enriched.
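  • One way to picture how the display position of a pitch adjustment control and the height of the bar graph below it could correspond to the pitch of an audio segment is the linear mapping sketched below; the pixel geometry and the MIDI-style pitch numbers are assumptions made only for illustration.

```python
AREA_TOP_Y = 40        # assumed y coordinate of the top of the control display area (pixels)
AREA_BOTTOM_Y = 400    # assumed y coordinate of its bottom
MIN_PITCH, MAX_PITCH = 60, 73  # assumed adjustable range (MIDI-style note numbers)

def control_y_for_pitch(pitch):
    """Higher pitches are drawn higher up: map the pitch linearly into the area's y range."""
    fraction = (pitch - MIN_PITCH) / (MAX_PITCH - MIN_PITCH)
    return AREA_BOTTOM_Y - fraction * (AREA_BOTTOM_Y - AREA_TOP_Y)

def bar_height_for_pitch(pitch):
    """The bar below the control spans from the bottom of the area up to the control."""
    return AREA_BOTTOM_Y - control_y_for_pitch(pitch)

for p in (60, 67, 73):
    print(p, round(control_y_for_pitch(p)), round(bar_height_for_pitch(p)))
```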
  • the terminal displays a first solfège name in the area corresponding to the character in the pitch adjustment interface of the first audio, where the first solfège name corresponds to the pitch of the audio segment.
  • a solfège name is the syllable name used to make the score easier to sing when singing a melody.
  • the solfège names include "do", "re", "mi", "fa", "sol", "la" and "si".
  • the user can directly determine the pitch of the audio clip from the displayed solfège name, which improves the display effect and the user experience.
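  • A small sketch of how a pitch could be mapped to the solfège name shown next to a character. Using MIDI-style note numbers and a C major scale is an assumption; the disclosure only states that the displayed name corresponds to the pitch of the audio segment.

```python
# Degrees of a C major scale (semitone offsets from the tonic) and their solfège names.
SOLFEGE = {0: "do", 2: "re", 4: "mi", 5: "fa", 7: "sol", 9: "la", 11: "si"}

def solfege_name(pitch, tonic=60):
    """Return the solfège name for a pitch (MIDI-style number), or None for a non-scale tone."""
    return SOLFEGE.get((pitch - tonic) % 12)

print(solfege_name(60))  # do
print(solfege_name(67))  # sol
print(solfege_name(61))  # None (a semitone outside the C major scale)
```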
  • in some embodiments, step 303 is performed before step 302, and step 304 is performed after step 302; alternatively, the above steps 301 to 304 may be performed in another order, which is not limited in this embodiment of the present disclosure.
  • step 305 the terminal determines the target pitch of the audio segment in response to the adjustment operation on the pitch adjustment control of the character.
  • the user adjusts the pitch adjustment control by performing a sliding operation on the pitch adjustment control, or by performing a triggering operation in the region corresponding to a character, or in another optional manner, which is not limited in the embodiments of the present disclosure.
  • in some embodiments, the user adjusts the pitch of the audio segment by sliding the pitch adjustment control of the character; in response to the sliding operation on the pitch adjustment control of the character, the terminal determines the pitch corresponding to the target position of the sliding operation as the target pitch.
  • in other embodiments, the user adjusts the pitch of the audio clip by performing a trigger operation in the area corresponding to the character; in response to the trigger operation in the area corresponding to the character, the terminal determines the pitch corresponding to the triggered target position as the target pitch.
  • the user-selectable operation forms are increased, and the flexibility of the user's operation process is improved.
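  • The determination of the target pitch from the target position of a slide or tap can be sketched as the inverse of a position-to-pitch mapping: quantise the touched y coordinate to the nearest selectable pitch. The geometry and the 14 selectable pitches are illustrative assumptions.

```python
AREA_TOP_Y = 40
AREA_BOTTOM_Y = 400
SELECTABLE_PITCHES = list(range(60, 74))  # assumed 14 selectable pitches

def pitch_for_target_position(y):
    """Map the y coordinate of the slide/tap target position to the nearest selectable pitch."""
    y = max(AREA_TOP_Y, min(y, AREA_BOTTOM_Y))  # keep the position inside the area
    fraction = (AREA_BOTTOM_Y - y) / (AREA_BOTTOM_Y - AREA_TOP_Y)
    index = round(fraction * (len(SELECTABLE_PITCHES) - 1))
    return SELECTABLE_PITCHES[index]

print(pitch_for_target_position(400))  # 60 (bottom of the area maps to the lowest pitch)
print(pitch_for_target_position(40))   # 73 (top of the area maps to the highest pitch)
```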
  • when the user performs an adjustment operation on the pitch adjustment control, the terminal displays the pitch adjustment range in response to the adjustment operation on the pitch adjustment control of the character, so that the user can adjust the position of the pitch adjustment control based on the displayed pitch adjustment range.
  • the pitch adjustment range consists of the whole-tone and semitone pitches corresponding to the seven solfège names "do", "re", "mi", "fa", "sol", "la" and "si"; that is, the pitch adjustment range includes 14 selectable pitches.
  • the operable pitch adjustment range is displayed, so that the user can perform the adjustment operation based on the displayed pitch adjustment range, thereby reducing the occurrence of the adjustment exceeding the range.
  • the terminal displays prompt information in response to the adjustment operation exceeding the pitch adjustment range, and the prompt information is used to indicate that the adjustment operation is outside the pitch adjustment range.
  • when the user's adjustment operation exceeds the pitch adjustment range, the user is promptly notified through the prompt information, so that the user can deal with the out-of-range adjustment in time.
  • the pitch adjustment range includes a minimum pitch and a maximum pitch. In some embodiments, if the pitch corresponding to the target position of the adjustment operation is smaller than the minimum pitch, the minimum pitch is determined as the target pitch. In other embodiments, if the pitch corresponding to the target position of the adjustment operation is greater than the maximum pitch, the maximum pitch is determined as the target pitch.
  • the minimum pitch or the maximum pitch is directly determined as the target pitch, so that the user can continue adjusting from the minimum pitch or the maximum pitch without re-adjusting from the original pitch, which reduces the user's operating cost and improves the efficiency of the audio processing process.
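  • A sketch of the range handling described above: if the pitch implied by the adjustment falls outside the adjustable range, a prompt is produced and the target pitch is clamped to the minimum or maximum pitch, so that the user can continue adjusting from that bound. The pitch bounds and the prompt text are assumptions.

```python
MIN_PITCH, MAX_PITCH = 60, 73  # assumed bounds of the 14 selectable pitches

def resolve_target_pitch(requested_pitch):
    """Clamp an out-of-range adjustment to the nearest bound and report a prompt if needed."""
    if requested_pitch < MIN_PITCH:
        return MIN_PITCH, "The adjustment operation is outside the pitch adjustment range."
    if requested_pitch > MAX_PITCH:
        return MAX_PITCH, "The adjustment operation is outside the pitch adjustment range."
    return requested_pitch, None

print(resolve_target_pitch(58))  # (60, prompt text)
print(resolve_target_pitch(65))  # (65, None)
```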
  • the terminal can play the audio segment based on the pitch corresponding to the target position of the adjustment operation, in response to the adjustment operation performed on the pitch adjustment control of the character. That is, every time the user performs an adjustment operation on the pitch adjustment control, the terminal can automatically play the audio clip based on the pitch corresponding to the target position of the adjustment operation.
  • the duration of playing the audio clip is 0.3 seconds (s), or the duration of playing the audio clip is other values, which are not limited in this embodiment of the present disclosure.
  • the audio clip is played based on the pitch corresponding to the adjusted target position, so that the user can know the effect of the audio clip in time, so that the user can further process the audio clip.
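  • The short preview can be pictured as resampling a 0.3-second slice of the segment by the ratio 2 ** (d / 12), where d is the number of semitones between the original and the target pitch (a crude pitch shift that also changes duration). NumPy and this simple resampling approach are assumptions; the disclosure does not specify how the preview is synthesised.

```python
import numpy as np  # assumed dependency

SAMPLE_RATE = 44100
PREVIEW_SECONDS = 0.3  # preview duration mentioned in the disclosure

def preview_samples(segment, original_pitch, target_pitch):
    """Return about 0.3 s of the segment crudely pitch-shifted by resampling;
    ratio = 2 ** (semitone difference / 12), so speed changes along with pitch."""
    ratio = 2.0 ** ((target_pitch - original_pitch) / 12.0)
    positions = np.arange(0, len(segment) - 1, ratio)
    shifted = np.interp(positions, np.arange(len(segment)), segment)
    return shifted[: int(PREVIEW_SECONDS * SAMPLE_RATE)]

# Toy usage: a 440 Hz tone raised by two semitones for preview.
t = np.arange(int(SAMPLE_RATE * 1.0)) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 440.0 * t)
print(preview_samples(tone, original_pitch=69, target_pitch=71).shape)  # (13230,)
```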
  • after the adjustment, the solfège name corresponding to the pitch of the audio clip may change.
  • if the solfège name corresponding to the pitch corresponding to the target position of the adjustment operation is a second solfège name, and the first solfège name currently displayed in the area corresponding to the character is different from the second solfège name, the terminal updates the first solfège name displayed in the area corresponding to the character to the second solfège name.
  • the pitch adjustment interface further includes a playback control
  • the user can listen to the first audio segment corresponding to the lyrics currently displayed on the pitch adjustment interface by triggering the playback control.
  • the terminal plays the first audio segment based on the target pitch of the first audio segment, where the first audio segment corresponds to the lyrics currently displayed on the pitch adjustment interface.
  • the play control is provided as a play button 418, and the user can trigger the play button 418 to play the adjusted first audio clip.
  • by means of the playback control, the user can play the first audio segment corresponding to the currently displayed lyrics based on its adjusted pitch, so that the user can preview the adjusted audio segment of each lyric line and then perform subsequent processing based on the preview result.
  • when playing the first audio clip, the terminal highlights the currently played character; or, when playing the first audio clip, the terminal highlights the pitch adjustment control of the currently played character; or, when playing the first audio clip, the terminal highlights both the currently played character and the pitch adjustment control of the currently played character, and the embodiment of the present disclosure does not limit which method is specifically adopted.
  • the currently played character, the pitch adjustment control of the currently played character, or both are highlighted, so that the user can clearly know which character is currently being played and can perform further processing based on the effect of the audio being played.
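  • A sketch of how the currently played character could be located during playback by using the characters' time tags; the tag layout is the same assumption used earlier, and the highlighting itself is represented only by returning the character's index.

```python
# Hypothetical (start_time_ms, character) pairs for the currently displayed lyric line.
LYRICS = [(0, "我"), (400, "把"), (800, "月"), (1200, "亮"), (1600, "送")]

def currently_played_index(position_ms):
    """Return the index of the character whose time-tag region contains the playback position."""
    current = 0
    for index, (start_ms, _char) in enumerate(LYRICS):
        if position_ms >= start_ms:
            current = index
        else:
            break
    return current

print(LYRICS[currently_played_index(950)][1])   # 月 (would be highlighted at 0.95 s)
print(LYRICS[currently_played_index(1700)][1])  # 送
```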
  • step 306 the terminal generates a second audio based on the adjusted audio segment.
  • the pitch adjustment interface further includes a save control, and the user can trigger the save control to store the pitch corresponding to the audio clip of each character in the lyrics; the score file of the first audio is then updated based on the stored pitches to obtain an updated score file, so that a second audio is generated based on the updated score file, the second audio being the audio obtained by adjusting the pitches of the audio segments.
  • the save control is provided as a save button 419 .
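  • A minimal sketch of the save step, assuming an in-memory score layout: the stored target pitches overwrite the entries of the score file with matching time tags, and the updated score is what the second audio would be rendered from. The rendering itself is left out.

```python
# Hypothetical score entries: one dict per audio segment, keyed by the segment's time tag.
score = [
    {"time_ms": 0, "pitch": 60},
    {"time_ms": 400, "pitch": 62},
    {"time_ms": 800, "pitch": 64},
]

# Target pitches stored when the save control is triggered (time tag -> adjusted pitch).
adjusted = {400: 65, 800: 66}

def update_score(score, adjusted):
    """Write the adjusted target pitches back into the score entries with matching time tags."""
    return [
        {**entry, "pitch": adjusted.get(entry["time_ms"], entry["pitch"])}
        for entry in score
    ]

updated_score = update_score(score, adjusted)
print(updated_score)
# The second audio would then be generated (rendered) from `updated_score`;
# the synthesis step itself is outside the scope of this sketch.
```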
  • in the method provided by the embodiments of the present disclosure, the pitch of the audio segment corresponding to the character can be adjusted while the lyrics are displayed intuitively, without requiring professionals to read the score for adjustment; the learning cost is low, the operation is convenient and quick, and the efficiency of audio processing is improved.
  • the user can adjust the pitch of the audio clip on the terminal, which makes the audio processing more convenient, and the user is also provided with the function of auditioning the adjustment effect in real time, thereby improving the user experience.
  • when adjusting the pitch of the audio clip, the user can adjust the pitch of the audio clip corresponding to each character, which improves the flexibility of the audio processing process, so that the adjusted audio can meet the user's personalized musical aesthetics.
  • Fig. 5 is a block diagram of an audio processing apparatus according to an exemplary embodiment.
  • the device includes:
  • the display unit 501 is configured to display, in the pitch adjustment interface of the first audio, a character in the lyrics of the first audio and a pitch adjustment control of the character, and the pitch adjustment control is used to adjust the pitch of the audio segment corresponding to the character in the first audio;
  • determining unit 502 configured to determine the target pitch of the audio segment in response to the adjustment operation of the pitch adjustment control of the character
  • the generating unit 503 is configured to generate a second audio based on the adjusted audio segment.
  • by providing a pitch adjustment interface, the device provided by the embodiment of the present disclosure can adjust the pitch of the audio segment corresponding to the character while intuitively displaying the lyrics, without requiring professionals to read the score for adjustment; the learning cost is low, the operation is convenient and quick, and the efficiency of audio processing is improved.
  • the display unit 501 is configured to display the characters in the lyrics in the character display area of the pitch adjustment interface
  • the display unit 501 is further configured to display a pitch adjustment control of the character based on the pitch of the audio segment in the area corresponding to the character.
  • the display position of the pitch adjustment control corresponds to the pitch of the audio segment.
  • the display unit 501 is further configured to display a bar graph corresponding to the character below the pitch adjustment control of the character, and the height of the bar graph corresponds to the pitch of the audio segment.
  • the apparatus further includes:
  • an acquisition unit configured to acquire the lyrics of the first audio, and the lyrics include characters and time tags corresponding to the characters;
  • the obtaining unit is further configured to obtain the pitch corresponding to the time label corresponding to the character from the musical score file of the first audio.
  • the display unit 501 is configured to display some characters in the lyrics in the character display area of the pitch adjustment interface; in response to a sliding operation on the displayed characters, display the character display area in the character display area. The remaining characters in the lyrics.
  • the determining unit 502 is configured to, in response to the sliding operation of the pitch adjustment control of the character, determine the pitch corresponding to the target position of the sliding operation as the target pitch;
  • the determining unit 502 is further configured to, in response to the triggering operation in the region corresponding to the character, determine the pitch corresponding to the triggered target position as the target pitch.
  • the apparatus further includes:
  • the first playing unit is configured to, in response to performing an adjustment operation on the pitch adjustment control of the character, play the audio segment based on the pitch corresponding to the target position of the adjustment operation.
  • the display unit 501 is further configured to display a first solfège name in the area corresponding to the character in the pitch adjustment interface of the first audio, where the first solfège name corresponds to the pitch of the audio segment;
  • the device also includes:
  • an update unit configured to update the first solfège name displayed in the area corresponding to the character to a second solfège name if the first solfège name is different from the second solfège name, where the second solfège name corresponds to the pitch corresponding to the target position of the adjustment operation.
  • the display unit 501 is further configured to display the pitch adjustment range in response to the adjustment operation of the pitch adjustment control of the character.
  • the display unit 501 is further configured to display prompt information in response to the adjustment operation exceeding the pitch adjustment range, where the prompt information is used to prompt that the adjustment operation exceeds the pitch adjustment range.
  • the pitch adjustment range includes a minimum pitch and a maximum pitch
  • the determining unit 502 is further configured to determine the minimum pitch as the target pitch if the pitch corresponding to the target position of the adjustment operation is smaller than the minimum pitch;
  • the determining unit 502 is further configured to determine the maximum pitch as the target pitch if the pitch corresponding to the target position of the adjustment operation is greater than the maximum pitch.
  • the pitch adjustment interface further includes a first lyrics switching control and a second lyrics switching control
  • the device also includes:
  • a switching unit configured to switch the lyrics currently displayed on the pitch adjustment interface to the previous lyrics in response to the triggering operation of the first lyrics switching control
  • the switching unit is further configured to switch the lyrics currently displayed on the pitch adjustment interface to the lyrics of the next sentence in response to the triggering operation of the second lyrics switching control.
  • the pitch adjustment interface further includes playback controls
  • the device also includes:
  • the second playing unit is configured to play the first audio clip based on the target pitch of the first audio clip in response to the triggering operation on the playback control, where the first audio clip corresponds to the lyrics currently displayed on the pitch adjustment interface.
  • the display unit 501 is further configured to highlight the currently played character when playing the first audio segment; and/or,
  • the display unit 501 is further configured to highlight the pitch adjustment control of the currently played character when the first audio segment is played.
  • when the audio processing apparatus provided in the above embodiment processes audio, the division of the above functional modules is only used as an example for illustration.
  • in practical applications, the above functions can be allocated to different functional modules as required, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
  • the audio processing apparatus and the audio processing method embodiments provided by the above embodiments belong to the same concept, and the specific implementation process thereof is detailed in the method embodiments, which will not be repeated here.
  • FIG. 6 shows a structural block diagram of a terminal 600 provided by an exemplary embodiment of the present disclosure.
  • the terminal 600 can be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, or a desktop computer.
  • Terminal 600 may also be called user equipment, portable terminal, laptop terminal, desktop terminal, and the like by other names.
  • the terminal 600 includes: a processor 601 and a memory 602 .
  • the processor 601 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
  • the processor 601 can be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array).
  • the processor 601 may also include a main processor and a coprocessor.
  • the main processor is a processor used to process data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state.
  • the processor 601 may be integrated with a GPU (Graphics Processing Unit, image processor), and the GPU is used for rendering and drawing the content that needs to be displayed on the display screen.
  • the processor 601 may further include an AI (Artificial Intelligence, artificial intelligence) processor, where the AI processor is used to process computing operations related to machine learning.
  • Memory 602 may include one or more computer-readable storage media, which may be non-transitory. Memory 602 may also include high-speed random access memory, as well as non-volatile memory, such as one or more disk storage devices, flash storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 602 is used to store at least one program code, and the at least one program code is used to be executed by the processor 601 to implement the methods provided by the method embodiments of the present disclosure. audio processing method.
  • the terminal 600 may optionally further include: a peripheral device interface 603 and at least one peripheral device.
  • the processor 601, the memory 602 and the peripheral device interface 603 may be connected through a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 603 through a bus, a signal line or a circuit board.
  • the peripheral device includes: at least one of a radio frequency circuit 604 , a display screen 605 , a camera assembly 606 , an audio circuit 607 , a positioning assembly 608 and a power supply 609 .
  • the peripheral device interface 603 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 601 and the memory 602.
  • the processor 601, the memory 602, and the peripheral device interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602, and the peripheral device interface 603 can be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the radio frequency circuit 604 is used for receiving and transmitting RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals.
  • the radio frequency circuit 604 communicates with the communication network and other communication devices via electromagnetic signals.
  • the radio frequency circuit 604 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals.
  • radio frequency circuitry 604 includes: an antenna system, an RF transceiver, one or more amplifiers, tuners, oscillators, digital signal processors, codec chipsets, subscriber identity module cards, and the like.
  • the radio frequency circuit 604 may communicate with other terminals through at least one wireless communication protocol.
  • the wireless communication protocols include, but are not limited to, metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G and 5G), wireless local area networks and/or WiFi (Wireless Fidelity, wireless fidelity) networks.
  • the radio frequency circuit 604 may further include a circuit related to NFC (Near Field Communication, short-range wireless communication), which is not limited in the present disclosure.
  • the display screen 605 is used for displaying UI (User Interface, user interface).
  • the UI can include graphics, text, icons, video, and any combination thereof.
  • the display screen 605 also has the ability to acquire touch signals on or above the surface of the display screen 605 .
  • the touch signal may be input to the processor 601 as a control signal for processing.
  • the display screen 605 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards.
  • the display screen 605 there may be one display screen 605, which is arranged on the front panel of the terminal 600; in other embodiments, there may be at least two display screens 605, which are respectively arranged on different surfaces of the terminal 600 or in a folded design; In other embodiments, the display screen 605 may be a flexible display screen, which is disposed on a curved surface or a folding surface of the terminal 600 . Even, the display screen 605 can also be set as a non-rectangular irregular figure, that is, a special-shaped screen.
  • the display screen 605 can be made of materials such as LCD (Liquid Crystal Display, liquid crystal display), OLED (Organic Light-Emitting Diode, organic light emitting diode).
  • the camera assembly 606 is used to capture images or video.
  • camera assembly 606 includes a front-facing camera and a rear-facing camera.
  • the front camera is arranged on the front panel of the terminal, and the rear camera is arranged on the back of the terminal.
  • there are at least two rear cameras, each of which is any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize the background blur function, or the main camera is integrated with the wide-angle camera to achieve panoramic shooting and VR (Virtual Reality) shooting functions or other integrated shooting functions.
  • camera assembly 606 may also include a flash.
  • the flash can be a single color temperature flash or a dual color temperature flash. Dual color temperature flash refers to the combination of warm light flash and cold light flash, which can be used for light compensation under different color temperatures.
  • Audio circuitry 607 may include a microphone and speakers.
  • the microphone is used to collect the sound waves of the user and the environment, convert the sound waves into electrical signals and input them to the processor 601 for processing, or to the radio frequency circuit 604 to realize voice communication.
  • the microphone may also be an array microphone or an omnidirectional collection microphone.
  • the speaker is used to convert the electrical signal from the processor 601 or the radio frequency circuit 604 into sound waves.
  • the loudspeaker can be a traditional thin-film loudspeaker or a piezoelectric ceramic loudspeaker.
  • the speaker When the speaker is a piezoelectric ceramic speaker, it can not only convert electrical signals into sound waves audible to humans, but also convert electrical signals into sound waves inaudible to humans for distance measurement and other purposes.
  • the audio circuit 607 may also include a headphone jack.
  • the positioning component 608 is used to locate the current geographic location of the terminal 600 to implement navigation or LBS (Location Based Service).
  • the positioning component 608 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
  • the power supply 609 is used to power various components in the terminal 600 .
  • the power source 609 may be alternating current, direct current, disposable batteries or rechargeable batteries.
  • the rechargeable battery can support wired charging or wireless charging.
  • the rechargeable battery can also be used to support fast charging technology.
  • terminal 600 also includes one or more sensors 610 .
  • the one or more sensors 610 include, but are not limited to, an acceleration sensor 611 , a gyro sensor 612 , a pressure sensor 613 , a fingerprint sensor 614 , an optical sensor 615 and a proximity sensor 616 .
  • the acceleration sensor 611 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established by the terminal 600 .
  • the acceleration sensor 611 can be used to detect the components of the gravitational acceleration on the three coordinate axes.
  • the processor 601 can control the display screen 605 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 611 .
  • the acceleration sensor 611 can also be used for game or user movement data collection.
  • the gyroscope sensor 612 can detect the body direction and rotation angle of the terminal 600 , and the gyroscope sensor 612 can cooperate with the acceleration sensor 611 to collect 3D actions of the user on the terminal 600 .
  • the processor 601 can implement the following functions according to the data collected by the gyro sensor 612 : motion sensing (such as changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
  • the pressure sensor 613 may be disposed on the side frame of the terminal 600 and/or the lower layer of the display screen 605 .
  • the processor 601 can perform left and right hand identification or quick operation according to the holding signal collected by the pressure sensor 613.
  • the processor 601 controls the operability controls on the UI interface according to the user's pressure operation on the display screen 605.
  • the operability controls include at least one of button controls, scroll bar controls, icon controls, and menu controls.
  • the fingerprint sensor 614 is used to collect the user's fingerprint, and the processor 601 identifies the user's identity according to the fingerprint collected by the fingerprint sensor 614, or the fingerprint sensor 614 identifies the user's identity according to the collected fingerprint. When the user's identity is identified as a trusted identity, the processor 601 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings.
  • the fingerprint sensor 614 may be disposed on the front, back or side of the terminal 600 . When the terminal 600 is provided with physical buttons or a manufacturer's logo, the fingerprint sensor 614 may be integrated with the physical buttons or the manufacturer's logo.
  • Optical sensor 615 is used to collect ambient light intensity.
  • the processor 601 may control the display brightness of the display screen 605 according to the ambient light intensity collected by the optical sensor 615 . Specifically, when the ambient light intensity is high, the display brightness of the display screen 605 is increased; when the ambient light intensity is low, the display brightness of the display screen 605 is decreased.
  • the processor 601 may also dynamically adjust the shooting parameters of the camera assembly 606 according to the ambient light intensity collected by the optical sensor 615 .
  • a proximity sensor 616 also called a distance sensor, is usually provided on the front panel of the terminal 600 .
  • the proximity sensor 616 is used to collect the distance between the user and the front of the terminal 600 .
  • when the proximity sensor 616 detects that the distance between the user and the front of the terminal 600 gradually decreases, the processor 601 controls the display screen 605 to switch from the bright-screen state to the off-screen state; when the proximity sensor 616 detects that the distance between the user and the front of the terminal 600 gradually increases, the processor 601 controls the display screen 605 to switch from the off-screen state to the bright-screen state.
  • the structure shown in FIG. 6 does not constitute a limitation on the terminal 600, which may include more or fewer components than shown, or combine some components, or adopt a different component arrangement.
  • a terminal comprising: one or more processors; a memory for storing program code executable by the one or more processors; wherein the one or more processors are The program code is configured to execute the following steps:
  • in the pitch adjustment interface of the first audio, characters in the lyrics of the first audio and a pitch adjustment control of the character are displayed, and the pitch adjustment control is used to adjust the pitch of the audio segment corresponding to the character in the first audio;
  • in response to the adjustment operation on the pitch adjustment control of the character, the target pitch of the audio segment is determined; and based on the adjusted audio segment, a second audio is generated.
  • the one or more processors are further configured to execute program codes to implement steps in the audio processing methods provided in other embodiments of the above-mentioned method.
  • a computer-readable storage medium including program codes is also provided, such as a memory 602 including program codes, and the above-mentioned program codes can be executed by the processor 601 of the terminal 600 to complete the following steps:
  • in the pitch adjustment interface of the first audio, characters in the lyrics of the first audio and a pitch adjustment control of the character are displayed, and the pitch adjustment control is used to adjust the pitch of the audio segment corresponding to the character in the first audio;
  • in response to the adjustment operation on the pitch adjustment control of the character, the target pitch of the audio segment is determined; and based on the adjusted audio segment, a second audio is generated.
  • the above program codes can be executed by the processor 601 of the terminal 600 to complete the steps in the audio processing methods provided in other embodiments of the above method embodiments.
  • the computer-readable storage medium is a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • a computer program product including a computer program, which implements the following steps when executed by the processor 601 of the terminal 600:
  • in the pitch adjustment interface of the first audio, characters in the lyrics of the first audio and a pitch adjustment control of the character are displayed, and the pitch adjustment control is used to adjust the pitch of the audio segment corresponding to the character in the first audio;
  • in response to the adjustment operation on the pitch adjustment control of the character, the target pitch of the audio segment is determined; and based on the adjusted audio segment, a second audio is generated.
  • the computer program involved in the embodiments of the present application may be deployed and executed on one terminal, or executed on multiple terminals located at one site, or executed on multiple terminals that are distributed at multiple sites and interconnected through a communication network; the multiple terminals distributed at multiple sites and interconnected through a communication network can form a blockchain system.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

An audio processing method and apparatus, belonging to the technical field of audio processing. The method includes: a terminal displays, in a pitch adjustment interface of a first audio, a character in the lyrics of the first audio and a pitch adjustment control of the character, the pitch adjustment control being used to adjust the pitch of the audio segment corresponding to the character in the first audio (201); in response to an adjustment operation on the pitch adjustment control of the character, the terminal determines the target pitch of the audio segment (202); and based on the adjusted audio segment, a second audio is generated (203). By providing a pitch adjustment interface, the method can adjust the pitch of the audio segment corresponding to a character while intuitively displaying the lyrics, without requiring professionals to read the score for adjustment; the learning cost is low, the operation is convenient and quick, and the efficiency of audio processing is improved.

Description

Audio processing method and apparatus
The present disclosure is based on, and claims priority to, the Chinese patent application with application number 202110470416.X filed on April 28, 2021, the entire contents of which are incorporated into the present disclosure by reference.
Technical Field
The present disclosure relates to the technical field of audio processing, and in particular to an audio processing method and apparatus.
Background
With the continuous development of computer technology, people have begun to record songs they sing through electronic devices and to share the recorded songs on various platforms, such as streaming platforms, karaoke platforms, or social platforms, so as to satisfy their entertainment needs of showing their singing level or singing karaoke with others. An accompaniment is usually used when recording a song.
Summary
The present disclosure provides an audio processing method and apparatus, which can simplify the operations of the audio processing process and improve audio processing efficiency. The technical solutions of the present disclosure are as follows:
According to a first aspect of the embodiments of the present disclosure, an audio processing method is provided, the method including: in a pitch adjustment interface of a first audio, displaying a character in the lyrics of the first audio and a pitch adjustment control of the character, the pitch adjustment control being used to adjust the pitch of the audio segment corresponding to the character in the first audio; in response to an adjustment operation on the pitch adjustment control of the character, determining the target pitch of the audio segment; and generating a second audio based on the adjusted audio segment.
By providing a pitch adjustment interface, the present disclosure can adjust the pitch of the audio segment corresponding to a character while intuitively displaying the lyrics, without requiring professionals to read the score for adjustment; the learning cost is low, the operation is convenient and quick, and the efficiency of audio processing is improved.
According to a second aspect of the embodiments of the present disclosure, an audio processing apparatus is provided, the apparatus including: a display unit configured to display, in a pitch adjustment interface of a first audio, a character in the lyrics of the first audio and a pitch adjustment control of the character, the pitch adjustment control being used to adjust the pitch of the audio segment corresponding to the character in the first audio; a determining unit configured to determine the target pitch of the audio segment in response to an adjustment operation on the pitch adjustment control of the character; and a generating unit configured to generate a second audio based on the adjusted audio segment.
According to a third aspect of the embodiments of the present disclosure, a terminal is provided, the terminal including: one or more processors; and a memory for storing program code executable by the processors; wherein the processors are configured to execute the program code to implement the above audio processing method.
According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided; when the program code in the computer-readable storage medium is executed by a processor of a terminal, the terminal is enabled to execute the above audio processing method.
According to a fifth aspect of the embodiments of the present disclosure, a computer program product is provided, including a computer program which, when executed by a processor, implements the above audio processing method.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an implementation environment of an audio processing method according to an exemplary embodiment;
FIG. 2 is a flowchart of an audio processing method according to an exemplary embodiment;
FIG. 3 is a flowchart of another audio processing method according to an exemplary embodiment;
FIG. 4 is a schematic interface diagram of a pitch adjustment interface according to an exemplary embodiment;
FIG. 5 is a block diagram of an audio processing apparatus according to an exemplary embodiment;
FIG. 6 is a block diagram of a terminal 600 according to an exemplary embodiment.
Detailed Description
The data (such as audio) involved in the present disclosure is data authorized by the user or fully authorized by all parties.
FIG. 1 is a schematic diagram of an implementation environment of an audio processing method provided by an embodiment of the present disclosure. As shown in FIG. 1, the implementation environment includes a terminal 101 and a server 102.
The terminal 101 is at least one of a smart phone, a smart watch, a desktop computer, a laptop computer, a virtual reality terminal, an augmented reality terminal, a wireless terminal, and a laptop portable computer; the terminal 101 has a communication function and can be connected to the server 102 directly or indirectly through wired or wireless communication. The terminal 101 generally refers to one of multiple terminals, and the embodiment of the present disclosure only takes the terminal 101 as an example. Those skilled in the art will appreciate that, in more possible implementations, the number of the above terminals may be greater or smaller.
In some embodiments, the terminal 101 has various types of applications installed and running, for example, audio applications (such as karaoke applications, audio playback applications, and audio clip applications).
In some embodiments, the terminal 101 acquires audio information of multiple audios (such as the audio name, the audio author, and the audio creation time) from the server 102 and then displays the acquired audio information, so that the user can select the audio of interest based on the audio information. In response to the user's triggering operation on any piece of audio information, the terminal 101 sends an audio acquisition request to the server 102, the audio acquisition request carrying the audio identifier corresponding to the audio information, and receives the first audio returned by the server 102 based on the audio identifier. The user can then adjust the pitch of the audio segments included in the first audio through the terminal 101 to obtain a second audio, so that the user can sing based on the self-adjusted second audio. In some embodiments, the user can also upload the second audio to the server 102 through the terminal 101, so that other users can obtain the second audio from the server 102, thereby sharing the audio obtained after the pitch adjustment with other users.
The server 102 is an independent physical server; or the server 102 is a server cluster or a distributed file system composed of multiple physical servers; or the server 102 is a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. In some embodiments, the server 102 and the terminal 101 are directly or indirectly connected through wired or wireless communication, which is not limited in the embodiments of the present disclosure. The server 102 is associated with an audio database, and the audio database is used to store multiple audios and the audio information of these audios. In response to receiving an audio information acquisition request from the terminal 101, the server 102 acquires the audio information of multiple audios from the audio database and then sends the acquired audio information to the terminal 101, so that the terminal 101 can display the received audio information. The server 102 receives the audio acquisition request from the terminal 101, acquires from the audio database the audio corresponding to the audio identifier carried in the audio acquisition request, and returns the acquired audio to the terminal 101. In some embodiments, the server 102 can also receive audio uploaded by the terminal 101 and store the received audio in the audio database, so that when other terminals request the audio uploaded by the terminal 101, the corresponding audio can be acquired from the audio database and sent to those terminals. In some embodiments, the number of the above servers 102 may be greater or smaller, which is not limited in the embodiments of the present disclosure. Of course, the server 102 may further include other functional servers in order to provide more comprehensive and diverse services.
FIG. 2 is a flowchart of an audio processing method according to an exemplary embodiment. As shown in FIG. 2, the audio processing method is executed by a terminal and includes the following steps:
In step 201, in the pitch adjustment interface of the first audio, the terminal displays a character in the lyrics of the first audio and a pitch adjustment control of the character, the pitch adjustment control being used to adjust the pitch of the audio segment corresponding to the character in the first audio.
In step 202, in response to the adjustment operation on the pitch adjustment control of the character, the terminal determines the target pitch of the audio segment.
In step 203, based on the adjusted audio segment, the terminal generates a second audio.
With the technical solution provided by the embodiments of the present disclosure, by providing a pitch adjustment interface, the pitch of the audio segment corresponding to a character can be adjusted while the lyrics are intuitively displayed, without requiring professionals to read the score for adjustment; the learning cost is low, the operation is convenient and quick, and the efficiency of audio processing is improved.
FIG. 2 above shows only the basic flow of the present disclosure; the solution provided by the present disclosure is further described below based on a specific implementation. FIG. 3 is a flowchart of another audio processing method according to an exemplary embodiment. As shown in FIG. 3, the method is executed by a terminal and includes the following steps:
在步骤301中,终端获取第一音频的歌词,该歌词包括多个字符以及每个字符对应的时间标签。
其中,该第一音频为专业人员提前制作好的音频。第一音频以及第一音频对应的歌词文件和曲谱文件,均存储在服务器所关联的音频数据库中,当用户在终端上触发音频获取请求时,终端向服务器发送音频获取请求,以获取该第一音频以及第一音频对应的歌词文件和曲谱文件。该音频获取请求携带第一音频的音频标识,服务器响应于终端的音频获取请求,从音频数据库中获取该音频标识对应的第一音频,以及该第一音频对应的歌词文件和曲谱文件,进而将获取到的第一音频以及第一音频对应的歌词文件和曲谱文件发送给终端。
在一些实施例中,终端从第一音频的歌词文件中,获取第一音频的歌词。其中,获取到的第一音频的歌词包括多个字符,且每个字符均设置有对应的时间标签。另外该第一音频的歌词也可以包括一个字符,本申请实施例对此不做限定。
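需要说明的是,本公开并未限定歌词文件的具体存储格式。为便于理解"字符及其时间标签"这一数据形式,下面给出一个示意性的解析草图(Kotlin 编写),其中假设歌词文件每行依次记录一个字符及其起止时间(毫秒),该格式、数据结构与函数名均为举例假设,并非本公开限定的实现:

```kotlin
// 示意:歌词中的一个字符及其对应的时间标签(起止时间,单位:毫秒)
data class LyricChar(val char: String, val startMs: Long, val endMs: Long)

// 假设歌词文件每行形如 "我,1200,1550"(字符,起始时间,结束时间),仅作示例
fun parseLyrics(lines: List<String>): List<LyricChar> =
    lines.mapNotNull { line ->
        val parts = line.split(",")
        val start = parts.getOrNull(1)?.trim()?.toLongOrNull()
        val end = parts.getOrNull(2)?.trim()?.toLongOrNull()
        if (start == null || end == null) null
        else LyricChar(parts[0].trim(), start, end)
    }
```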
在步骤302中,终端从该第一音频的曲谱文件中,获取与字符对应的时间标签对应的音高。
其中,该字符为该歌词中的任一个字符,本申请实施例仅是以该字符为例进行说明,而对歌词中的其他字符的处理过程同理,本申请实施例中不再赘述。
在一些实施例中,终端从第一音频的曲谱文件中,获取第一音频中各个音频片段对应的音高,每个音频片段都是一个字符对应的音频片段,且每个音频片段均设置有对应的时间标签,进而结合步骤301中获取到的每个字符对应的时间标签,确定每个字符对应的音高,以得到该字符对应的音高。
其中,在基于每个音频片段对应的时间标签以及每个字符对应的时间标签,确定每个字符对应的音高时,确定时间标签一致的音频片段和字符,进而将该音频片段的音高,确定为与该音频片段的时间标签一致的字符对应的音高。
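上述"按时间标签确定字符对应音高"的对齐逻辑,可用如下草图示意(沿用上一段草图中的 LyricChar 数据类)。其中把"时间标签一致"简化为起始时间相同,且用整数(如 MIDI 音高编号)表示音高,这些均为便于说明的假设:

```kotlin
// 示意:曲谱文件中一个音频片段对应的记录,携带时间标签与音高(整数表示,为举例假设)
data class ScoreNote(val startMs: Long, val endMs: Long, val pitch: Int)

// 将字符与音高按时间标签对齐:起始时间相同的音频片段与字符视为相互对应
fun alignPitchToChars(chars: List<LyricChar>, notes: List<ScoreNote>): Map<LyricChar, Int> {
    val noteByStart = notes.associateBy { it.startMs }
    return chars.mapNotNull { c -> noteByStart[c.startMs]?.let { c to it.pitch } }.toMap()
}
```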
在步骤303中,终端在该第一音频的音高调整界面的字符显示区域,显示该歌词中的字符。
在一些实施例中,该音高调整界面包括字符显示区域,该字符显示区域用于对歌词中的字符进行显示。例如,参见图4,图4是根据一示例性实施例示出的一种音高调整界面的界面示意图,在如图4所示的音高调整界面中,区域401即为该字符显示区域。
需要说明的是,由于字符显示区域的区域大小有限,可能无法将歌词中的所有字符同时显示在字符显示区域中,因而在一些实施例中,在对字符进行显示时,先在该音高调整界面的字符显示区域,显示该歌词中的部分字符,并在该字符显示区域提供滑动功能,以便用户能够通过在字符显示区域中所显示的字符进行滑动操作,以使终端能够响应于对所显示字符的滑动操作,在该字符显示区域显示歌词中的剩余字符。
仍以图4所示的音高调整界面的界面示意图为例,在如图4所示的音高调整界面中,在作为字符显示区域的区域401中,仅显示有“我”、“把”、“月”、“亮”、“送”这五个字符,并未对歌词中的全部字符进行显示,用户能够通过对所显示的这五个字符进行滑动操作,以使终端能够在区域401中,显示出更多的字符。
由于字符显示区域可能无法完整显示歌词所包括的所有字符,因而采用先对部分字符进行显示,并在显示字符的区域设置滑动功能,用户能够通过对所显示的字符进行滑动操作,以使剩余字符显示在字符显示区域中,进而对剩余字符对应的音频片段的音高进行调整,从而实现对整个音频的处理。
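这一"先显示部分字符、滑动后显示剩余字符"的过程,本质上是在字符序列上移动一个显示窗口。以下给出一个与具体界面框架无关的窗口计算草图,窗口大小与滑动偏移量均为假设参数:

```kotlin
// 示意:根据滑动偏移量计算字符显示区域当前应显示的字符下标区间
fun visibleRange(totalChars: Int, windowSize: Int, scrollOffset: Int): IntRange {
    if (totalChars <= 0 || windowSize <= 0) return IntRange.EMPTY
    val start = scrollOffset.coerceIn(0, maxOf(0, totalChars - windowSize))
    val end = minOf(totalChars - 1, start + windowSize - 1)
    return start..end
}
```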
在一些实施例中,该音高调整界面还包括歌词显示区域,该歌词显示区域用于对字符显示区域所显示的字符对应的歌词进行显示。
在一些实施例中,若该音高调整界面包括歌词显示区域,则终端在该歌词显示区域显示该歌词中的目标歌词,该目标歌词为该歌词中的任一句歌词;并在该音高调整界面的字符显示区域,显示该目标歌词的部分字符。此外,若用户对字符显示区域所显示的字符进行了滑动操作,则终端响应于对所显示字符的滑动操作,在该字符显示区域显示该目标歌词的剩余字符。
仍以图4所示的音高调整界面的界面示意图为例,在如图4所示的音高调整界面中,区域402即为该歌词显示区域,该歌词显示区域中所显示的歌词,为字符显示区域所显示的字符所属的歌词,也即是,字符显示区域中所显示的“我”、“把”、“月”、“亮”、“送”这五个字符,均为歌词显示区域所显示的歌词“我把月亮送给你”中的字符。而若用户对字符显示区域中所显示的字符进行了滑动操作,则终端即可响应于用户的滑动操作,对“我把月亮送给你”中剩余的“给”、“你”这两个字符进行显示;或者,终端响应于用户的滑动操作,对“我把月亮送给你”中剩余的“月”、“亮”、“送”、“给”、“你”这五个字符进行显示,以保证字符显示区域每次所显示的字符,都满足字符显示区域所能显示字符的最大数量,从而提高显示效果。
在一些实施例中,该音高调整界面还包括第一歌词切换控件和第二歌词切换控件,该第一歌词切换控件用于切换至上一句歌词,该第二歌词切换控件用于切换至下一句歌词。例如,在该音高调整界面的歌词显示区域中,设置有第一歌词切换控件和第二歌词切换控件。
通过在音高调整界面中提供第一歌词切换控件和第二歌词切换控件,以便用户能够通过这两个歌词切换控件,来对当前显示的歌词进行切换,从而对其他歌词进行处理,进而实现对第一音频中所有歌词的处理。
仍以图4所示的音高调整界面的界面示意图为例,在如图4所示的音高调整界面中,在作为歌词显示区域的区域402中,设置有第一歌词切换控件411和第二歌词切换控件412。
在一些实施例中,若用户想要对当前所显示的目标歌词的上一句歌词对应的音频片段进行处理,则用户触发该第一歌词切换控件,终端响应于用户对该第一歌词切换控件的触发操作,将该音高调整界面当前显示的歌词切换为上一句歌词。例如,终端响应于用户对该第一歌词切换控件的触发操作,将该音高调整界面的歌词显示区域当前所显示的目标歌词,切换为该目标歌词的上一句歌词。
相应地,若当前显示的歌词切换为上一句歌词,则字符显示区域中所显示的字符,也会对应切换为上一句歌词中的部分字符。
在另一些实施例中,若用户想要对当前所显示的目标歌词的下一句歌词对应的音频片段进行处理,则用户触发该第二歌词切换控件,终端响应于用户对该第二歌词切换控件的触发操作,将该音高调整界面当前显示的歌词切换为下一句歌词。例如,终端响应于用户对该第二歌词切换控件的触发操作,将该音高调整界面的歌词显示区域当前所显示的目标歌词,切换为该目标歌词的下一句歌词。
相应地,若当前显示的歌词切换为下一句歌词,则字符显示区域中所显示的字符,也会对应切换为下一句歌词中的字符。
在步骤304中,终端在该字符对应的区域,基于该音频片段的音高,显示该字符的音高调整控件,该音高调整控件的显示位置与该音频片段的音高对应,该音高调整控件用于调整该字符在该第一音频中对应的音频片段的音高。
在一些实施例中,该音高调整界面包括控件显示区域,该控件显示区域用于基于所显示的字符对应的音频片段的音高,对所显示的字符的音高调整控件进行显示。其中,该控件显示区域还划分为多个子区域,每个子区域用于对一个音高调整控件进行显示,每个子区域都对应于一个字符。例如,仍以图4所示的音高调整界面的界面示意图为例,在如图4所示的音高调整界面中,区域403即为该控件显示区域,该控件显示区域所显示的控件,与字符显示区域所显示的字符一一对应。
通过在音高调整界面中划分不同的功能区域,以便在歌词显示区域中对歌词进行显示,在字符显示区域中对字符进行显示,在字符对应的区域中对音高调整控件进行显示,提高显示效果,方便用户使用,从而提高用户体验。
仍以图4所示的音高调整界面的界面示意图为例,在如图4所示的音高调整界面中,在作为字符显示区域的区域401中,仅显示有“我”、“把”、“月”、“亮”、“送”这五个字符,则控件显示区域所显示的音高调整控件,也仅包括“我”、“把”、“月”、“亮”、“送”这五个字符对应的五个音高调整控件,也即是,字符“我”的音高调整控件413、字符“把”的音高调整控件414、字符“月”的音高调整控件415、字符“亮”的音高调整控件416以及字符“送”的音高调整控件417。
需要说明的是,若在上述步骤303中,音高调整界面的字符显示区域中所显示的是歌词中的部分字符,则相应地,该控件显示区域中所显示的音高调整控件,仅为这部分字符的音高调整控件。而若用户对字符显示区域所显示字符进行了滑动操作,使得字符显示区域所显示的字符发生了更新,则音高调整界面中所显示的音高调整控件的显示位置,也会相应变为更新后所显示字符对应的音频片段的音高对应的位置。
此外,若用户触发了音高调整界面的第一歌词切换控件或第二歌词切换控件,则该音高调整界面中音高调整控件的显示位置,也会随着歌词的切换而发生更新,音高调整控件更新后的显示位置,对应于切换到的歌词中的字符对应的音频片段的音高。
在一些实施例中,终端在该字符的音高调整控件下方,显示与该字符对应的柱形图,该柱形图的高度与该音频片段的音高对应。
仍以图4所示的音高调整界面的界面示意图为例,在如图4所示的音高调整界面中,每个音高调整控件下方均显示有相应的柱形图,各个柱形图的高度均由对应字符所对应的音频片段的音高决定。
通过在音高调整控件的下方,显示高度与音频片段的音高对应的柱形图,能够丰富音频处理过程中的显示形式。
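柱形图高度与音高之间的换算方式在本公开中未作限定,下面给出一种线性映射的示意草图,其中音高以整数表示、最大高度以像素为单位,均为假设参数:

```kotlin
// 示意:把音高线性映射为柱形图高度(像素),音高越高柱形越高
fun barHeightPx(pitch: Int, minPitch: Int, maxPitch: Int, maxHeightPx: Int): Int {
    if (maxPitch <= minPitch || maxHeightPx <= 0) return 0
    val ratio = (pitch - minPitch).toFloat() / (maxPitch - minPitch)
    return (ratio.coerceIn(0f, 1f) * maxHeightPx).toInt()
}
```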
在一些实施例中,终端在第一音频的音高调整界面中该字符对应的区域中,显示第一唱名,该第一唱名对应于该音频片段的音高。其中,唱名是指在演唱旋律时为方便唱谱而采用的名称,例如,唱名包括“do”、“re”、“mi”、“fa”、“sol”、“la”、“si”。
仍以图4所示的音高调整界面的界面示意图为例,在如图4所示的音高调整界面中,位于每个音高调整控件下方的柱形图底部,均显示有各个字符对应音频片段的音高所对应的唱名。
通过在字符对应的区域中,对音频片段的音高对应的唱名进行显示,使得用户通过所显示的唱名就能直接确定出音频片段的音高,提高显示效果,从而提高用户体验。
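音高与唱名之间的对应关系可以按八度内的半音序号取得,下面给出一个示意草图;其中以 0 对应 "do"、以 "#" 标记升半音,这种记法仅为展示假设,并非本公开限定的显示形式:

```kotlin
// 示意:把半音序号映射为唱名,0 对应 do,每 12 个半音为一个八度
fun solfegeName(semitoneIndex: Int): String {
    val names = listOf("do", "do#", "re", "re#", "mi", "fa", "fa#", "sol", "sol#", "la", "la#", "si")
    val indexInOctave = ((semitoneIndex % 12) + 12) % 12
    return names[indexInOctave]
}
```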
需要说明的是,上述步骤301至步骤304的步骤标号所指示的顺序,并不构成对上述步骤301至步骤304的执行顺序的限定,在另一些实施例中,终端执行完步骤301后,执行步骤303,接着再执行步骤302,执行完步骤302后,执行步骤304,或者,采用其他顺序来执行上述步骤301至步骤304,本公开实施例对此不加以限定。
在步骤305中,终端响应于对字符的音高调整控件的调整操作,确定该音频片段的目标音高。
在一些实施例中,用户对音高调整控件进行调整操作,采用对音高调整控件进行滑动操作的方式,或者,采用在字符对应的区域中进行触发操作的方式,或者,采用更多可选的方式,本公开实施例对此不加以限定。
在一些实施例中,用户通过对字符的音高调整控件进行滑动,来对音频片段的音高进行调整,则终端响应于对该字符的音高调整控件的滑动操作,将滑动操作的目标位置对应的音高,确定为该目标音高。
在另一些实施例中,用户通过在字符对应的区域中进行触发操作,来对音频片段的音高进行调整,则终端响应于在该字符对应的区域中的触发操作,将被触发的目标位置对应的音高,确定为该目标音高。
通过将对音高调整控件的调整操作提供为滑动操作和触发操作这两种可选的操作形式,增加用户可选的操作形式,提高用户操作过程的灵活性。
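无论是滑动操作还是触发操作,"目标位置对应的音高"都可以理解为把目标位置量化到最近的可选音高档位。下面给出一个坐标量化的示意草图,坐标系(纵坐标自区域底部向上)与档位数量均为假设:

```kotlin
import kotlin.math.roundToInt

// 示意:把字符对应区域内的纵向坐标量化为可选音高档位的下标(0 为最低音高)
fun positionToPitchIndex(yFromBottomPx: Int, regionHeightPx: Int, pitchCount: Int): Int {
    if (regionHeightPx <= 0 || pitchCount <= 0) return 0
    val ratio = (yFromBottomPx.toFloat() / regionHeightPx).coerceIn(0f, 1f)
    return (ratio * (pitchCount - 1)).roundToInt()
}
```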
在用户对音高调整控件进行调整操作时,终端响应于对该字符的音高调整控件的调整操作,显示音高调整范围,以便用户基于所显示的音高调整范围,来对音高调整控件的位置进行调整。其中,该音高调整范围为“do”、“re”、“mi”、“fa”、“sol”、“la”、“si”这7个唱名对应的全音和半音的音高,也即是,该音高调整范围包括14个可选音高。
通过在用户对音高调整控件进行调整操作时,显示可操作的音高调整范围,以便用户基于所显示的音高调整范围进行调整操作,减少调整超范围的情况的出现。
在用户对音高调整控件进行调整操作的过程中,若用户的调整操作超出该音高调整范围,则终端响应于该调整操作超出该音高调整范围,显示提示信息,该提示信息用于提示该调整操作超出该音高调整范围。
通过在用户的调整操作超出音高调整范围时,及时通过提示信息来对用户进行提示,以便用户对调整超范围的情况及时进行处理。
其中,该音高调整范围包括最小音高和最大音高。在一些实施例中,若该调整操作的目标位置对应的音高小于该最小音高,则将该最小音高确定为该目标音高。在另一些实施例中,若该调整操作的目标位置对应的音高大于该最大音高,则将该最大音高确定为该目标音高。
通过在用户的调整操作超出音高调整范围时,根据调整操作的情况,直接将最小音高或最大音高确定为目标音高,使得用户能够在最小音高或最大音高的基础上继续进行调整,无需从最原始的音高开始重新进行调整,减少用户的操作成本,提高音频处理过程的效率。
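"音高调整范围包括最小音高和最大音高,超出范围时取最小音高或最大音高"这一逻辑,等价于把请求的音高限制在范围之内,可示意如下(音高仍以整数表示,为假设的表示方式):

```kotlin
// 示意:音高调整范围,包含最小音高与最大音高
data class PitchRange(val minPitch: Int, val maxPitch: Int)

// 请求音高小于最小音高时取最小音高,大于最大音高时取最大音高,否则取请求音高本身
fun resolveTargetPitch(requestedPitch: Int, range: PitchRange): Int =
    requestedPitch.coerceIn(range.minPitch, range.maxPitch)
```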
在一些实施例中,每次用户对音高调整控件进行调整操作时,终端均能够响应于对该字符的音高调整控件的调整操作,基于调整操作的目标位置对应的音高,播放该音频片段。也即是,每次用户对音高调整控件进行调整操作后,终端均能够基于调整操作的目标位置对应的音高,来自动播放音频片段。
其中,播放音频片段的时长为0.3秒(s),或者,播放音频片段的时长为其他取值,本公开实施例对此不加以限定。
通过在调整操作结束时,基于所调整到的目标位置对应的音高,来对音频片段进行播放,使得用户能够及时了解音频片段的效果,以便用户对音频片段进行进一步处理。
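以目标音高播放音频片段时,通常需要把原音高与目标音高之间的半音差换算为变调倍率(按十二平均律);具体的变调与播放接口在本公开中未作限定,以下仅给出倍率计算的示意草图:

```kotlin
import kotlin.math.pow

// 示意:按十二平均律,把半音差换算为播放时的变调倍率;升高 2 个半音时倍率约为 1.12
fun pitchShiftRatio(originalPitch: Int, targetPitch: Int): Double =
    2.0.pow((targetPitch - originalPitch) / 12.0)
```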
此外,在用户对字符的音高调整控件进行调整操作的过程中,音频片段对应的音高的唱名可能发生变化,为便于说明,将调整操作的目标位置对应音高所对应的唱名记为第二唱名,若字符对应的区域当前显示的第一唱名与第二唱名不同,则终端将该字符对应的区域中所显示的第一唱名更新为该第二唱名。
在一些实施例中,该音高调整界面还包括播放控件,用户能够通过触发该播放控件,来对该音高调整界面当前所显示的歌词对应的第一音频片段进行试听。终端响应于对该播放控件的触发操作,基于第一音频片段的目标音高,播放该第一音频片段,该第一音频片段对应于该音高调整界面当前所显示的歌词。
仍以图4所示的音高调整界面的界面示意图为例,在如图4所示的音高调整界面中, 该播放控件被提供为播放按钮418,用户能够通过触发该播放按钮418,来对调整后的第一音频片段进行播放。
通过在音高调整界面中提供播放控件,以便用户能够通过该播放控件,基于当前所显示的歌词对应的第一音频片段的音高,来对第一音频片段进行播放,使得用户能够对每句歌词调整后的音频片段进行预览,进而基于预览结果进行后续处理。
其中,在播放该第一音频片段时,终端突出显示当前播放到的字符;或者,在播放该第一音频片段时,终端突出显示当前播放到的字符的音高调整控件;或者,在播放该第一音频片段时,终端突出显示当前播放到的字符,以及当前播放到的字符的音高调整控件,本公开实施例对具体采用哪种方式不加以限定。
通过在播放第一音频片段的过程中,对当前播放到的字符或当前播放到的字符的音高调整控件进行突出显示,或者,对二者同时进行突出显示,以便用户能够清楚地知道当前播放到了哪个字符,进而基于所播放的音频的效果进行进一步处理。
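"突出显示当前播放到的字符"可以通过播放进度与字符时间标签的匹配来实现,示意如下(沿用前文草图中的 LyricChar,查找方式为假设的简化写法):

```kotlin
// 示意:根据当前播放进度(毫秒)查找正在播放的字符下标,未命中时返回 -1
fun currentCharIndex(chars: List<LyricChar>, playbackMs: Long): Int =
    chars.indexOfFirst { playbackMs >= it.startMs && playbackMs < it.endMs }
```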
在步骤306中,终端基于调整后的音频片段,生成第二音频。
在一些实施例中,该音高调整界面还包括保存控件,用户能够通过触发该保存控件,来对歌词中各个字符对应音频片段对应的音高进行存储,进而基于已存储的音高,对第一音频的曲谱文件进行更新,得到更新后的曲谱文件,从而基于更新后的曲谱文件生成第二音频,该第二音频已经是对音频片段的音高进行调整后得到的音频。
仍以图4所示的音高调整界面的界面示意图为例,在如图4所示的音高调整界面中,该保存控件被提供为保存按钮419。
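保存操作把各字符对应音频片段的目标音高写回曲谱数据,再基于更新后的曲谱生成第二音频。下面的草图仅示意"写回音高"这一步(沿用前文草图中的 ScoreNote),曲谱文件格式与音频合成接口均为本公开未限定的部分:

```kotlin
// 示意:按起始时间把已保存的目标音高写回曲谱片段,未调整过的片段保留原音高
fun applyTargetPitches(notes: List<ScoreNote>, targetPitchByStartMs: Map<Long, Int>): List<ScoreNote> =
    notes.map { note -> targetPitchByStartMs[note.startMs]?.let { note.copy(pitch = it) } ?: note }
```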
本公开实施例提供的方案,通过提供音高调整界面,能够在直观展示歌词的同时,对字符对应的音频片段的音高进行调整,无需专业人员看谱调整,学习成本较低,方便快捷,提高了音频处理的效率。通过本公开实施例提供的方案,用户在终端上就能够对音频片段的音高进行调整,使得音频处理变得更加便捷,并且,还能为用户提供调整效果实时试听的功能,从而提高用户体验。此外,用户在对音频片段的音高进行调整时,能够对每个字符对应的音频片段的音高进行调整,提高音频处理过程的灵活性,从而使得调整后的音频,能够满足用户个性化的音乐审美。
图5是根据一示例性实施例示出的一种音频处理装置的框图。参照图5,该装置包括:
显示单元501,被配置为在第一音频的音高调整界面中,显示该第一音频的歌词中的字符和该字符的音高调整控件,该音高调整控件用于调整该字符在该第一音频中对应的音频片段的音高;
确定单元502,被配置为响应于对该字符的音高调整控件的调整操作,确定该音频片段的目标音高;
生成单元503,被配置为基于调整后的音频片段,生成第二音频。
本公开实施例提供的装置,通过提供音高调整界面,能够在直观展示歌词的同时,对字符对应的音频片段的音高进行调整,无需专业人员看谱调整,学习成本较低,方便快捷,提高了音频处理的效率。
在一些实施例中,该显示单元501,被配置为在该音高调整界面的字符显示区域,显示该歌词中的字符;
该显示单元501,还被配置为在该字符对应的区域,基于该音频片段的音高,显示该字符的音高调整控件。
在一些实施例中,该音高调整控件的显示位置与该音频片段的音高对应。
在一些实施例中,该显示单元501,还被配置为在该字符的音高调整控件下方,显示与该字符对应的柱形图,该柱形图的高度与该音频片段的音高对应。
在一些实施例中,该装置还包括:
获取单元,被配置为获取该第一音频的歌词,该歌词包括字符以及字符对应的时间标签;
该获取单元,还被配置为从该第一音频的曲谱文件中,获取与该字符对应的时间标签对应的音高。
在一些实施例中,该显示单元501,被配置为在该音高调整界面的字符显示区域,显示该歌词中的部分字符;响应于对所显示字符的滑动操作,在该字符显示区域显示该歌词中的剩余字符。
在一些实施例中,该确定单元502,被配置为响应于对该字符的音高调整控件的滑动操作,将滑动操作的目标位置对应的音高,确定为该目标音高;
该确定单元502,还被配置为响应于在该字符对应区域中的触发操作,将被触发的目标位置对应的音高,确定为该目标音高。
在一些实施例中,该装置还包括:
第一播放单元,被配置为响应于对该字符的音高调整控件进行调整操作,基于调整操作的目标位置对应的音高,播放该音频片段。
在一些实施例中,该显示单元501,还被配置为在第一音频的音高调整界面中该字符对应的区域中,显示第一唱名,该第一唱名对应于该音频片段的音高;
该装置还包括:
更新单元,被配置为若该第一唱名与第二唱名不同,则将该字符对应的区域中所显示的第一唱名更新为该第二唱名,该第二唱名对应于调整操作的目标位置对应的音高。
在一些实施例中,该显示单元501,还被配置为响应于对该字符的音高调整控件的调整操作,显示音高调整范围。
在一些实施例中,该显示单元501,还被配置为响应于该调整操作超出该音高调整范围,显示提示信息,该提示信息用于提示该调整操作超出该音高调整范围。
在一些实施例中,该音高调整范围包括最小音高和最大音高;
该确定单元502,还被配置为若该调整操作的目标位置对应的音高小于该最小音高,则将该最小音高确定为该目标音高;
该确定单元502,还被配置为若该调整操作的目标位置对应的音高大于该最大音高,则将该最大音高确定为该目标音高。
在一些实施例中,该音高调整界面还包括第一歌词切换控件和第二歌词切换控件;
该装置还包括:
切换单元,被配置为响应于对该第一歌词切换控件的触发操作,将该音高调整界面当前显示的歌词切换为上一句歌词;
该切换单元,还被配置为响应于对该第二歌词切换控件的触发操作,将该音高调整界面当前显示的歌词切换为下一句歌词。
在一些实施例中,该音高调整界面还包括播放控件;
该装置还包括:
第二播放单元,被配置为响应于对该播放控件的触发操作,基于第一音频片段的目标音高,播放该第一音频片段,该第一音频片段对应于该音高调整界面当前所显示的歌词。
在一些实施例中,该显示单元501,还被配置为在播放该第一音频片段时,突出显示当前播放到的字符;和/或,
该显示单元501,还被配置为在播放该第一音频片段时,突出显示当前播放到的字符的音高调整控件。
需要说明的是:上述实施例提供的音频处理装置在对音频进行处理时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将设备的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的音频处理装置与音频处理方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
图6示出了本公开一个示例性实施例提供的终端600的结构框图。该终端600可以是:智能手机、平板电脑、MP3播放器(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器、笔记本电脑或台式电脑。终端600还可能被称为用户设备、便携式终端、膝上型终端、台式终端等其他名称。
终端600包括有:处理器601和存储器602。
处理器601可以包括一个或多个处理核心,比如4核心处理器、8核心处理器等。处理器601可以采用DSP(Digital Signal Processing,数字信号处理)、FPGA(Field-Programmable Gate Array,现场可编程门阵列)、PLA(Programmable Logic Array,可编程逻辑阵列)中的至少一种硬件形式来实现。处理器601也可以包括主处理器和协处理器,主处理器是用于对在唤醒状态下的数据进行处理的处理器,也称CPU(Central Processing Unit,中央处理器);协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中,处理器601可以集成有GPU(Graphics Processing Unit,图像处理器),GPU用于负责显示屏所需要显示的内容的渲染和绘制。一些实施例中,处理器601还可以包括AI(Artificial Intelligence,人工智能)处理器,该AI处理器用于处理有关机器学习的计算操作。
存储器602可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是非暂态的。存储器602还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器602中的非暂态的计算机可读存储介质用于存储至少一个程序代码,该至少一个程序代码用于被处理器601所执行以实现本公开中方法实施例提供的音频处理方法。
在一些实施例中,终端600还可选包括有:外围设备接口603和至少一个外围设备。处理器601、存储器602和外围设备接口603之间可以通过总线或信号线相连。各个外围设备可以通过总线、信号线或电路板与外围设备接口603相连。具体地,外围设备包括:射频电路604、显示屏605、摄像头组件606、音频电路607、定位组件608和电源609中的至少一种。
外围设备接口603可被用于将I/O(Input/Output,输入/输出)相关的至少一个外围设备连接到处理器601和存储器602。在一些实施例中,处理器601、存储器602和外围设备接口603被集成在同一芯片或电路板上;在一些其他实施例中,处理器601、存储器602和外围设备接口603中的任意一个或两个可以在单独的芯片或电路板上实现,本实施例对此不加以限定。
射频电路604用于接收和发射RF(Radio Frequency,射频)信号,也称电磁信号。射频电路604通过电磁信号与通信网络以及其他通信设备进行通信。射频电路604将电信号转换为电磁信号进行发送,或者,将接收到的电磁信号转换为电信号。在一些实施例中,射频电路604包括:天线系统、RF收发器、一个或多个放大器、调谐器、振荡器、数字信号处理器、编解码芯片组、用户身份模块卡等等。射频电路604可以通过至少一种无线通信协议来与其它终端进行通信。该无线通信协议包括但不限于:城域网、各代移动通信网络(2G、3G、4G及5G)、无线局域网和/或WiFi(Wireless Fidelity,无线保真)网络。在一些实施例中,射频电路604还可以包括NFC(Near Field Communication,近距离无线通信)有关的电路,本公开对此不加以限定。
显示屏605用于显示UI(User Interface,用户界面)。该UI可以包括图形、文本、图标、视频及其它们的任意组合。当显示屏605是触摸显示屏时,显示屏605还具有采集在显示屏605的表面或表面上方的触摸信号的能力。该触摸信号可以作为控制信号输入至处理器601进行处理。此时,显示屏605还可以用于提供虚拟按钮和/或虚拟键盘,也称软按钮和/或软键盘。在一些实施例中,显示屏605可以为一个,设置在终端600的前面板;在另一些实施例中,显示屏605可以为至少两个,分别设置在终端600的不同表面或呈折叠设计;在另一些实施例中,显示屏605可以是柔性显示屏,设置在终端600的弯曲表面上或折叠面上。甚至,显示屏605还可以设置成非矩形的不规则图形,也即异形屏。显示屏605可以采用LCD(Liquid Crystal Display,液晶显示屏)、OLED(Organic Light-Emitting Diode,有机发光二极管)等材质制备。
摄像头组件606用于采集图像或视频。在一些实施例中,摄像头组件606包括前置摄像头和后置摄像头。通常,前置摄像头设置在终端的前面板,后置摄像头设置在终端的背面。在一些实施例中,后置摄像头为至少两个,分别为主摄像头、景深摄像头、广角摄像头、长焦摄像头中的任意一种,以实现主摄像头和景深摄像头融合实现背景虚化功能、主摄像头和广角摄像头融合实现全景拍摄以及VR(Virtual Reality,虚拟现实)拍摄功能或者其它融合拍摄功能。在一些实施例中,摄像头组件606还可以包括闪光灯。闪光灯可以是单色温闪光灯,也可以是双色温闪光灯。双色温闪光灯是指暖光闪光灯和冷光闪光灯的 组合,可以用于不同色温下的光线补偿。
音频电路607可以包括麦克风和扬声器。麦克风用于采集用户及环境的声波,并将声波转换为电信号输入至处理器601进行处理,或者输入至射频电路604以实现语音通信。出于立体声采集或降噪的目的,麦克风可以为多个,分别设置在终端600的不同部位。麦克风还可以是阵列麦克风或全向采集型麦克风。扬声器则用于将来自处理器601或射频电路604的电信号转换为声波。扬声器可以是传统的薄膜扬声器,也可以是压电陶瓷扬声器。当扬声器是压电陶瓷扬声器时,不仅可以将电信号转换为人类可听见的声波,也可以将电信号转换为人类听不见的声波以进行测距等用途。在一些实施例中,音频电路607还可以包括耳机插孔。
定位组件608用于定位终端600的当前地理位置,以实现导航或LBS(Location Based Service,基于位置的服务)。定位组件608可以是基于美国的GPS(Global Positioning System,全球定位系统)、中国的北斗系统、俄罗斯的格雷纳斯系统或欧盟的伽利略系统的定位组件。
电源609用于为终端600中的各个组件进行供电。电源609可以是交流电、直流电、一次性电池或可充电电池。当电源609包括可充电电池时,该可充电电池可以支持有线充电或无线充电。该可充电电池还可以用于支持快充技术。
在一些实施例中,终端600还包括有一个或多个传感器610。该一个或多个传感器610包括但不限于:加速度传感器611、陀螺仪传感器612、压力传感器613、指纹传感器614、光学传感器615以及接近传感器616。
加速度传感器611可以检测以终端600建立的坐标系的三个坐标轴上的加速度大小。比如,加速度传感器611可以用于检测重力加速度在三个坐标轴上的分量。处理器601可以根据加速度传感器611采集的重力加速度信号,控制显示屏605以横向视图或纵向视图进行用户界面的显示。加速度传感器611还可以用于游戏或者用户的运动数据的采集。
陀螺仪传感器612可以检测终端600的机体方向及转动角度,陀螺仪传感器612可以与加速度传感器611协同采集用户对终端600的3D动作。处理器601根据陀螺仪传感器612采集的数据,可以实现如下功能:动作感应(比如根据用户的倾斜操作来改变UI)、拍摄时的图像稳定、游戏控制以及惯性导航。
压力传感器613可以设置在终端600的侧边框和/或显示屏605的下层。当压力传感器613设置在终端600的侧边框时,可以检测用户对终端600的握持信号,由处理器601根据压力传感器613采集的握持信号进行左右手识别或快捷操作。当压力传感器613设置在显示屏605的下层时,由处理器601根据用户对显示屏605的压力操作,实现对UI界面上的可操作性控件进行控制。可操作性控件包括按钮控件、滚动条控件、图标控件、菜单控件中的至少一种。
指纹传感器614用于采集用户的指纹,由处理器601根据指纹传感器614采集到的指纹识别用户的身份,或者,由指纹传感器614根据采集到的指纹识别用户的身份。在识别出用户的身份为可信身份时,由处理器601授权该用户执行相关的敏感操作,该敏感操作包括解锁屏幕、查看加密信息、下载软件、支付及更改设置等。指纹传感器614可以被设置在终端600的正面、背面或侧面。当终端600上设置有物理按键或厂商Logo时,指纹传感器614可以与物理按键或厂商Logo集成在一起。
光学传感器615用于采集环境光强度。在一个实施例中,处理器601可以根据光学传感器615采集的环境光强度,控制显示屏605的显示亮度。具体地,当环境光强度较高时,调高显示屏605的显示亮度;当环境光强度较低时,调低显示屏605的显示亮度。在另一个实施例中,处理器601还可以根据光学传感器615采集的环境光强度,动态调整摄像头组件606的拍摄参数。
接近传感器616,也称距离传感器,通常设置在终端600的前面板。接近传感器616用于采集用户与终端600的正面之间的距离。在一个实施例中,当接近传感器616检测到用户与终端600的正面之间的距离逐渐变小时,由处理器601控制显示屏605从亮屏状态切换为息屏状态;当接近传感器616检测到用户与终端600的正面之间的距离逐渐变大时,由处理器601控制显示屏605从息屏状态切换为亮屏状态。
本领域技术人员可以理解,图6中示出的结构并不构成对终端600的限定,可以包括比图示更多或更少的组件,或者组合某些组件,或者采用不同的组件布置。
在示例性实施例中,还提供了一种终端,包括:一个或多个处理器;用于存储该一个或多个处理器可执行程序代码的存储器;其中,该一个或多个处理器被配置为执行该程序代码,以实现如下步骤:
在第一音频的音高调整界面中,显示第一音频的歌词中的字符和字符的音高调整控件,音高调整控件用于调整字符在第一音频中对应的音频片段的音高;
响应于对字符的音高调整控件的调整操作,确定音频片段的目标音高;
基于调整后的音频片段,生成第二音频。
在一些实施例中,该一个或多个处理器还被配置为执行程序代码,以实现上述方法实施例中的其他实施例提供的音频处理方法中的步骤。
在示例性实施例中,还提供了一种包括程序代码的计算机可读存储介质,例如包括程序代码的存储器602,上述程序代码可由终端600的处理器601执行以完成如下步骤:
在第一音频的音高调整界面中,显示第一音频的歌词中的字符和字符的音高调整控件,音高调整控件用于调整字符在第一音频中对应的音频片段的音高;
响应于对字符的音高调整控件的调整操作,确定音频片段的目标音高;
基于调整后的音频片段,生成第二音频。
在一些实施例中,上述程序代码可由终端600的处理器601执行以完成上述方法实施例中的其他实施例提供的音频处理方法中的步骤。
在一些实施例中,计算机可读存储介质是ROM(Read-Only Memory,只读内存)、RAM(Random Access Memory,随机存取存储器)、CD-ROM(Compact-Disc Read-Only Memory,只读光盘)、磁带、软盘和光数据存储设备等。
在示例性实施例中,还提供了一种计算机程序产品,包括计算机程序,该计算机程序被终端600的处理器601执行时实现如下步骤:
在第一音频的音高调整界面中,显示第一音频的歌词中的字符和字符的音高调整控件,音高调整控件用于调整字符在第一音频中对应的音频片段的音高;
响应于对字符的音高调整控件的调整操作,确定音频片段的目标音高;
基于调整后的音频片段,生成第二音频。
在一些实施例中,该计算机程序被终端600的处理器601执行时实现上述方法实施例中的其他实施例提供的音频处理方法中的步骤。
在一些实施例中,本申请实施例所涉及的计算机程序可被部署在一个终端上执行,或者在位于一个地点的多个终端上执行,又或者,在分布在多个地点且通过通信网络互连的多个终端上执行,分布在多个地点且通过通信网络互连的多个终端可以组成区块链系统。
本公开所有实施例均可以单独被执行,也可以与其他实施例相结合被执行,均视为本公开要求的保护范围。

Claims (33)

  1. 一种音频处理方法,包括:
    在第一音频的音高调整界面中,显示所述第一音频的歌词中的字符和所述字符的音高调整控件,所述音高调整控件用于调整所述字符在所述第一音频中对应的音频片段的音高;
    响应于对所述字符的音高调整控件的调整操作,确定所述音频片段的目标音高;
    基于调整后的所述音频片段,生成第二音频。
  2. 根据权利要求1所述的音频处理方法,其中,所述在第一音频的音高调整界面中,显示所述第一音频的歌词中的字符和所述字符的音高调整控件,包括:
    在所述音高调整界面的字符显示区域,显示所述歌词中的字符;
    在所述字符对应的区域,基于所述音频片段的音高,显示所述字符的音高调整控件。
  3. 根据权利要求2所述的音频处理方法,其中,所述音高调整控件的显示位置与所述音频片段的音高对应。
  4. 根据权利要求1所述的音频处理方法,其中,所述方法还包括:
    在所述字符的音高调整控件下方,显示所述字符对应的柱形图,所述柱形图的高度与所述音频片段的音高对应。
  5. 根据权利要求2所述的音频处理方法,其中,所述方法还包括:
    获取所述第一音频的歌词,所述歌词包括所述字符以及所述字符对应的时间标签;
    从所述第一音频的曲谱文件中,获取所述字符对应的时间标签对应的音高。
  6. 根据权利要求2所述的音频处理方法,其中,所述在所述音高调整界面的字符显示区域,显示所述歌词中的字符,包括:
    在所述音高调整界面的字符显示区域,显示所述歌词中的部分字符;
    响应于对所显示字符的滑动操作,在所述字符显示区域显示所述歌词中的剩余字符。
  7. 根据权利要求1所述的音频处理方法,其中,所述响应于对字符的音高调整控件的调整操作,确定所述音频片段的目标音高,包括:
    响应于对所述字符的音高调整控件的滑动操作,将所述滑动操作的目标位置对应的音高,确定为所述目标音高;
    或者,
    响应于在所述字符对应的区域中的触发操作,将被触发的目标位置对应的音高,确定为所述目标音高。
  8. 根据权利要求1所述的音频处理方法,其中,所述方法还包括:
    响应于对所述字符的音高调整控件的调整操作,基于所述调整操作的目标位置对应的音高,播放所述音频片段。
  9. 根据权利要求1所述的音频处理方法,其中,所述方法还包括:
    在所述第一音频的音高调整界面中所述字符对应的区域中,显示第一唱名,所述第一唱名对应于所述音频片段的音高;
    在所述第一唱名与第二唱名不同的情况下,将所述字符对应的区域中所显示的第一唱名更新为所述第二唱名,所述第二唱名对应于所述调整操作的目标位置对应的音高。
  10. 根据权利要求1所述的音频处理方法,其中,所述方法还包括:
    响应于对所述字符的音高调整控件的调整操作,显示音高调整范围。
  11. 根据权利要求10所述的音频处理方法,其中,所述方法还包括:
    响应于所述调整操作超出所述音高调整范围,显示提示信息,所述提示信息用于提示所述调整操作超出所述音高调整范围。
  12. 根据权利要求10所述的音频处理方法,其中,所述音高调整范围包括最小音高和最大音高;
    所述方法还包括:
    响应于所述调整操作的目标位置对应的音高小于所述最小音高,将所述最小音高确定为所述目标音高;
    响应于所述调整操作的目标位置对应的音高大于所述最大音高,将所述最大音高确定为所述目标音高。
  13. 根据权利要求1所述的音频处理方法,其中,所述音高调整界面还包括第一歌词切换控件和第二歌词切换控件;
    所述方法还包括:
    响应于对所述第一歌词切换控件的触发操作,将所述音高调整界面当前显示的歌词切换为上一句歌词;
    响应于对所述第二歌词切换控件的触发操作,将所述音高调整界面当前显示的歌词切换为下一句歌词。
  14. 根据权利要求1-13任一项所述的音频处理方法,其中,所述音高调整界面还包括播放控件;
    所述方法还包括:
    响应于对所述播放控件的触发操作,基于第一音频片段的目标音高,播放所述第一音频片段,所述第一音频片段对应于所述音高调整界面当前所显示的歌词。
  15. 根据权利要求14所述的音频处理方法,其中,所述方法还包括:
    在播放所述第一音频片段时,突出显示当前播放到的字符;
    和/或,
    在播放所述第一音频片段时,突出显示当前播放到的字符的音高调整控件。
  16. 一种音频处理装置,包括:
    显示单元,被配置为在第一音频的音高调整界面中,显示所述第一音频的歌词中的字符和所述字符的音高调整控件,所述音高调整控件用于调整所述字符在所述第一音频中对应的音频片段的音高;
    确定单元,被配置为响应于对所述字符的音高调整控件的调整操作,确定所述音频片段的目标音高;
    生成单元,被配置为基于调整后的所述音频片段,生成第二音频。
  17. 根据权利要求16所述的音频处理装置,其中,所述显示单元,被配置为在所述音高调整界面的字符显示区域,显示所述歌词中的字符;
    所述显示单元,还被配置为在所述字符对应的区域,基于所述音频片段的音高,显示所述字符的音高调整控件。
  18. 根据权利要求17所述的音频处理装置,其中,所述音高调整控件的显示位置与所述音频片段的音高对应。
  19. 根据权利要求16所述的音频处理装置,其中,所述显示单元,还被配置为在所述字符的音高调整控件下方,显示与所述字符对应的柱形图,所述柱形图的高度与所述音频片段的音高对应。
  20. 根据权利要求17所述的音频处理装置,其中,所述装置还包括:
    获取单元,被配置为获取所述第一音频的歌词,所述歌词包括所述字符以及所述字符对应的时间标签;
    所述获取单元,还被配置为从所述第一音频的曲谱文件中,获取与所述字符对应的时间标签对应的音高。
  21. 根据权利要求17所述的音频处理装置,其中,所述显示单元,被配置为在所述音高调整界面的字符显示区域,显示所述歌词中的部分字符;响应于对所显示字符的滑动操作,在所述字符显示区域显示所述歌词中的剩余字符。
  22. 根据权利要求16所述的音频处理装置,其中,所述确定单元,被配置为响应于对所述字符的音高调整控件的滑动操作,将所述滑动操作的目标位置对应的音高,确定为所述目标音高;
    所述确定单元,还被配置为响应于在所述字符对应的区域中的触发操作,将被触发的目标位置对应的音高,确定为所述目标音高。
  23. 根据权利要求16所述的音频处理装置,其中,所述装置还包括:
    第一播放单元,被配置为响应于对所述字符的音高调整控件进行调整操作,基于所述调整操作的目标位置对应的音高,播放所述音频片段。
  24. 根据权利要求16所述的音频处理装置,其中,所述显示单元,还被配置为在第一音频的音高调整界面中所述字符对应的区域中,显示第一唱名,所述第一唱名对应于所述音频片段的音高;
    所述装置还包括:
    更新单元,被配置为在所述第一唱名与第二唱名不同的情况下,将所述字符对应的区域中所显示的第一唱名更新为所述第二唱名,所述第二唱名对应于所述调整操作的目标位置对应的音高。
  25. 根据权利要求16所述的音频处理装置,其中,所述显示单元,还被配置为响应于对所述字符的音高调整控件的调整操作,显示音高调整范围。
  26. 根据权利要求25所述的音频处理装置,其中,所述显示单元,还被配置为响应于所述调整操作超出所述音高调整范围,显示提示信息,所述提示信息用于提示所述调整操作超出所述音高调整范围。
  27. 根据权利要求25所述的音频处理装置,其中,所述音高调整范围包括最小音高和最大音高;
    所述确定单元,还被配置为响应于所述调整操作的目标位置对应的音高小于所述最小音高,将所述最小音高确定为所述目标音高;
    所述确定单元,还被配置为响应于所述调整操作的目标位置对应的音高大于所述最大音高,将所述最大音高确定为所述目标音高。
  28. 根据权利要求16所述的音频处理装置,其中,所述音高调整界面还包括第一歌词切换控件和第二歌词切换控件;
    所述装置还包括:
    切换单元,被配置为响应于对所述第一歌词切换控件的触发操作,将所述音高调整界面当前显示的歌词切换为上一句歌词;
    所述切换单元,还被配置为响应于对所述第二歌词切换控件的触发操作,将所述音高调整界面当前显示的歌词切换为下一句歌词。
  29. 根据权利要求16-28任一项所述的音频处理装置,其中,所述音高调整界面还包括播放控件;
    所述装置还包括:
    第二播放单元,被配置为响应于对所述播放控件的触发操作,基于第一音频片段的目标音高,播放所述第一音频片段,所述第一音频片段对应于所述音高调整界面当前所显示的歌词。
  30. 根据权利要求29所述的音频处理装置,其中,所述显示单元,还被配置为在播放所述第一音频片段时,突出显示当前播放到的字符;和/或,
    所述显示单元,还被配置为在播放所述第一音频片段时,突出显示当前播放到的字符的音高调整控件。
  31. 一种终端,包括:
    一个或多个处理器;
    用于存储所述一个或多个处理器可执行程序代码的存储器;
    其中,所述一个或多个处理器被配置为执行所述程序代码,以实现如下步骤:
    在第一音频的音高调整界面中,显示所述第一音频的歌词中的字符和所述字符的音高调整控件,所述音高调整控件用于调整所述字符在所述第一音频中对应的音频片段的音高;
    响应于对所述字符的音高调整控件的调整操作,确定所述音频片段的目标音高;
    基于调整后的所述音频片段,生成第二音频。
  32. 一种计算机可读存储介质,当所述计算机可读存储介质中的程序代码由终端的处理器执行时,使得终端能够执行如下步骤:
    在第一音频的音高调整界面中,显示所述第一音频的歌词中的字符和所述字符的音高调整控件,所述音高调整控件用于调整所述字符在所述第一音频中对应的音频片段的音高;
    响应于对所述字符的音高调整控件的调整操作,确定所述音频片段的目标音高;
    基于调整后的所述音频片段,生成第二音频。
  33. 一种计算机程序产品,包括计算机程序,所述计算机程序被处理器执行时实现如下步骤:
    在第一音频的音高调整界面中,显示所述第一音频的歌词中的字符和所述字符的音高调整控件,所述音高调整控件用于调整所述字符在所述第一音频中对应的音频片段的音高;
    响应于对所述字符的音高调整控件的调整操作,确定所述音频片段的目标音高;
    基于调整后的所述音频片段,生成第二音频。
PCT/CN2021/136890 2021-04-28 2021-12-09 音频处理方法及装置 WO2022227589A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110470416.X 2021-04-28
CN202110470416.XA CN113204673A (zh) 2021-04-28 2021-04-28 音频处理方法、装置、终端及计算机可读存储介质

Publications (1)

Publication Number Publication Date
WO2022227589A1 true WO2022227589A1 (zh) 2022-11-03

Family

ID=77029463

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/136890 WO2022227589A1 (zh) 2021-04-28 2021-12-09 音频处理方法及装置

Country Status (2)

Country Link
CN (1) CN113204673A (zh)
WO (1) WO2022227589A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113204673A (zh) * 2021-04-28 2021-08-03 北京达佳互联信息技术有限公司 音频处理方法、装置、终端及计算机可读存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103839559A (zh) * 2012-11-20 2014-06-04 华为技术有限公司 音频文件制作方法及终端设备
US20180349495A1 (en) * 2016-05-04 2018-12-06 Tencent Technology (Shenzhen) Company Limited Audio data processing method and apparatus, and computer storage medium
CN109584910A (zh) * 2017-09-29 2019-04-05 雅马哈株式会社 歌唱音频的编辑辅助方法以及歌唱音频的编辑辅助装置
CN111026907A (zh) * 2019-12-09 2020-04-17 腾讯音乐娱乐科技(深圳)有限公司 音频播放过程中音频信息的显示方法及装置
CN113204673A (zh) * 2021-04-28 2021-08-03 北京达佳互联信息技术有限公司 音频处理方法、装置、终端及计算机可读存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1326228B1 (en) * 2002-01-04 2016-03-23 MediaLab Solutions LLC Systems and methods for creating, modifying, interacting with and playing musical compositions
JP6179221B2 (ja) * 2013-06-28 2017-08-16 ヤマハ株式会社 音響処理装置および音響処理方法
CN110600034B (zh) * 2019-09-12 2021-12-03 广州酷狗计算机科技有限公司 歌声生成方法、装置、设备及存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103839559A (zh) * 2012-11-20 2014-06-04 华为技术有限公司 音频文件制作方法及终端设备
US20180349495A1 (en) * 2016-05-04 2018-12-06 Tencent Technology (Shenzhen) Company Limited Audio data processing method and apparatus, and computer storage medium
CN109584910A (zh) * 2017-09-29 2019-04-05 雅马哈株式会社 歌唱音频的编辑辅助方法以及歌唱音频的编辑辅助装置
CN111026907A (zh) * 2019-12-09 2020-04-17 腾讯音乐娱乐科技(深圳)有限公司 音频播放过程中音频信息的显示方法及装置
CN113204673A (zh) * 2021-04-28 2021-08-03 北京达佳互联信息技术有限公司 音频处理方法、装置、终端及计算机可读存储介质

Also Published As

Publication number Publication date
CN113204673A (zh) 2021-08-03

Similar Documents

Publication Publication Date Title
WO2021008055A1 (zh) 视频合成的方法、装置、终端及存储介质
CN109033335B (zh) 音频录制方法、装置、终端及存储介质
CN108538302B (zh) 合成音频的方法和装置
CN110491358B (zh) 进行音频录制的方法、装置、设备、系统及存储介质
CN109327608B (zh) 歌曲分享的方法、终端、服务器和系统
CN109144346B (zh) 歌曲分享方法、装置及存储介质
CN109346111B (zh) 数据处理方法、装置、终端及存储介质
WO2021068903A1 (zh) 确定音量的调节比例信息的方法、装置、设备及存储介质
CN110139143B (zh) 虚拟物品显示方法、装置、计算机设备以及存储介质
CN111061405B (zh) 录制歌曲音频的方法、装置、设备及存储介质
CN111753125A (zh) 歌曲音频显示的方法和装置
CN109743461B (zh) 音频数据处理方法、装置、终端及存储介质
WO2019242235A1 (zh) 混音方法、装置及存储介质
CN111142838A (zh) 音频播放方法、装置、计算机设备及存储介质
CN111711838B (zh) 视频切换方法、装置、终端、服务器及存储介质
CN111402844B (zh) 歌曲合唱的方法、装置及系统
CN110245255B (zh) 歌曲显示方法、装置、设备及存储介质
WO2022227581A1 (zh) 资源展示方法及计算机设备
CN111092991A (zh) 歌词显示方法及装置、计算机存储介质
CN113963707A (zh) 音频处理方法、装置、设备和存储介质
WO2022227589A1 (zh) 音频处理方法及装置
CN111399796B (zh) 语音消息聚合方法、装置、电子设备及存储介质
CN111081277B (zh) 音频测评的方法、装置、设备及存储介质
CN112118482A (zh) 音频文件的播放方法、装置、终端及存储介质
CN111818358A (zh) 音频文件的播放方法、装置、终端及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21939048

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE