WO2023216999A1 - Audio processing method, apparatus, device, and storage medium

Audio processing method, apparatus, device, and storage medium

Info

Publication number
WO2023216999A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
target
control
processed
interface
Prior art date
Application number
PCT/CN2023/092363
Other languages
English (en)
French (fr)
Inventor
林豪
刘文武
黄昊
黄锦
Original Assignee
北京字跳网络技术有限公司 (Beijing Zitiao Network Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2023216999A1


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 - Voice signal separating

Definitions

  • the present application relates to the field of information processing technology, and in particular to an audio processing method, device, equipment and storage medium.
  • Audio editing is a typical way for users to edit media content to create stylized media content.
  • Embodiments of the present application provide an audio processing method, device, equipment and storage medium for improving the diversification of audio editing functions to meet the personalized needs of users.
  • an embodiment of the present disclosure provides an audio processing method, including:
  • an audio processing device including:
  • the acquisition module is used to acquire the audio to be processed in response to the audio acquisition instruction
  • a processing module configured to perform audio separation on the audio to be processed in response to an audio separation instruction for the audio to be processed, to obtain target audio, where the target audio is a vocal and/or accompaniment separated from the audio to be processed;
  • a presentation module used to present the target audio.
  • embodiments of the present disclosure provide an electronic device, including: a processor and a memory;
  • the memory stores computer-executable instructions;
  • the processor executes the computer-executable instructions stored in the memory, so that the processor performs the audio processing method described in the above first aspect and various possible designs of the first aspect.
  • embodiments of the present disclosure provide a computer-readable storage medium.
  • Computer-executable instructions are stored in the computer-readable storage medium.
  • when the processor executes the computer-executable instructions, the audio processing method described in the above first aspect and various possible designs of the first aspect is implemented.
  • embodiments of the present disclosure provide a computer program product, including a computer program that, when executed by a processor, implements the audio processing method described in the first aspect and various possible designs of the first aspect.
  • embodiments of the present disclosure provide a computer program that, when executed by a processor, implements the audio processing method described in the first aspect and various possible designs of the first aspect.
  • Figure 1 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure
  • Figure 2 is a schematic flow chart of another audio processing method provided by an embodiment of the present disclosure.
  • Figure 3 is a schematic diagram of an audio processing interface provided by an embodiment of the present disclosure.
  • Figure 4 is a schematic diagram of another audio processing interface provided by an embodiment of the present disclosure.
  • Figure 5 is a schematic diagram of yet another audio processing interface provided by an embodiment of the present disclosure.
  • Figure 6 is a schematic diagram of another audio processing interface provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic flowchart of yet another audio processing method provided by an embodiment of the present disclosure.
  • FIG. 8 is a schematic flowchart of yet another audio processing method provided by an embodiment of the present disclosure.
  • Figure 9 is a schematic diagram of the implementation principle of accompaniment separation provided by an embodiment of the present disclosure.
  • Figure 10 is a schematic diagram of the implementation principle of audio file saving provided by an embodiment of the present disclosure.
  • Figure 11 is a schematic structural diagram of an audio processing device provided by an embodiment of the present disclosure.
  • Figure 12 is a structural block diagram of an electronic device provided by an embodiment of the present disclosure.
  • an audio processing method which can not only perform audio separation to obtain human voices and/or accompaniment, but can also present the separated vocals and/or accompaniment to the user for listening, saving, sharing or post-processing, which can meet the diverse needs of users and improve the user experience.
  • the technical solutions provided by the embodiments of the present disclosure can be applied to scenarios where electronic devices process audio.
  • the electronic device can be any device with audio processing functions; it can be a terminal device, a server or a virtual machine, etc., or it can be a distributed computer system composed of one or more servers and/or computers.
  • terminal devices include but are not limited to smartphones, notebook computers, desktop computers, tablet computers, vehicle-mounted devices, smart wearable devices, smart screens, etc., which are not limited in the embodiments of the present disclosure.
  • the server can be an ordinary server or a cloud server.
  • the cloud server is also called a cloud computing server or a cloud host. It is a host product in the cloud computing service system.
  • the server can also be a distributed system server or a server combined with a blockchain.
  • the product implementation form of the present disclosure is program code included in platform software and deployed on electronic devices (which may also be hardware with computing capabilities such as computing clouds or mobile terminals).
  • the program code of the present disclosure may be stored inside an electronic device.
  • the program code runs in the electronic device's host memory and/or GPU memory.
  • "plurality" refers to two or more.
  • “And/or” describes the relationship between related objects, indicating that there can be three relationships.
  • A and/or B can mean: A exists alone, both A and B exist, or B exists alone.
  • the character "/" generally indicates an "or" relationship between the related objects.
  • Embodiments of the present disclosure provide an audio processing method, device, equipment and storage medium.
  • In the embodiments, the audio to be processed is obtained; in response to the audio separation instruction for the audio to be processed, audio separation is performed on the audio to be processed to obtain target audio, where the target audio is a vocal and/or accompaniment separated from the audio to be processed; and the target audio is presented. In this way, the directly separated human voice and/or accompaniment can be presented to the user for playback, saving, sharing or processing, etc., which can meet the diverse needs of users and improve the user experience.
  • FIG. 1 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is described with an electronic device as the execution subject. As shown in Figure 1, the audio processing method may include the following steps:
  • In practice, the user may issue an audio acquisition instruction to the electronic device, so that the electronic device acquires the audio to be processed in response to the received audio acquisition instruction.
  • the audio acquisition instruction may be issued by the user through the human-computer interaction interface of the electronic device, for example, by touching controls on the human-computer interaction interface, or may be issued by voice (in this case, the electronic device has components for functions such as voice acquisition or playback), which is not limited here.
  • the electronic device, in response to the detected or received audio acquisition instruction, can receive the audio to be processed from other devices, read the audio to be processed from its own stored database (in this case, a database is deployed in the electronic device), or obtain the audio to be processed from the cloud.
  • the embodiment of the present disclosure does not limit the acquisition method of the audio to be processed, which can be determined according to the actual scenario, and will not be described again here.
  • the audio to be processed obtained by the electronic device may be pre-processed audio, for example, audio data obtained by the electronic device extracting the audio from an obtained target video, or it may be unprocessed audio; this embodiment does not limit it.
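  • (Illustrative note, not part of the original disclosure.) The video-to-audio pre-processing step mentioned above is commonly done with a tool such as ffmpeg. A minimal sketch, assuming ffmpeg is installed and using placeholder file names:

```python
import subprocess

def extract_audio(video_path: str, audio_path: str) -> None:
    """Extract the audio track from a video file with ffmpeg.

    -vn drops the video stream; the output codec is inferred from
    the extension of audio_path (e.g. .wav, .m4a).
    """
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", audio_path],
        check=True,
    )

# extract_audio("target_video.mp4", "audio_to_process.wav")
```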
  • the user can issue an audio separation instruction to the electronic device, so that the electronic device, in response to the audio separation instruction, performs audio separation on the audio to be processed and separates the target audio from it, thereby obtaining the vocal and/or accompaniment separated from the audio to be processed; that is, the target audio may be at least one of the vocal and the accompaniment.
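  • The disclosure does not prescribe a particular separation algorithm. Purely as an illustrative sketch, the classical median-filtering/soft-mask approach from the librosa documentation can produce rough vocal and accompaniment estimates; a production system would more likely call a trained source-separation model, and the mask margins and file names below are illustrative choices:

```python
import numpy as np
import librosa
import soundfile as sf

def separate(path: str):
    """Rough vocal/accompaniment split via median filtering and soft masks."""
    y, sr = librosa.load(path, sr=None, mono=True)
    S_full, phase = librosa.magphase(librosa.stft(y))

    # Repeating (accompaniment-like) structure: median over similar frames.
    S_filter = librosa.decompose.nn_filter(
        S_full, aggregate=np.median, metric="cosine",
        width=int(librosa.time_to_frames(2, sr=sr)))
    S_filter = np.minimum(S_full, S_filter)

    # Soft masks: the residual energy is treated as vocals.
    mask_v = librosa.util.softmask(S_full - S_filter, 10 * S_filter, power=2)
    mask_a = librosa.util.softmask(S_filter, 2 * (S_full - S_filter), power=2)

    vocals = librosa.istft(mask_v * S_full * phase)
    accompaniment = librosa.istft(mask_a * S_full * phase)
    sf.write("vocals.wav", vocals, sr)                # "remove accompaniment"
    sf.write("accompaniment.wav", accompaniment, sr)  # "remove vocal"
    return vocals, accompaniment
```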
  • the electronic device may obtain the audio separation instruction issued by the user through the human-computer interaction interface, or may obtain the audio separation instruction issued by the user through voice, which is not limited in this embodiment.
  • After the electronic device separates the target audio from the audio to be processed, it can present the target audio for the user to play, save, share and/or process.
  • the electronic device can present the target audio on the interface of the target application, and controls that can be operated by the user are deployed on the interface, such as saving controls, playback controls, processing controls, etc.
  • the processing control is used to trigger the rendering of the target audio on the processing page.
  • the processing page can be a page used to perform audio processing. On this page, the user can perform various audio editing and/or processing, and output the final processing results.
  • The audio processing method provided by the embodiment of the present disclosure obtains the audio to be processed in response to the audio acquisition instruction, performs audio separation on the audio to be processed in response to the audio separation instruction for the audio to be processed to obtain the target audio, where the target audio is the vocal and/or accompaniment separated from the audio to be processed, and finally presents the target audio.
  • FIG. 2 is a schematic flowchart of another audio processing method provided by an embodiment of the present disclosure.
  • the audio processing method may include the following steps:
  • the audio to be processed is the audio obtained by the electronic device in response to the user's touch operation on the first interface. That is, in this embodiment, the first interface is an audio uploading interface.
  • FIG. 3 is a schematic diagram of an audio processing interface provided by an embodiment of the present disclosure.
  • the first interface 31 is an upload interface for accompaniment separation
  • a first control 311 is deployed on the first interface 31
  • the first control is used to trigger loading of audio. Therefore, in this embodiment, when the user touches the first control 311 on the first interface 31, the electronic device will detect the touch operation and, in response to the touch operation on the first control 311, obtain the audio to be processed from the local album and present it on the second interface 32, as shown in (b) of Figure 3.
  • the touch operation can also be interpreted as a press operation, a touch operation or a click operation, etc.
  • the press operation can be a long press, a short press or a continuous press, etc. This embodiment does not limit the specific meaning of the touch operation.
  • the first area 321 on the second interface 32 contains not only the audio to be processed and a playback control 322 for triggering playback of the audio to be processed, but also a separation option located below the audio to be processed.
  • the separation options may include a remove vocal control and a remove accompaniment control
  • the remove vocal control is used to trigger the removal of the vocal in the audio
  • the remove accompaniment control is used to trigger the removal of the accompaniment in the audio.
  • the separation option may also include an accompaniment separation control (not shown).
  • the accompaniment separation control may be used to trigger separating out the various types of audio, such as vocals and accompaniment, to obtain the vocals, accompaniment, etc. in the audio; this embodiment does not limit it.
  • After the electronic device obtains the audio to be processed, it can perform a separation operation on the audio to be processed to obtain the target audio.
  • a second control 323 for triggering audio separation is further included below the first area 321.
  • When the electronic device detects that the user has selected the remove-vocal separation option and then detects the user's touch operation on the second control 323, it responds to the touch operation on the second control 323 by performing audio separation on the audio to be processed to obtain the accompaniment with the vocal removed, as shown in (c) of Figure 3.
  • the first interface, the second interface and subsequent interfaces represent different interfaces, and there is no order of priority.
  • the first control, the second control and subsequent controls only represent different controls, and there is no sequence.
  • the second control can be the first control on the second interface, etc.
  • the above S103 can be implemented by the following S203:
  • After acquiring the target audio, the electronic device can display the audio graphic corresponding to the target audio and/or the third control associated with the target audio on the third interface, where the third control is used to trigger playback of the target audio.
  • the third interface 33 is an updated interface of the second interface 32, and the first area 330 of the third interface 33 may include the audio to be processed before the separation process and the target audio after the separation process.
  • the audio graph 332 may be a waveform amplitude envelope graph of the target audio.
  • When the user touches the third control 331, the electronic device, in response to the touch operation on the third control 331, may play the target audio and present an audio graphic 332 that changes with the waveform amplitude of the target audio.
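  • As a small illustration (one reasonable implementation assumed here, not taken from the disclosure), the waveform amplitude envelope behind such an audio graphic can be computed as the peak absolute amplitude per fixed-size frame, one value per drawn column:

```python
import numpy as np

def amplitude_envelope(y: np.ndarray, frame_len: int = 1024) -> np.ndarray:
    """Peak |amplitude| per frame; frame_len is an arbitrary UI resolution."""
    n_frames = len(y) // frame_len
    frames = y[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.abs(frames).max(axis=1)

# envelope = amplitude_envelope(samples)  # one bar of the graphic per frame
```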
  • S204: Display a fourth control associated with the target audio on the third interface.
  • the fourth control is used to trigger exporting data associated with the target audio to the target location.
  • the target location includes photo album or file system.
  • the electronic device may present the target audio to the user by displaying a fourth control associated with the target audio on the third interface.
  • the fourth control 333 may be an export control, which is used to trigger the export of data related to the target audio to a target location such as a photo album or a file system.
  • the electronic device can export the target audio to the target location in response to the touch operation on the fourth control 333.
  • When the electronic device exports the target audio, it can export it to the target location in audio format or in file format; this embodiment does not limit it.
  • S205: Display a fifth control associated with the target audio on the third interface.
  • the fifth control is used to trigger audio editing of the target audio.
  • After acquiring the target audio, the electronic device can also display the fifth control associated with the target audio on the third interface.
  • the fifth control 334 can trigger the execution of audio editing of the target audio.
  • the fifth control 334 can be an import-to-audio-track control, used to trigger importing the audio into a fourth interface (for example, the audio track interface) for audio editing.
  • the electronic device may perform an audio editing operation on the target audio in response to the touch operation on the fifth control 334.
  • audio editing may include one or more of the following: editing the audio to optimize the audio; separating the vocal and/or accompaniment from the audio; separating the vocal from the audio and mixing the vocal with a preset accompaniment; and separating the vocal from a first audio and the accompaniment from a second audio, and mixing the separated vocal with the separated accompaniment.
  • this embodiment does not limit the specific content of audio editing, which can be determined according to actual conditions, and will not be described again here.
  • The audio processing method provided by this embodiment obtains the audio to be processed in response to the touch operation on the first control on the first interface, and performs audio separation on the audio to be processed in response to the touch operation on the second control on the second interface to obtain the target audio, where the second control is used to trigger audio separation. Finally, the audio graphic corresponding to the target audio and/or the third control for triggering playback of the target audio can be displayed on the third interface, and/or a fourth control associated with the target audio for triggering the export of data associated with the target audio to the target location can be displayed on the third interface, and/or a fifth control associated with the target audio for triggering audio editing of the target audio can be displayed on the third interface.
  • In this way, audio uploading, audio processing and multiple ways of audio presentation are performed through controls on the interface, which enriches the audio processing functions of electronic devices, improves the intelligence of electronic devices' audio processing, meets users' personalized needs, and improves the user experience.
  • In one implementation, the audio editing of the target audio in S205 above may include the following steps:
  • A1: In response to an audio processing instruction, one or more audio processing function controls are presented, and the one or more audio processing function controls are used to trigger execution of the corresponding audio processing function.
  • A2 In response to a touch operation on one of the one or more audio processing function controls, perform audio processing corresponding to the audio processing function control on the target audio to obtain the processed target audio.
  • When the electronic device presents the acquired target audio on the third interface 33, and the user, after playing the target audio through the third control 331 and listening to it, determines that the target audio does not meet the requirements, the user can issue an audio processing instruction to continue editing the target audio and obtain the processed target audio.
  • When the electronic device receives the user's audio processing instruction, it can respond by presenting one or more audio processing function controls, so as to detect the audio processing instructions the user issues by touching different audio processing function controls and, in turn, perform different audio processing functions in response to the detected operations.
  • Specifically, when the electronic device detects the user's touch operation on the fifth control 334 (for example, "export to audio track" in Figure 3) on the third interface, it jumps from the third interface 33 to the fourth interface, thereby displaying a plurality of controls related to audio editing on the fourth interface.
  • In one implementation, in response to the touch operation on the sixth control on the fourth interface, the electronic device presents one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, where the seventh control is used to trigger the presentation of the one or more audio processing function controls on the fifth interface.
  • presenting one or more audio processing function controls includes presenting one or more audio processing function controls in a window form, or presenting multiple audio processing function controls through a fifth interface.
  • FIG. 4 is a schematic diagram of another audio processing interface provided by an embodiment of the present disclosure.
  • a sixth control 411 is deployed on the fourth interface 41 .
  • the sixth control 411 may be designed as a control for triggering the presentation of one or more audio processing function controls. Therefore, when the user touches the sixth control 411 and the electronic device detects the touch operation on the sixth control 411, one or more audio processing function controls may be presented.
  • the electronic device may present a window on the fourth interface and present one or more audio processing function controls in the window, or, as shown in (c) of FIG. 4, present one or more audio processing function controls on the fifth interface 42.
  • FIG. 5 is a schematic diagram of yet another audio processing interface provided by an embodiment of the present disclosure.
  • a sixth control 411 is deployed on the fourth interface 41 .
  • the sixth control 411 may be designed to trigger the presentation of a seventh control associated with one or more audio processing function controls. Therefore, when the user touches the sixth control 411 and the electronic device detects the touch operation on the sixth control 411, as shown in (b) of FIG. 5, the seventh control 512 associated with the one or more audio processing function controls can be presented.
  • the electronic device may present a window on the mixer interface 51 and present one or more audio processing function controls in the window, or, as shown in (d) of FIG. 5, present one or more audio processing function controls on the fifth interface 42.
  • In another implementation, in response to the sliding operation on the fourth interface, the electronic device presents one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, where the seventh control is used to trigger the presentation of the one or more audio processing function controls on the fifth interface.
  • the electronic device may present one or more audio processing function controls directly in the form of a window or on the fifth interface.
  • the specific interface diagram can be seen in Figure 4.
  • Alternatively, the electronic device may present a seventh control associated with one or more audio processing function controls in response to a sliding operation on the fourth interface and, further, in response to a detected touch operation on the seventh control, present the one or more audio processing function controls directly in the form of a window or on the fifth interface.
  • the specific interface diagram can be seen in Figure 5.
  • In addition to the sixth control 411 (which may also be called an interface switch button, used to trigger switching between the track interface and the mixer interface) deployed on the fourth interface 41 and the mixer interface 51, the interface can also include:
  • a metronome switch 412, used to trigger and set the metronome speed, time signature, input device, count-in beats, etc.;
  • a headphone monitoring switch 413, used to toggle monitoring through headphones connected to the electronic device;
  • a track adding button 415, used to trigger loading a new track.
  • the mixer interface 51 may also support the following functions:
  • the sound control 513 is used to trigger the mute operation of the audio track.
  • the delete control 514 is used to trigger the deletion operation on the audio track;
  • the mixer interface 51 can also support controlling the volume of the sub-track 515 and the total output channel 516; on the right side of the volume slider there is also an effect control 517, and effect selection can be entered by touching the effect control 517.
  • Below the effector button you can also select audio processing to unlock more audio processing methods, which will not be described here.
  • On the fourth interface (the audio track interface), clicking to select the audio track waveform can support the following operations: audio split, audio cut, audio copy and clip deletion.
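  • For illustration only, these basic clip operations reduce to index arithmetic on the sample array; a minimal sketch (real editors typically keep clip metadata rather than copying sample buffers):

```python
import numpy as np

def split_clip(y: np.ndarray, sr: int, t: float):
    """Audio split: divide a clip in two at time t (seconds)."""
    i = int(t * sr)
    return y[:i].copy(), y[i:].copy()

def cut_clip(y: np.ndarray, sr: int, t0: float, t1: float) -> np.ndarray:
    """Audio cut: remove the region [t0, t1) and join the remainder."""
    i0, i1 = int(t0 * sr), int(t1 * sr)
    return np.concatenate([y[:i0], y[i1:]])
```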
  • the above-mentioned audio processing function controls include:
  • an audio optimization control, used to trigger editing of the audio to optimize the audio;
  • an accompaniment separation control, used to trigger the separation of vocals and/or accompaniment from the audio;
  • a style synthesis control, used to trigger separating the vocal from the audio and mixing it with a preset accompaniment; and
  • an audio mashup control, used to trigger separating the vocal from a first audio and the accompaniment from a second audio, and mixing and editing the separated vocal with the separated accompaniment.
  • audio optimization may also be called playing and singing optimization, which is a solution for optimizing audio in terms of vocals and/or musical instruments.
  • audio optimization may include, but is not limited to, options such as male guitar, female guitar, male piano, female piano, etc.
  • Accompaniment separation may include options to remove the vocal, remove the accompaniment, or separate the accompaniment (i.e., get the vocal and accompaniment after separation).
  • Style synthesis can also be called one-click remix, that is, the separated vocals can be mixed and edited with preset accompaniment.
  • the preset accompaniment may include, but is not limited to, different types such as car songs, classic pop, heartbeat moments, relaxing moments, childhood fun, hip-hop backstreet, future bass, reggae style, tom drums, etc.; the embodiments of this disclosure do not limit the names of each type, which can be chosen based on users' needs and will not be described again here.
  • Audio mashup is a solution for mixing and editing at least two pieces of audio. It can be a mixed edit of a vocal and an accompaniment, a mixed edit of at least two vocals, or a mixed edit of at least two accompaniments; the embodiment of the present disclosure does not limit the source audio used.
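  • A hedged sketch of the mixing step shared by style synthesis and audio mashup: gain-weight the two sources, align their lengths, sum, and peak-normalize to avoid clipping (the gain values are placeholders, not values from the disclosure):

```python
import numpy as np

def mix(vocal: np.ndarray, accompaniment: np.ndarray,
        vocal_gain: float = 1.0, accompaniment_gain: float = 0.8) -> np.ndarray:
    """Mix a separated vocal with an accompaniment track."""
    n = min(len(vocal), len(accompaniment))               # align lengths
    out = vocal_gain * vocal[:n] + accompaniment_gain * accompaniment[:n]
    peak = np.max(np.abs(out))
    return out / peak if peak > 1.0 else out              # avoid clipping
```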
  • the electronic device may execute an audio processing function corresponding to the first audio processing function control in response to a touch operation on the first audio processing function control.
  • the first audio processing function control may be at least one of multiple types of controls such as audio optimization controls, accompaniment separation controls, style synthesis controls, and audio mashup controls.
  • In this way, the user is provided with a way to jump from the accompaniment separation function interface to the audio processing function interface, which saves paths and allows the user to continue editing and creating, meeting users' diverse and personalized creation needs and improving the user experience.
  • When the electronic device presents the acquired target audio on the third interface 33, and the user, after playing the target audio through the third control 331 and listening to it, determines that the target audio meets the requirements, the user can issue an audio export instruction through the fourth control 333 on the third interface 33 to export the target audio to a target location, for example, to a photo album or a file system.
  • In one implementation, data related to the target audio can be directly exported to the target location, where the data related to the target audio can include the audio to be processed, the separated target audio, and audio clips used in the audio processing process, etc., which will not be listed here.
  • embodiments of the present disclosure also provide a function of adding a cover to the target audio. Therefore, in response to the touch operation on the fourth control 333 on the third interface 33, the interface can jump from the third interface 33 to the sixth interface, and the target audio is displayed on the sixth interface.
  • In response to the cover editing instruction issued by the user on the sixth interface, a cover can be added to the generated target audio or the original cover can be changed.
  • In response to a detected save instruction, the generated target cover and the data related to the target audio can be saved to the target location; in response to a detected sharing instruction, the generated target cover and the data related to the target audio can be shared to the target application; in response to a detected import-to-audio-track instruction, the data related to the target audio can also be imported into the audio track interface for users to continue editing.
  • In another implementation, in response to the operation of the fifth control 334 on the third interface 33, the electronic device jumps to the audio processing interface and presents one or more audio processing function controls; in response to a touch operation on one of the audio processing function controls, it performs the audio processing corresponding to that control on the target audio to obtain the processed target audio; and then, when an export instruction is detected, it jumps to the sixth interface and displays the processed target audio on the sixth interface.
  • FIG. 6 is a schematic diagram of another audio processing interface provided by an embodiment of the present disclosure.
  • the sixth interface 61 includes an eighth control 611 , which is used to trigger the playback of the processed target audio.
  • the sixth interface 61 also includes a ninth control 612 , that is, a control of the editing interface.
  • the ninth control 612 is used to trigger cover editing of the processed target audio.
  • the sixth interface 61 also includes an export control, an import to audio track control, and a sharing control.
  • the export control is used to export the data associated with the processed target audio to the target location
  • the import to audio track control is used to import the data associated with the processed target audio to the audio track interface for processing
  • the sharing control is used to share the data associated with the processed target audio to the target application, etc. It can be understood that this embodiment does not limit the controls included on the sixth interface or the functions of each control, which will not be described in detail here.
  • FIG. 7 is a schematic flowchart of yet another audio processing method provided by an embodiment of the present disclosure.
  • the audio processing method may also include the following steps:
  • When the electronic device presents the ninth control 612 for triggering cover editing, the user can issue a cover editing instruction through the ninth control 612.
  • When the electronic device detects the user's touch operation on the ninth control 612, in response to the touch operation, the electronic device may present an interface as shown in (b) of FIG. 6.
  • Specifically, a window, referred to here as the first window 613, may be presented below the sixth interface 61; the first window 613 contains a cover part and an animation part.
  • the cover part includes a customized cover import control and one or more preset static cover controls.
  • the cover import control is used to trigger the import of local pictures
  • one or more preset static cover controls are used to trigger the selection of preset static covers.
  • the static covers are multiple pictures preset in the target application of the electronic device, for example, cover 1, cover 2 and cover 3.
  • the animation part includes no animation controls and one or more preset animation effect controls.
  • the no-animation control is used to trigger selecting no animation, that is, the cover generated by the electronic device has no animation effect.
  • One or more preset animation effect controls are used to trigger the selected preset animation effect.
  • the animation effects are a variety of dynamic changes preset in the target application of the electronic device.
  • the animation effects may include animation 1, animation 2, and animation 3.
  • S702: In response to the control selection operation on the first window, obtain the target cover; the target cover is a static cover or a dynamic cover.
  • the user can select various controls presented on the sixth interface according to actual needs. For example, when the user touches the customized cover import control, the electronic device can use the photo imported locally as a static cover of the audio. When the user selects the non-animation control from the animation section, the generated target cover is a static cover.
  • a dynamic cover may be generated when the user selects a cover from the cover section and an animation from the animation section, respectively.
  • S702 can be implemented through the following steps:
  • Based on the audio characteristics of the processed target audio, the static cover and the animation effect, generate a dynamic cover that changes with the audio characteristics of the processed target audio, where the audio characteristics include audio beat and/or volume.
  • In practice, the electronic device can detect the user's control selection operation. For example, as shown in (b) of Figure 6, when the user selects cover 1 and animation 1, the electronic device will detect that, in the first window 613, a selection operation is performed on the control corresponding to cover 1 and the control corresponding to animation 1.
  • a dynamic cover 620 as shown in (c) of Figure 6 can be generated.
  • the dynamic cover 620 can include cover 1 and the animation special-effects layer corresponding to animation 1.
  • When the user clicks the eighth control 611 below the dynamic cover 620 in (c) of Figure 6, the electronic device can play the processed target audio in response to the click operation on the eighth control 611, and at this time the dynamic cover can change in real time according to audio characteristics such as the audio beat and/or volume of the processed target audio.
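  • One plausible way to drive such a cover (a sketch under the assumption that a library like librosa is available; the 30 fps rate and the pulsing formula are illustrative choices, not from the disclosure) is to precompute a per-frame loudness curve and the beat times, and let the animation layer consume them:

```python
import numpy as np
import librosa

def cover_animation_params(y: np.ndarray, sr: int, fps: int = 30):
    """Per-video-frame scale factors plus beat times for the effect layer."""
    hop = sr // fps
    rms = librosa.feature.rms(y=y, hop_length=hop)[0]    # loudness per frame
    scale = 1.0 + 0.3 * rms / (rms.max() + 1e-9)         # e.g. pulse cover size
    _, beats = librosa.beat.beat_track(y=y, sr=sr, hop_length=hop)
    beat_times = librosa.frames_to_time(beats, sr=sr, hop_length=hop)
    return scale, beat_times
```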
  • the electronic device can also export the generated target cover and data related to the target audio in response to the user's operation.
  • Exporting to an album or a file is supported, and the cover can be changed when exporting to the album. After the export is completed, the user can choose to finish or to share the result to the target application.
  • Users can also choose to share to files, in which case a compressed package containing the audio will be automatically generated for users to send elsewhere to continue editing.
  • the audio processing method may also include the following steps:
  • the export instruction may be voice, a touch operation of the export control, etc.
  • the voice recognition function on the sixth interface when the voice recognition function on the sixth interface is turned on, the user can issue export instructions through voice.
  • the sixth interface 61 also includes an export control 621.
  • In response to the touch operation on the export control 621, the electronic device can export the data associated with the processed target audio to a target location, for example, to a photo album or a file system.
  • the audio processing method may also include the following steps:
  • the sharing instruction may be voice, a touch operation of the sharing control, etc.
  • when the voice recognition function on the sixth interface is turned on, the user can issue sharing instructions through voice.
  • the sixth interface 61 also includes a sharing control 622.
  • In response to the touch operation on the sharing control 622, the electronic device can share the data associated with the processed target audio to the target application, for example, a short-video application, a mini-program application, a chat application and various other applications.
  • the above-mentioned data associated with the processed target audio includes at least one of the following:
  • the processed target audio, the vocals, the accompaniment, the static cover of the processed target audio, and the dynamic cover of the processed target audio.
  • The data associated with the processed target audio can be audio clips from the various stages of audio processing and audio data (for example, vocals, accompaniment, etc.), materials such as the static cover and the dynamic cover of the target audio, or compressed packages, material packages, etc. compressed from multiple pieces of audio data. This embodiment does not limit the specific form of the data associated with the processed target audio.
  • the electronic device may share and/or export various data related to the processed target audio.
  • Based on the user's instructions, the electronic device can export and/or share the generated data related to the processed target audio, and can also export or share the processed target audio (vocal or accompaniment, etc.) together with the generated target cover (static cover or dynamic cover); this embodiment does not limit it.
  • FIG. 8 is a schematic flowchart of yet another audio processing method provided by an embodiment of the present disclosure.
  • the audio processing method provided by the embodiment of the present disclosure may include the following steps:
  • the electronic device can process the audio to be processed to obtain the target audio.
  • the electronic device can also upload the audio to be processed to the cloud in order to call a remote separation service to separate the target audio from the audio to be processed.
  • FIG. 9 is a schematic diagram of the implementation principle of accompaniment separation provided by an embodiment of the present disclosure.
  • the electronic device can first obtain the first video from the photo album, then extract the audio to be processed from the first video, then upload the audio to be processed to the cloud, and, by calling the remote separation service, perform audio separation on the audio to be processed, thereby obtaining the separated target audio.
  • the electronic device can present the created audio track of the target audio on the interface.
  • Specifically, the audio to be processed uploaded to the cloud is first transmitted to the video cloud, and then, through the separation service in the cloud, the target audio is separated from the audio to be processed and saved to the video cloud.
  • the electronic device interacts with the cloud and downloads the separated target audio from the video cloud.
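  • The wire protocol of the remote separation service is not specified in the disclosure, so the sketch below is entirely hypothetical: the base URL, field names and task states are invented placeholders; only the upload/poll/download shape of the interaction follows the description above:

```python
import time
import requests

BASE = "https://separation.example.com/api"  # hypothetical service URL

def separate_remotely(audio_path: str, out_path: str) -> None:
    # 1. Upload the audio to be processed.
    with open(audio_path, "rb") as f:
        task = requests.post(f"{BASE}/separate", files={"audio": f}).json()

    # 2. Poll until the remote separation service finishes.
    while True:
        status = requests.get(f"{BASE}/tasks/{task['id']}").json()
        if status["state"] == "done":
            break
        time.sleep(1)

    # 3. Download the separated target audio.
    resp = requests.get(status["result_url"])
    with open(out_path, "wb") as f:
        f.write(resp.content)
```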
  • the electronic device can perform different processes in response to the user's touch operations on different controls.
  • the audio processing method may include:
  • the processed target audio and its related data can be compressed to obtain a file in the form of a compressed package for joint processing and storage.
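  • A minimal sketch of producing such a compressed package with Python's standard zipfile module (the file names are placeholders):

```python
import zipfile

def package_results(paths, archive="mix_package.zip"):
    """Bundle the processed audio and its related files into one package."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as z:
        for p in paths:
            z.write(p)

# package_results(["vocals.wav", "accompaniment.wav", "cover.png"])
```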
  • When the data related to the processed target audio is saved to the album, changing the cover of the target audio and other files, or adding a cover by default, can be supported to improve the aesthetic experience when the user enjoys the target audio.
  • the audio processing method may include:
  • data related to the target audio can be saved to a file system or photo album.
  • FIG. 10 is a schematic diagram of the implementation principle of audio file saving provided by an embodiment of the present disclosure.
  • When the electronic device detects the user's save instruction, it first performs effect processing on the audio track of the target audio in the form of audio blocks, and then synthesizes the other components from the audio processing process, such as the target cover (static cover or dynamic cover), to obtain the file to be saved.
  • The audio processing method provided by the embodiments of the present disclosure opens up and outputs the results of accompaniment separation to users, meets users' diverse needs, and provides the ability to jump from the accompaniment separation function to the audio track processing interface, which not only shortens the interface jump path but also makes it possible to continue editing and creating with the results of accompaniment separation. It also provides a new saving method that supports saving to files and saving to albums, supports changing file covers, improves the intelligence of applications that use the audio processing method, and improves the user experience.
  • FIG 11 is a schematic structural diagram of an audio processing device provided by an embodiment of the present disclosure.
  • the audio processing device 1100 can be integrated in an electronic device or implemented by an electronic device. Referring to Figure 11, the audio processing device 1100 may include:
  • the acquisition module 1101 is used to acquire the audio to be processed in response to the audio acquisition instruction
  • the processing module 1102 is configured to perform audio separation on the audio to be processed in response to the audio separation instruction for the audio to be processed, so as to obtain the target audio, wherein the target audio is the vocal and/or accompaniment separated from the audio to be processed;
  • Presentation module 1103 is used to present the target audio.
  • the acquisition module 1101 is specifically configured to acquire the audio to be processed in response to a touch operation on a first control on the first interface, wherein the first control is used to trigger loading audio.
  • the processing module 1102 is specifically configured to perform audio separation on the audio to be processed in response to a touch operation on the second control on the second interface to obtain the target audio, where the second control is used to trigger audio separation.
  • the presentation module 1103 is specifically configured to display the audio graphic corresponding to the target audio and/or a third control associated with the target audio on a third interface, where the third control is used to trigger playback of the target audio.
  • the presentation module 1103 is specifically configured to display a fourth control associated with the target audio on the third interface, where the fourth control is used to trigger exporting the data associated with the target audio to a target location; the target location includes a photo album or file system.
  • the presentation module 1103 is specifically configured to display a fifth control associated with the target audio on the third interface, where the fifth control is used to trigger audio editing of the target audio.
  • the presentation module 1103 is also configured to present one or more audio processing function controls in response to the audio processing instruction, where the one or more audio processing function controls are used to trigger execution of the corresponding audio processing functions;
  • the processing module 1102 is further configured to, in response to a touch operation on one of the one or more audio processing function controls, perform the audio processing corresponding to that audio processing function control on the target audio to obtain the processed target audio.
  • the presentation module 1103 is specifically configured to, in response to a touch operation on the sixth control on the fourth interface, present the one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, where the seventh control is used to trigger the presentation of the one or more audio processing function controls on the fifth interface.
  • the presentation module 1103 is specifically configured to, in response to the sliding operation on the fourth interface, present the one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, where the seventh control is used to trigger the presentation of the one or more audio processing function controls on the fifth interface.
  • the audio processing function controls include:
  • an audio optimization control, used to trigger editing of audio to optimize said audio;
  • an accompaniment separation control, used to trigger the separation of vocals and/or accompaniment from the audio;
  • a style synthesis control, used to trigger separating the vocal from the audio and mixing it with a preset accompaniment; and
  • an audio mashup control, used to trigger separating the vocal from a first audio and the accompaniment from a second audio, and mixing and editing the separated vocal with the separated accompaniment.
  • the presentation module 1103 is also configured to display the processed target audio on a sixth interface.
  • the sixth interface includes an eighth control, and the eighth control is used to trigger playing the processed target audio.
  • the sixth interface further includes a ninth control
  • the presentation module 1103 is further configured to, in response to a touch operation on the ninth control on the sixth interface, display a first window, the first window including a cover import control, one or more preset static cover controls, and one or more preset animation effect controls;
  • the processing module 1102 is also configured to obtain a target cover in response to a control selection operation on the first window;
  • the target cover is a static cover or a dynamic cover.
  • the processing module 1102 is specifically configured to: based on the audio characteristics of the processed target audio, the static cover and the animation effect, generate a dynamic cover that changes with the audio characteristics of the processed target audio, where the audio characteristics include audio beat and/or volume.
  • the processing module 1102 is also configured to export the data associated with the processed target audio to a target location in response to the export instruction for the sixth interface; the target location includes a photo album or file system.
  • the processing module 1102 is also configured to share the data associated with the processed target audio to the target application in response to the sharing instruction for the sixth interface.
  • the data associated with the processed target audio includes at least one of the following:
  • the processed target audio, the vocal, the accompaniment, the static cover of the processed target audio, and the dynamic cover of the processed target audio.
  • the audio processing device provided in this embodiment can be used to execute the technical solutions of the above method embodiments. Its implementation principles and technical effects are similar, and will not be described again in this embodiment.
  • FIG. 12 is a structural block diagram of an electronic device provided by an embodiment of the present disclosure.
  • the electronic device 1200 may be a terminal device or a server.
  • Terminal devices may include, but are not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (PADs), portable multimedia players (PMPs) and vehicle-mounted terminals (such as vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 12 is only an example and should not impose any limitations on the functions and scope of use of the embodiments of the present disclosure.
  • the electronic device 1200 may include a processing device (such as a central processing unit, a graphics processor, etc.) 1201, which may perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1202 or a program loaded from a storage device 1208 into a random access memory (RAM) 1203. The RAM 1203 also stores various programs and data required for the operation of the electronic device 1200.
  • the processing device 1201, ROM 1202 and RAM 1203 are connected to each other via a bus 1204.
  • An input/output (I/O for short) interface 1205 is also connected to bus 1204.
  • The following devices can be connected to the I/O interface 1205: input devices 1206 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 1207 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 1208 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 1209, which may allow the electronic device 1200 to communicate wirelessly or by wire with other devices to exchange data.
  • Although FIG. 12 illustrates the electronic device 1200 with various means, it should be understood that it is not required to implement or provide all of the illustrated means; more or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via communication device 1209, or from storage device 1208, or from ROM 1202.
  • When the computer program is executed by the processing device 1201, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination thereof.
  • Computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wire, optical cable, RF (radio frequency), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; it may also exist independently without being assembled into the electronic device.
  • the computer-readable medium carries one or more programs.
  • When the one or more programs are executed by the electronic device, the electronic device performs the methods shown in the above embodiments.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • The remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, via the Internet using an Internet service provider).
  • each block in the flowchart or block diagram may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown one after another may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved.
  • each block of the block diagram and/or flowchart illustration, and combinations of blocks in the block diagram and/or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by a combination of special-purpose hardware and computer instructions.
  • the devices or modules described in the embodiments of the present disclosure may be implemented in software or hardware. In some cases, the name of a device does not constitute a limitation on the device or module itself.
  • Exemplary types of hardware logic components that may be used include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • more specific examples of machine-readable storage media would include electrical connections based on one or more wires, portable computer diskettes, hard drives, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.
  • an audio processing method including:
  • obtaining audio to be processed in response to an audio acquisition instruction includes:
  • the audio to be processed is obtained in response to a touch operation on a first control on the first interface, where the first control is used to trigger loading of audio.
  • performing audio separation on the audio to be processed to obtain the target audio includes:
  • in response to a touch operation on a second control on a second interface, audio separation is performed on the audio to be processed to obtain the target audio, where the second control is used to trigger audio separation.
  • presenting the target audio includes:
  • Audio graphics corresponding to the target audio and/or a third control associated with the target audio are displayed on the third interface, where the third control is used to trigger playing of the target audio.
  • presenting the target audio includes:
  • a fourth control associated with the target audio is displayed on the third interface, the fourth control is used to trigger the export of data associated with the target audio to a target location; the target location includes a photo album or a file system.
  • presenting the target audio includes:
  • a fifth control associated with the target audio is displayed on the third interface, and the fifth control is used to trigger audio editing of the target audio.
  • the audio editing of the target audio includes:
  • in response to an audio processing instruction, one or more audio processing function controls are presented, where the one or more audio processing function controls are used to trigger execution of corresponding audio processing functions;
  • the presenting one or more audio processing function controls in response to the audio processing instruction includes:
  • in response to a touch operation on a sixth control on a fourth interface, the one or more audio processing function controls, or a seventh control associated with the one or more audio processing function controls, are presented, where the seventh control is used to trigger presentation of the one or more audio processing function controls on a fifth interface.
  • the presenting one or more audio processing function controls in response to the audio processing instruction includes:
  • in response to a sliding operation on the fourth interface, the one or more audio processing function controls, or a seventh control associated with the one or more audio processing function controls, are presented, where the seventh control is used to trigger presentation of the one or more audio processing function controls on the fifth interface.
  • the audio processing function controls include:
  • Audio optimization controls, for triggering editing of audio to optimize said audio
  • Accompaniment separation controls, for triggering the separation of vocals and/or accompaniment from the audio
  • Style synthesis controls, for triggering the separation of vocals from the audio and the mixing and editing of the separated vocals with a preset accompaniment
  • Audio mashup controls, for triggering the separation of the vocals from the first audio and the accompaniment from the second audio, and the mixing and editing of the separated vocals with the separated accompaniment.
  • the method further includes: displaying the processed target audio on a sixth interface, the sixth interface including an eighth control, the eighth control being used to trigger playback of the processed target audio.
  • the sixth interface further includes a ninth control
  • the method further includes:
  • in response to a touch operation on the ninth control on the sixth interface, a first window is displayed, the first window including a cover import control, one or more preset static cover controls, and one or more preset animation effect controls;
  • the target cover is a static cover or a dynamic cover.
  • obtaining the target cover in response to a control selection operation on the first window includes:
  • a static cover and an animation effect are obtained in response to the control selection operation; based on the audio characteristics of the processed target audio, the static cover, and the animation effect, a dynamic cover is generated that changes with the audio characteristics of the processed target audio
  • the audio characteristics include audio beat and/or volume.
  • the method further includes:
  • data associated with the processed target audio is exported to a target location; the target location includes a photo album or a file system.
  • the method further includes: in response to a sharing instruction for the sixth interface, data associated with the processed target audio is shared to a target application.
  • the data associated with the processed target audio includes at least one of the following:
  • the processed target audio, the vocals, the accompaniment, the static cover of the processed target audio, and the dynamic cover of the processed target audio.
  • an audio processing device including:
  • the acquisition module is used to acquire the audio to be processed in response to the audio acquisition instruction
  • a processing module configured to perform audio separation on the audio to be processed in response to an audio separation instruction for the audio to be processed, to obtain target audio, where the target audio is a person separated from the audio to be processed. vocals and/or accompaniment;
  • a rendering module used to render the target audio.
  • the acquisition module is specifically configured to acquire the audio to be processed in response to a touch operation on a first control on the first interface, where the first control is used to trigger loading of audio.
  • the processing module is specifically configured to perform audio separation on the audio to be processed in response to a touch operation on the second control on the second interface to obtain the target audio, where the second control is used to trigger audio separation.
  • the presentation module is specifically configured to display, on a third interface, an audio graphic corresponding to the target audio and/or a third control associated with the target audio,
  • the third control is used to trigger playback of the target audio.
  • the presentation module is specifically configured to display a fourth control associated with the target audio on a third interface, where the fourth control is used to trigger exporting data associated with the target audio to a target location; the target location includes a photo album or a file system.
  • the presentation module is specifically configured to display a fifth control associated with the target audio on a third interface, where the fifth control is used to trigger audio editing of the target audio.
  • the presentation module is further configured to present one or more audio processing function controls in response to the audio processing instruction, and the one or more audio processing function controls are used to trigger execution of the corresponding audio processing functions;
  • the processing module is further configured to perform, in response to a touch operation on one of the one or more audio processing function controls, audio processing corresponding to the audio processing function control on the target audio, to obtain the processed target audio.
  • the presentation module is specifically configured to present, in response to a touch operation on the sixth control on the fourth interface, the one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, the seventh control being used to trigger presentation of the one or more audio processing function controls on the fifth interface.
  • the presentation module is specifically configured to present, in response to a sliding operation on the fourth interface, the one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, the seventh control being used to trigger presentation of the one or more audio processing function controls on the fifth interface.
  • the audio processing function controls include:
  • Audio optimization controls, for triggering editing of audio to optimize said audio
  • Accompaniment separation controls, for triggering the separation of vocals and/or accompaniment from the audio
  • Style synthesis controls, for triggering the separation of vocals from the audio and the mixing and editing of the separated vocals with a preset accompaniment
  • Audio mashup controls, for triggering the separation of the vocals from the first audio and the accompaniment from the second audio, and the mixing and editing of the separated vocals with the separated accompaniment
  • the presentation module is further configured to display the processed target audio on a sixth interface
  • the sixth interface includes an eighth control
  • the eighth control is used to trigger playback of the processed target audio.
  • the sixth interface further includes a ninth control
  • the presentation module is further configured to display, in response to a touch operation on the ninth control on the sixth interface, a first window, the first window including a cover import control, one or more preset static cover controls, and one or more preset animation effect controls;
  • the processing module is also configured to obtain a target cover in response to a control selection operation on the first window;
  • the target cover is a static cover or a dynamic cover.
  • the processing module is specifically used to:
  • a static cover and an animation effect are obtained in response to the control selection operation on the first window; based on the audio characteristics of the processed target audio, the static cover, and the animation effect, a dynamic cover is generated that changes with the audio characteristics of the processed target audio
  • the audio characteristics include audio beat and/or volume.
  • the processing module is further configured to export data associated with the processed target audio to a target location in response to an export instruction for the sixth interface;
  • the target location includes a photo album or a file system.
  • the processing module is further configured to share data associated with the processed target audio to a target application in response to a sharing instruction for the sixth interface.
  • the data associated with the processed target audio includes at least one of the following:
  • the processed target audio, the vocals, the accompaniment, the static cover of the processed target audio, and the dynamic cover of the processed target audio.
  • an electronic device including: at least one processor and a memory;
  • the memory stores computer execution instructions
  • the at least one processor executes the computer execution instructions stored in the memory, so that the at least one processor executes the audio processing method described in the above first aspect and various possible designs of the first aspect.
  • a computer-readable storage medium is provided.
  • Computer-executable instructions are stored in the computer-readable storage medium.
  • when a processor executes the computer-executable instructions, the audio processing method described in the above first aspect and various possible designs of the first aspect is implemented.
  • a computer program product is provided, including a computer program that, when executed by a processor, implements the audio processing method described in the above first aspect and various possible designs of the first aspect.
  • a computer program is provided that, when executed by a processor, implements the audio processing method described in the above first aspect and various possible designs of the first aspect.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)

Abstract

The present disclosure provides an audio processing method, apparatus, device, and storage medium. The method includes: obtaining audio to be processed in response to an audio acquisition instruction; performing audio separation on the audio to be processed in response to an audio separation instruction for the audio to be processed, to obtain target audio, where the target audio is vocals and/or accompaniment separated from the audio to be processed; and presenting the target audio.

Description

Audio processing method, apparatus, device, and storage medium
Cross-reference to related applications
This application claims priority to the Chinese patent application No. 202210495460.0, filed with the China Patent Office on May 7, 2022 and entitled "Audio processing method, apparatus, device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of information processing technology, and in particular to an audio processing method, apparatus, device, and storage medium.
Background
With the continuous development of computer technology and the growth of users' personalized needs, more and more users are no longer satisfied with an unchanging style of media creation and instead hope to create media content with a style of their own. Audio editing is a typical way for users to edit media content to create stylized media content.
Existing audio editing functions are limited and cannot meet users' diverse, personalized media creation needs; therefore, there is an urgent need to extend different audio editing functions to meet those diverse, personalized needs.
Summary
Embodiments of the present application provide an audio processing method, apparatus, device, and storage medium for increasing the diversity of audio editing functions so as to meet the personalized needs of users.
In a first aspect, an embodiment of the present disclosure provides an audio processing method, including:
obtaining audio to be processed in response to an audio acquisition instruction;
performing audio separation on the audio to be processed in response to an audio separation instruction for the audio to be processed, to obtain target audio, where the target audio is vocals and/or accompaniment separated from the audio to be processed;
presenting the target audio.
In a second aspect, an embodiment of the present disclosure provides an audio processing apparatus, including:
an acquisition module, configured to obtain audio to be processed in response to an audio acquisition instruction;
a processing module, configured to perform audio separation on the audio to be processed in response to an audio separation instruction for the audio to be processed, to obtain target audio, where the target audio is vocals and/or accompaniment separated from the audio to be processed;
a presentation module, configured to present the target audio.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: a processor and a memory;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory, so that the at least one processor executes the audio processing method described in the above first aspect and various possible designs of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the audio processing method described in the above first aspect and various possible designs of the first aspect is implemented.
In a fifth aspect, an embodiment of the present disclosure provides a computer program product, including a computer program that, when executed by a processor, implements the audio processing method described in the above first aspect and various possible designs of the first aspect.
In a sixth aspect, an embodiment of the present disclosure provides a computer program that, when executed by a processor, implements the audio processing method described in the above first aspect and various possible designs of the first aspect.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
Fig. 1 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of another audio processing method provided by an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of an audio processing interface provided by an embodiment of the present disclosure;
Fig. 4 is a schematic diagram of another audio processing interface provided by an embodiment of the present disclosure;
Fig. 5 is a schematic diagram of yet another audio processing interface provided by an embodiment of the present disclosure;
Fig. 6 is a schematic diagram of still another audio processing interface provided by an embodiment of the present disclosure;
Fig. 7 is a schematic flowchart of yet another audio processing method provided by an embodiment of the present disclosure;
Fig. 8 is a schematic flowchart of still another audio processing method provided by an embodiment of the present disclosure;
Fig. 9 is a schematic diagram of an implementation principle of accompaniment separation provided by an embodiment of the present disclosure;
Fig. 10 is a schematic diagram of an implementation principle of audio file saving provided by an embodiment of the present disclosure;
Fig. 11 is a schematic structural diagram of an audio processing apparatus provided by an embodiment of the present disclosure;
Fig. 12 is a structural block diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
Aiming at the problem that existing audio editing functions cannot meet users' diverse, personalized audio production needs, the embodiments of the present disclosure propose an audio processing method that can not only perform separation processing on audio, for example, separating out vocals and/or accompaniment, but can also present the separated vocals and/or accompaniment to the user for audition, saving, sharing, or post-processing, which can meet users' diverse needs and improves the user experience.
The technical solutions provided by the embodiments of the present disclosure can be applied to scenarios in which an electronic device processes audio. The electronic device may be any device with an audio processing function; it may be a terminal device, a server, a virtual machine, or a distributed computer system composed of one or more servers and/or computers. Terminal devices include, but are not limited to, smartphones, laptops, desktop computers, tablet computers, vehicle-mounted devices, smart wearable devices, smart screens, and the like, which are not limited by the embodiments of the present disclosure. The server may be an ordinary server or a cloud server; a cloud server, also called a cloud computing server or cloud host, is a host product in the cloud computing service system. The server may also be a server of a distributed system, or a server combined with a blockchain.
It is worth noting that the product implementation form of the present disclosure is program code contained in platform software and deployed on an electronic device (which may also be hardware with computing capability such as a computing cloud or a mobile terminal). Exemplarily, the program code of the present disclosure may be stored inside the electronic device. At runtime, the program code runs in the host memory and/or GPU memory of the electronic device.
In the embodiments of the present disclosure, "multiple" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A exists alone, both A and B exist, and B exists alone. The character "/" generally indicates an "or" relationship between the associated objects.
The technical solution of the present disclosure is described in detail below through specific embodiments. It should be noted that the following specific embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments.
Embodiments of the present disclosure provide an audio processing method, apparatus, device, and storage medium. By obtaining audio to be processed in response to an audio acquisition instruction, performing audio separation on the audio to be processed in response to an audio separation instruction for the audio to be processed to obtain target audio, where the target audio is vocals and/or accompaniment separated from the audio to be processed, and presenting the target audio, the directly separated vocals and/or accompaniment can be presented to the user for playback, saving, sharing, or processing, which can meet users' diverse needs and improves the user experience.
Exemplarily, Fig. 1 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is explained with the electronic device in Fig. 1 as the execution subject. As shown in Fig. 1, the audio processing method may include the following steps:
S101: Obtain audio to be processed in response to an audio acquisition instruction.
In the embodiments of the present disclosure, when a user uses an electronic device to process audio, the user can issue an audio acquisition instruction to the electronic device, so that the electronic device obtains the audio to be processed in response to the received audio acquisition instruction.
Exemplarily, the audio acquisition instruction may be issued by the user through the human-computer interaction interface of the electronic device, for example, by touching a control on the human-computer interaction interface, or it may be issued by voice (in this case, the electronic device has controls with functions such as voice acquisition or playback), which is not limited here.
Optionally, in response to the detected or received audio acquisition instruction, the electronic device may receive the audio to be processed from another device, read the audio to be processed from a database stored on the device itself (in this case, a database is deployed in the electronic device), or obtain the audio to be processed from the cloud. The embodiments of the present disclosure do not limit how the audio to be processed is obtained; it can be determined according to the actual scenario and is not described in detail here.
It can be understood that, in the embodiments of the present disclosure, the audio to be processed obtained by the electronic device may be preprocessed audio, for example, audio data obtained by the electronic device through audio extraction from an acquired target video, or it may be unprocessed audio, which is not limited in this embodiment.
S102: Perform audio separation on the audio to be processed in response to an audio separation instruction for the audio to be processed, to obtain target audio, where the target audio is vocals and/or accompaniment separated from the audio to be processed.
Exemplarily, when the electronic device has obtained the audio to be processed, the user can issue an audio separation instruction to the electronic device, so that the electronic device, in response to the audio separation instruction, performs audio separation on the audio to be processed, separates the target audio from the audio to be processed, and thereby obtains the vocals and/or accompaniment separated from the audio to be processed; that is, the target audio may be at least one of vocals and accompaniment.
Exemplarily, the electronic device may obtain an audio separation instruction issued by the user through the human-computer interaction interface, or an audio separation instruction issued by the user by voice, which is not limited in this embodiment.
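The embodiments do not disclose a specific separation algorithm for S102. As a minimal, hypothetical sketch of the input/output contract this step assumes (mixture in, vocals and accompaniment out), a repetition-based soft-masking split can be written with the librosa library; a production implementation would more likely use a trained source-separation model:

```python
import numpy as np
import librosa

def separate_vocals(path):
    """Minimal REPET-SIM-style vocal/accompaniment split (illustrative only)."""
    y, sr = librosa.load(path, sr=None, mono=True)
    S, phase = librosa.magphase(librosa.stft(y))

    # The repeating part of the spectrogram approximates the accompaniment:
    # filter each frame against its nearest neighbours in time.
    S_filter = librosa.decompose.nn_filter(
        S, aggregate=np.median, metric="cosine",
        width=int(librosa.time_to_frames(2.0, sr=sr)))
    S_filter = np.minimum(S, S_filter)

    # Soft masks split the mixture into accompaniment and vocals.
    margin = 2.0
    mask_acc = librosa.util.softmask(S_filter, margin * (S - S_filter), power=2)
    mask_voc = librosa.util.softmask(S - S_filter, margin * S_filter, power=2)

    accompaniment = librosa.istft(mask_acc * S * phase, length=len(y))
    vocals = librosa.istft(mask_voc * S * phase, length=len(y))
    return vocals, accompaniment, sr
```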
S103: Present the target audio.
In this embodiment, after the electronic device separates the target audio from the audio to be processed, it can present the target audio for the user to play, save, share, and/or process.
Exemplarily, the electronic device may present the target audio on an interface of a target application, on which controls operable by the user are deployed, for example, a save control, a playback control, a processing control, and so on. Optionally, the processing control is used to trigger presentation of the target audio on a processing page; the processing page may be a page for performing audio processing, on which the user can perform various audio editing and/or processing operations and output the final processing result.
The audio processing method provided by the embodiments of the present disclosure obtains audio to be processed in response to an audio acquisition instruction, performs audio separation on the audio to be processed in response to an audio separation instruction for the audio to be processed to obtain target audio, where the target audio is vocals and/or accompaniment separated from the audio to be processed, and finally presents the target audio. In this technical solution, by presenting the separated target audio, that is, by opening up and outputting the accompaniment separation result to the user, the user can choose to play, save, share, process, or otherwise operate on the target audio as needed, which satisfies the user's personalized needs and improves the user experience.
To give the reader a deeper understanding of the implementation principles of the present disclosure, further details are provided in combination with the following embodiments.
Exemplarily, on the basis of the above embodiments, Fig. 2 is a schematic flowchart of another audio processing method provided by an embodiment of the present disclosure. As shown in Fig. 2, in the embodiments of the present disclosure, the audio processing method may include the following steps:
S201: Obtain the audio to be processed in response to a touch operation on a first control on a first interface.
In the embodiments of the present disclosure, it is assumed that the audio to be processed is audio obtained by the electronic device in response to a touch operation by the user on the first interface. That is, in this embodiment, the first interface is an audio upload interface.
Exemplarily, Fig. 3 is a schematic diagram of an audio processing interface provided by an embodiment of the present disclosure. Referring to (a) of Fig. 3, it is assumed that the first interface 31 is an upload interface for accompaniment separation, and a first control 311 is deployed on the first interface 31, the first control being used to trigger loading of audio. Therefore, in this embodiment, when the user touches the first control 311 on the first interface 31, the electronic device detects the touch operation and, in response to the touch operation on the first control 311, obtains the audio to be processed from the local photo album and presents it on the second interface 32, as shown in (b) of Fig. 3.
It can be understood that a touch operation may also be interpreted as a press operation, a touch operation, a click operation, etc., and a press operation may be a long press, a short press, a sustained press, etc. This embodiment does not limit the specific meaning of a touch operation.
Exemplarily, referring to (b) of Fig. 3, after the audio to be processed is uploaded, the first area 321 on the second interface 32 contains not only the audio to be processed and a playback control 322 for triggering playback of the audio to be processed, but also separation options located below the audio to be processed.
Optionally, the separation options may include a vocal removal control and an accompaniment removal control; the vocal removal control is used to trigger removal of the vocals from the audio, and the accompaniment removal control is used to trigger removal of the accompaniment from the audio.
In one possible design of this embodiment, the separation options may also include an accompaniment separation control (not shown), which may be used to trigger separating out the various different types of audio in the audio, such as the vocals and the accompaniment, to obtain the vocals and the accompaniment in the audio. This embodiment does not limit this.
S202: In response to a touch operation on a second control on a second interface, perform audio separation on the audio to be processed to obtain the target audio, where the second control is used to trigger audio separation.
In the embodiments of the present disclosure, after the electronic device obtains the audio to be processed, it can perform a separation operation on the audio to be processed to obtain the target audio.
Exemplarily, referring to (b) of Fig. 3, the second interface 32 also includes, below the first area 321, a second control 323 for triggering audio separation. Optionally, after the electronic device detects that the user has selected the vocal removal separation option, if it then detects a touch operation by the user on the second control 323, it performs audio separation on the audio to be processed in response to the touch operation on the second control 323, thereby obtaining the accompaniment with the vocals removed, as shown in (c) of Fig. 3.
It can be understood that, in the embodiments of the present disclosure, the first interface, the second interface, and subsequent interfaces denote different interfaces without any order. Similarly, the first control, the second control, and subsequent controls also merely denote different controls without any order; for example, the second control may be the first control on the second interface, and so on.
Exemplarily, in one possible design of the embodiments of the present disclosure, the above S103 may be implemented through the following S203:
S203: Display, on a third interface, an audio graphic corresponding to the target audio and/or a third control associated with the target audio, where the third control is used to trigger playback of the target audio.
Exemplarily, in this possible design of this embodiment, after obtaining the target audio, the electronic device may display on the third interface an audio graphic corresponding to the target audio and/or a third control associated with the target audio, thereby presenting the target audio to the user.
Exemplarily, referring to (c) of Fig. 3, the third interface 33 is an updated version of the second interface 32, and the first area 330 of the third interface 33 may include the audio to be processed before separation processing and the target audio after separation processing.
Optionally, in the first area 330 of Fig. 3(c), there are a third control 331 for triggering playback of the target audio and an audio graphic 332 corresponding to the target audio. For example, the audio graphic 332 may be a waveform amplitude envelope of the target audio.
Correspondingly, when the user touches the third control 331, the electronic device, in response to the touch operation on the third control 331, can play the target audio and present the audio graphic 332 that changes with the waveform amplitude of the target audio.
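The embodiments do not specify how the audio graphic is drawn. As a small sketch under that caveat, a waveform amplitude envelope like the one attributed to audio graphic 332 can be computed from per-frame peak amplitudes:

```python
import numpy as np

def amplitude_envelope(samples, frame=1024):
    """One bar height per frame: the per-frame peak amplitude."""
    n = len(samples) // frame * frame          # drop the ragged tail
    frames = np.abs(samples[:n]).reshape(-1, frame)
    return frames.max(axis=1)                  # envelope values for drawing
```

Redrawing the bars around the current playhead during playback yields a graphic that "changes with the waveform amplitude" as described above.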
Exemplarily, in another possible design of the embodiments of the present disclosure, the above S103 may be implemented through the following S204:
S204: Display, on the third interface, a fourth control associated with the target audio, where the fourth control is used to trigger exporting data associated with the target audio to a target location.
The target location includes a photo album or a file system.
Exemplarily, in this possible design of this embodiment, after the electronic device obtains the target audio, the way of presenting the target audio to the user may be to display on the third interface a fourth control associated with the target audio.
Exemplarily, referring to (c) of Fig. 3, a fourth control 333 is located below the first area 330 of the third interface 33. Optionally, the fourth control 333 may be an export control, which is used to trigger exporting data related to the target audio to a target location such as a photo album or a file system.
Correspondingly, when the user touches the fourth control 333, the electronic device, in response to the touch operation on the fourth control 333, can export the target audio to the target location.
Exemplarily, when exporting the target audio, the electronic device may export it to the target location in an audio format or in a file format, which is not limited in this embodiment.
Exemplarily, in another possible design of the embodiments of the present disclosure, the above S103 may be implemented through the following S205:
S205: Display, on the third interface, a fifth control associated with the target audio, where the fifth control is used to trigger audio editing of the target audio.
Exemplarily, in this possible design of this embodiment, after obtaining the target audio, the electronic device may also display on the third interface a fifth control associated with the target audio.
Exemplarily, referring to (c) of Fig. 3, a fifth control 334 is located below the first area 330 of the third interface 33. Optionally, the fifth control 334 can trigger audio editing of the target audio; for example, the fifth control 334 may be an import-to-track control, used to trigger importing the audio into a fourth interface (e.g., a track interface) for audio editing.
Correspondingly, when the user touches the fifth control 334, the electronic device, in response to the touch operation on the fifth control 334, can perform the operation of audio editing the target audio.
Optionally, in this embodiment, audio editing may include one or more of the following: editing audio to optimize the audio; separating vocals and/or accompaniment from audio; separating vocals from audio and mixing the separated vocals with a preset accompaniment; and separating vocals from a first audio, separating accompaniment from a second audio, and mixing the separated vocals with the separated accompaniment.
Optionally, this embodiment does not limit the specific content of audio editing; it can be determined according to the actual situation and is not described in detail here.
The audio processing method provided by this embodiment obtains the audio to be processed in response to a touch operation on a first control on a first interface, performs audio separation on the audio to be processed in response to a touch operation on a second control on a second interface to obtain the target audio, where the second control is used to trigger audio separation, and finally may display on a third interface an audio graphic corresponding to the target audio and/or a third control associated with the target audio for triggering playback of the target audio, and/or display on the third interface a fourth control associated with the target audio for triggering export of data associated with the target audio to a target location, and/or display on the third interface a fifth control associated with the target audio for triggering audio editing of the target audio. In this technical solution, audio uploading, audio processing, and multiple forms of audio presentation are performed through controls on the interfaces, which enriches the audio processing functions of the electronic device, improves the intelligence of its audio processing, satisfies the user's personalized needs, and improves the user experience.
Optionally, in the embodiments of the present disclosure, the audio editing of the target audio in the above S205 may include the following steps:
A1: In response to an audio processing instruction, present one or more audio processing function controls, where the one or more audio processing function controls are used to trigger execution of corresponding audio processing functions.
A2: In response to a touch operation on one of the one or more audio processing function controls, perform audio processing corresponding to the audio processing function control on the target audio, to obtain the processed target audio.
Optionally, in this step, after the electronic device presents the obtained target audio on the third interface 33 and the user plays and auditions the target audio through the third control 331, if it is determined that the target audio does not yet meet the requirements, the user can also issue an audio processing instruction so as to continue editing the target audio and obtain the processed target audio.
Exemplarily, when the electronic device receives the user's audio processing instruction, it can respond to it and present one or more audio processing function controls, so as to detect audio processing instructions issued by the user by touching different audio processing function controls, and then execute different audio processing functions in response to the detected operations.
Optionally, in one possible design of this embodiment, when the electronic device detects a touch operation by the user on the fifth control 334 on the third interface (for example, "export to track" in Fig. 3), it jumps from the third interface 33 to the fourth interface, thereby displaying multiple controls related to audio editing on the fourth interface.
As an example, in response to a touch operation on a sixth control on the fourth interface, the electronic device presents one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, where the seventh control is used to trigger presentation of the one or more audio processing function controls on a fifth interface.
Optionally, presenting one or more audio processing function controls includes presenting the one or more audio processing function controls in the form of a window, or presenting the audio processing function controls through the fifth interface.
In one possible design, Fig. 4 is a schematic diagram of another audio processing interface provided by an embodiment of the present disclosure. As shown in (a) of Fig. 4, a sixth control 411 is deployed on the fourth interface 41. The sixth control 411 can be designed as a control for triggering the presentation of one or more audio processing function controls. Therefore, when the user touches the sixth control 411 and the electronic device detects the touch operation on the sixth control 411, the one or more audio processing function controls can be presented.
Exemplarily, in response to detecting the touch operation on the sixth control 411, as shown in (b) of Fig. 4, the electronic device may present a window on the fourth interface and present one or more audio processing function controls in the window, or, as shown in (c) of Fig. 4, present one or more audio processing function controls on the fifth interface 42.
In another possible design, Fig. 5 is a schematic diagram of yet another audio processing interface provided by an embodiment of the present disclosure. As shown in (a) of Fig. 5, a sixth control 411 is deployed on the fourth interface 41. The sixth control 411 can be designed to trigger the presentation of a seventh control associated with one or more audio processing function controls. Therefore, when the user touches the sixth control 411 and the electronic device detects the touch operation on the sixth control 411, as shown in (b) of Fig. 5, the seventh control 512 associated with one or more audio processing function controls can be presented.
Exemplarily, as shown in (b) of Fig. 5, in response to detecting the touch operation on the sixth control 411, the interface of the electronic device jumps from the fourth interface 41 to the mixer interface 51, so that the seventh control 512 associated with one or more audio processing function controls is presented in the first area 511 of the mixer interface 51.
Correspondingly, in response to detecting a touch operation on the seventh control 512, as shown in (c) of Fig. 5, the electronic device may present a window on the mixer interface 51 and present one or more audio processing function controls in the window, or, as shown in (d) of Fig. 5, present one or more audio processing function controls on the fifth interface 42.
As another example, in response to a sliding operation on the fourth interface, the electronic device presents one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, where the seventh control is used to trigger presentation of the one or more audio processing function controls on the fifth interface.
In one possible design of the embodiments of the present disclosure, when the user performs a sliding operation on the fourth interface 41, the electronic device, in response to the sliding operation on the fourth interface 41, may present one or more audio processing function controls directly in the form of a window or on the fifth interface. The specific interface diagrams can be seen in Fig. 4.
In another possible design of the embodiments of the present disclosure, when the user performs a sliding operation on the fourth interface (for example, a left-slide operation; correspondingly, when the user performs a right-slide operation, the mixer interface 51 can return to the fourth interface 41), the electronic device, in response to the sliding operation on the fourth interface, may present the seventh control associated with one or more audio processing function controls, and then, in response to detecting a touch operation on the seventh control, may present the one or more audio processing function controls directly in the form of a window or on the fifth interface. The specific interface diagrams can be seen in Fig. 5.
Optionally, in the embodiments of the present disclosure, as shown in Figs. 4 and 5 above, in addition to the sixth control 411 (which may also be called an interface switching button, used to trigger switching between the track interface and the mixer interface), the fourth interface 41 and the mixer interface 51 may also include:
a metronome switch 412, used to trigger setting of the metronome speed, time signature, input device, count-in, etc.;
a headphone monitoring switch 413, used to trigger monitoring of the on/off state of the headphones connected to the electronic device;
other settings 414;
a track add button 415, used to trigger loading of a new track.
It can be understood that the embodiments of the present disclosure do not limit the types and functions of the controls included on each interface; they can be set according to actual needs and are not described in detail here.
Exemplarily, the fourth interface 41 may also support the following functions:
supporting audio editing capabilities, for example, audio import and recording; clicking the new track button creates a new recording track;
supporting importing audio and video from files, photo albums, and applications; when importing, in addition to direct import, accompaniment separation and audio optimization of the audio before import can also be supported;
supporting entering the mixer interface 51 by sliding left from the fourth interface 41; moreover, on the mixer interface 51 there are a sound control 513 and a delete control 514, where the sound control 513 is used to trigger a mute operation on a track, and the delete control 514 is used to trigger a delete operation on a track;
also supporting undo and redo of operations through the playback control buttons at the bottom of the interface.
Meanwhile, on the mixer interface 51, controlling the volume of the individual tracks 515 and of the master output channel 516 can also be supported. To the right of the volume slider there is also an effects control 517; by touching the effects control 517, the user can choose to enter the effects interface, where the desired effect presets can be selected and the degree of application of the effect can be adjusted. Below the effects button, audio processing can also be selected to unlock more audio processing options, which are not described in detail here.
Further, in the embodiments of the present disclosure, when the various audio generation processes have been completed and the track needs to be trimmed for duration, returning to the fourth interface (the track interface) and tapping to select the track waveform can support the following operations: audio splitting, audio cutting, audio copying, and segment deletion.
Optionally, long-pressing on a blank track brings up a paste button, with which the cut or copied audio can be pasted. In addition, dragging the beginning and end of the audio to change the audio duration is also supported.
Optionally, in the embodiments of the present disclosure, as shown in Figs. 4 and/or 5 above, the audio processing function controls include:
an audio optimization control, used to trigger editing of audio to optimize the audio;
an accompaniment separation control, used to trigger separation of vocals and/or accompaniment from audio;
a style synthesis control, used to trigger separation of vocals from audio and mixing and editing of the separated vocals with a preset accompaniment;
an audio mashup control, used to trigger separation of vocals from a first audio and of accompaniment from a second audio, and mixing and editing of the separated vocals with the separated accompaniment.
Optionally, in this embodiment, audio optimization may also be called play-and-sing optimization; it is a solution for optimizing the vocal and/or instrumental aspects of audio. For example, referring to Figs. 4 and/or 5, audio optimization may include, but is not limited to, options such as male vocal with guitar, female vocal with guitar, male vocal with piano, and female vocal with piano.
Accompaniment separation may include the options of removing vocals, removing accompaniment, or accompaniment separation (i.e., obtaining both vocals and accompaniment after separation).
Style synthesis may also be called one-click remix, i.e., the separated vocals can be mixed and edited with a preset accompaniment. Optionally, the preset accompaniments may include, but are not limited to, different types such as car party, classic pop, heartbeat moment, relaxing moment, childhood fun, hip-hop backstreet, future bass, reggae, and dong drum; moreover, the embodiments of the present disclosure do not limit the names of these types, which can be named according to user needs and are not described in detail here.
An audio mashup is a solution for mixing and editing at least two pieces of audio; it may be a mixed edit of vocals and accompaniment, a mixed edit of at least two vocal parts, or a mixed edit of at least two accompaniments. The embodiments of the present disclosure do not limit the source audio used.
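As a minimal sketch of the mixing step behind such a mashup (tempo and key alignment are deliberately omitted, and the gain values are illustrative assumptions rather than disclosed parameters):

```python
import numpy as np

def mashup(vocals, accompaniment, vocal_gain=1.0, acc_gain=0.8):
    """Mix vocals separated from one song with the accompaniment of another.

    Both inputs are mono float arrays at the same sample rate.
    """
    n = min(len(vocals), len(accompaniment))   # trim to the common length
    mix = vocal_gain * vocals[:n] + acc_gain * accompaniment[:n]
    peak = float(np.max(np.abs(mix)))
    return mix / peak if peak > 1.0 else mix   # normalise only if clipping
```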
In this embodiment, the electronic device may, in response to a touch operation on a first audio processing function control, execute the audio processing function corresponding to the first audio processing function control. The first audio processing function control may be at least one group of controls among multiple types of controls such as the audio optimization control, the accompaniment separation control, the style synthesis control, and the audio mashup control.
In the embodiments of the present disclosure, a solution is provided for the user to jump from the accompaniment separation function interface to the audio processing function interface, which saves the path and allows editing and creation to continue, can meet users' diverse, personalized creation needs, and improves the user experience.
On the basis of the above embodiments, after the electronic device presents the obtained target audio on the third interface 33 and the user plays and auditions the target audio through the third control 331, when it is determined that the target audio meets the requirements, the user can issue an audio export instruction through the fourth control 333 on the third interface 33 so as to export the target audio to a target location, for example, to a photo album or a file system.
As an example, in response to an operation on the fourth control 333 on the third interface 33, data related to the target audio can be directly exported to the target location, where the data related to the target audio may include the audio to be processed, the target audio obtained by performing audio separation (accompaniment and/or vocals), etc., and may also be audio segments used during audio processing, etc., which are not described in detail here.
As another example, the embodiments of the present disclosure also provide the function of adding a cover to the target audio. Therefore, in response to a touch operation on the fourth control 333 on the third interface 33, the interface can jump from the third interface 33 to a sixth interface and display the target audio on the sixth interface.
Correspondingly, in response to an interface editing instruction issued by the user on the sixth interface, a cover can be added to the generated target audio or the original cover can be changed. Similarly, in response to a detected save instruction, the generated target cover and the data related to the target audio can be saved to the target location; in response to a detected sharing instruction, the generated target cover and the data related to the target audio can be shared to a target application; and in response to a detected import-to-track instruction, the data related to the target audio can also be imported into the track interface for the user to continue editing.
It can be understood that the embodiments of the present disclosure do not limit the specific operations on the sixth interface; corresponding operations can be performed based on user instructions to implement different functions.
In one possible design of the present disclosure, in response to an operation on the fifth control 334 on the third interface 33, the interface jumps to the audio processing interface and one or more audio processing function controls are presented; in response to a touch operation on one of the one or more audio processing function controls, audio processing corresponding to the audio processing function control is performed on the target audio to obtain the processed target audio; and then, when an export instruction is detected, the interface jumps to the sixth interface and the processed target audio is displayed on the sixth interface.
Exemplarily, Fig. 6 is a schematic diagram of still another audio processing interface provided by an embodiment of the present disclosure. As shown in (a) of Fig. 6, the sixth interface 61 includes an eighth control 611, which is used to trigger playback of the processed target audio.
Optionally, in (a) of Fig. 6, the sixth interface 61 also includes a ninth control 612, i.e., an interface editing control; the ninth control 612 is used to trigger cover editing of the processed target audio.
Optionally, in (a) of Fig. 6, the sixth interface 61 also includes an export control, an import-to-track control, and a sharing control. The export control is used to export data associated with the processed target audio to a target location, the import-to-track control is used to import data associated with the processed target audio into the track interface for processing, and the sharing control is used to share data associated with the processed target audio to a target application, etc. It can be understood that this embodiment does not limit the controls included on the sixth interface or the functions of each control, which are not described in detail here.
Exemplarily, Fig. 7 is a schematic flowchart of yet another audio processing method provided by an embodiment of the present disclosure. As shown in Fig. 7, in the embodiments of the present disclosure, the audio processing method may further include the following steps:
S701: In response to a touch operation on a ninth control on the sixth interface, display a first window, where the first window includes a cover import control, one or more preset static cover controls, and one or more preset animation effect controls.
In the embodiments of the present disclosure, when the electronic device presents the ninth control 612 for triggering cover editing, the user can issue a cover editing instruction through the ninth control 612. For example, when the electronic device detects a touch operation by the user on the ninth control 612, in response to the touch operation, the electronic device can present the interface shown in (b) of Fig. 6.
Referring to (b) of Fig. 6, a window can be presented at the bottom of the sixth interface 61; in this embodiment, this window is called the first window 613, and the first window 613 contains a cover part and an animation part.
Optionally, the cover part includes a custom cover import control and one or more preset static cover controls. The cover import control is used to trigger importing a local image, and the one or more preset static cover controls are used to trigger selecting a preset static cover. It can be understood that the static covers are multiple images preset in the target application of the electronic device, for example, cover 1, cover 2, and cover 3.
Optionally, the animation part includes a no-animation control and one or more preset animation effect controls. The no-animation control is used to trigger not selecting an animation, i.e., the cover generated by the electronic device has no animation effect. The one or more preset animation effect controls are used to trigger selecting a preset animation effect. It can be understood that the animation effects are multiple forms of dynamic change preset in the target application of the electronic device; for example, the animation effects may include animation 1, animation 2, and animation 3.
S702: Obtain a target cover in response to a control selection operation on the first window, where the target cover is a static cover or a dynamic cover.
In this embodiment, the user can select among the various presented controls according to actual needs. For example, when the user touches the custom cover import control, the electronic device can use a locally imported photo as the static cover of the audio; when the user selects the no-animation control from the animation part, the generated target cover is a static cover.
As another example, when the user selects a cover from the cover part and an animation from the animation part, a dynamic cover can be generated. Specifically, in the embodiments of the present disclosure, if the target cover is a dynamic cover, S702 can be implemented through the following steps:
B1: Obtain a static cover and an animation effect in response to the control selection operation on the first window.
B2: Based on the audio characteristics of the processed target audio, the static cover, and the animation effect, generate a dynamic cover that changes with the audio characteristics of the processed target audio, where the audio characteristics include audio beat and/or volume.
Optionally, in this embodiment, the electronic device can detect the user's control selection operation. For example, as shown in (b) of Fig. 6, when the user selects cover 1 and animation 1, the electronic device detects the selection operation on the control corresponding to cover 1 and the control corresponding to animation 1 in the first window 613 and, in response to the control selection operation, can generate the dynamic cover 620 shown in (c) of Fig. 6, which may include cover 1 and the animation effect layer corresponding to animation 1.
It can be understood that, in the embodiments of the present disclosure, when the user taps the eighth control 611 below the dynamic cover 620 in Fig. 6(c), the electronic device, in response to the tap operation on the eighth control 611, can play the processed target audio, and at this time the dynamic cover can change in real time with audio characteristics of the processed target audio such as the audio beat and/or volume.
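A hedged sketch of how such a beat- and volume-driven cover could be produced frame by frame (the pulse amounts 0.05 and 0.10 are invented for illustration; the embodiments do not disclose the animation parameters), assuming librosa for analysis and Pillow for imaging:

```python
import numpy as np
import librosa
from PIL import Image

def cover_frames(audio_path, cover_path, fps=30):
    """Yield cover frames whose scale pulses with volume and beats."""
    y, sr = librosa.load(audio_path, sr=None, mono=True)
    hop = sr // fps                                    # one analysis frame per video frame
    rms = librosa.feature.rms(y=y, hop_length=hop)[0]
    rms = rms / (rms.max() + 1e-9)                     # volume envelope in 0..1
    _, beats = librosa.beat.beat_track(y=y, sr=sr, hop_length=hop)
    beat_set = set(int(b) for b in beats)

    base = Image.open(cover_path).convert("RGB")
    w, h = base.size
    for i, level in enumerate(rms):
        scale = 1.0 + 0.05 * level + (0.10 if i in beat_set else 0.0)
        fw, fh = int(w * scale), int(h * scale)
        frame = base.resize((fw, fh))
        left, top = (fw - w) // 2, (fh - h) // 2
        yield frame.crop((left, top, left + w, top + h))   # centre-crop back to w x h
```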
Optionally, when the final audio processing and editing operations are completed, the electronic device, in response to the user's operation, can also export the generated target cover and the data related to the target audio. Optionally, export to a photo album or to a file is supported, and the cover can be changed when exporting to the photo album; after the export is completed, the user can choose to finish or to share to a target application.
In addition, the user can also choose to share to a file; in this case, a compressed package containing the audio is automatically generated, making it convenient for the user to send it elsewhere to continue editing.
Optionally, in the embodiments of the present disclosure, after the above S702, the audio processing method may further include the following steps:
S703: In response to an export instruction for the sixth interface, export the data associated with the processed target audio to a target location, where the target location includes a photo album or a file system.
In this embodiment, the export instruction may be voice, a touch operation on an export control, etc.
For example, when the voice recognition function on the sixth interface is enabled, the user can issue the export instruction by voice.
For another example, referring to (a) and (c) of Fig. 6, the sixth interface 61 also includes an export control 621. Correspondingly, when the user touches or presses the export control 621 on the sixth interface, the electronic device, in response to the touch operation on the export control 621, can export the data associated with the processed target audio to the target location, for example, to a photo album or a file system.
Optionally, in the embodiments of the present disclosure, after the above S702, the audio processing method may further include the following steps:
S704: In response to a touch operation on a sharing control on the sixth interface, share the data associated with the processed target audio to a target application.
Exemplarily, in this embodiment, the sharing instruction may be voice, a touch operation on a sharing control, etc. For example, when the voice recognition function on the sixth interface is enabled, the user can issue the sharing instruction by voice.
For another example, referring to (a) and (c) of Fig. 6, the sixth interface 61 also includes a sharing control 622. Correspondingly, when the user touches or presses the sharing control 622 on the sixth interface, the electronic device, in response to the touch operation on the sharing control 622, can share the data associated with the processed target audio to a target application, for example, various applications such as a short-video application, a mini-program application, or a chat application.
It can be understood that, in the embodiments of the present disclosure, the above data associated with the processed target audio includes at least one of the following:
the processed target audio, the vocals, the accompaniment, a static cover of the processed target audio, and a dynamic cover of the processed target audio.
It can be understood that, in this embodiment, the data associated with the processed target audio may be materials such as audio segments and audio data (e.g., vocals, accompaniment) from each stage of audio processing, materials such as a static cover of the target audio and a dynamic cover of the target audio, or a compressed package or material package compressed from multiple pieces of audio data. This embodiment does not limit the specific form of the data associated with the processed target audio.
Exemplarily, the electronic device can share and/or export various data related to the processed target audio. For example, based on the user's instruction, the electronic device can export and/or share the generated data related to the processed target audio, export and/or share the target audio after audio processing (vocals or accompaniment, etc.), or export or share the generated target cover (static cover or dynamic cover) together with the target audio, which is not limited in this embodiment.
On the basis of the above embodiments, Fig. 8 is a schematic flowchart of still another audio processing method provided by an embodiment of the present disclosure. As shown in Fig. 8, the audio processing method provided by the embodiments of the present disclosure may include the following steps:
S801: In response to detecting a touch operation on the accompaniment separation control, perform audio separation on the audio to be processed to obtain the target audio.
As an example, the electronic device may process the audio to be processed to obtain the target audio.
As another example, the electronic device may also upload the audio to be processed to the cloud so as to invoke a remote separation service and separate the target audio from the audio to be processed. Optionally, Fig. 9 is a schematic diagram of an implementation principle of accompaniment separation provided by an embodiment of the present disclosure. As shown in Fig. 9, in this embodiment, the electronic device may, based on the user's selection operation, first obtain a first video from the photo album, then extract the audio to be processed from the first video, then upload the audio to be processed to the cloud and perform audio separation on it by invoking the remote separation service, thereby obtaining the separated target audio. Then, after creating a track, the electronic device can present the track of the created target audio on the interface.
Specifically, as shown in Fig. 9, the audio to be processed that is uploaded to the cloud is first transferred to the video cloud; then, in the cloud, the target audio in the audio to be processed is separated out by the speech separation service and saved to the video cloud; finally, the electronic device, by interacting with the cloud, downloads the separated target audio from the video cloud.
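The remote separation service's API is not disclosed; the endpoint names and JSON fields below are assumptions made purely for illustration. A client-side upload-and-poll loop of the kind Fig. 9 implies could look like this:

```python
import time
import requests

UPLOAD_URL = "https://example.com/api/separate"            # hypothetical endpoint
RESULT_URL = "https://example.com/api/separate/{job_id}"   # hypothetical endpoint

def separate_in_cloud(audio_path):
    """Upload audio for separation and poll until the job finishes."""
    with open(audio_path, "rb") as f:
        resp = requests.post(UPLOAD_URL, files={"audio": f}, timeout=30)
    resp.raise_for_status()
    job_id = resp.json()["job_id"]                         # assumed response field
    while True:
        status = requests.get(RESULT_URL.format(job_id=job_id), timeout=30).json()
        if status["state"] == "done":                      # assumed status field
            return status["vocals_url"], status["accompaniment_url"]
        time.sleep(2)
```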
It can be understood that, after obtaining the target audio, the electronic device can execute different flows in response to the user's touch operations on different controls.
As an example, after S801, the audio processing method may include:
S802: In response to detecting a touch operation on the export-to-track control, export the target audio to the track interface for subsequent editing, to obtain the processed target audio.
S803: In response to detecting a touch operation on the save control, save the data related to the processed target audio to the file system or the photo album.
Exemplarily, for the generated audio file, to facilitate subsequent editing on other devices, the processed target audio and its related data can be compressed into a file in the form of a compressed package for joint processing and saving.
Optionally, in this embodiment, when the data related to the processed target audio is saved to the photo album, changing the cover of files such as the target audio, or adding a cover by default, can be supported, so as to enhance the aesthetic experience when the user enjoys the target audio.
As another example, after S801, the audio processing method may include:
S804: In response to detecting a touch operation on the save control, save the data related to the target audio.
Exemplarily, the data related to the target audio can be saved to the file system or the photo album.
Optionally, in the above S803 and S804, the manner of saving the data related to the target audio can be as shown in Fig. 10 below. Optionally, Fig. 10 is a schematic diagram of an implementation principle of audio file saving provided by an embodiment of the present disclosure. As shown in Fig. 10, in this embodiment, when the electronic device detects the user's save instruction, on the one hand, it first performs effects processing on the track of the target audio in the form of audio blocks, then synthesizes the other audio tracks from the audio processing process and renders the synthesized result, and then performs audio encoding on the rendered result and outputs an audio file; on the other hand, in response to the user's cover selection operation, it generates the target cover (a static cover or a dynamic cover); finally, it packages the audio file and the target cover together to obtain the target audio with the cover added.
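A minimal sketch of that save pipeline (block-wise effects, mixing, rendering, encoding), assuming NumPy and the soundfile library; packaging the chosen cover with the file, e.g. embedding it into the audio metadata, is left out:

```python
import numpy as np
import soundfile as sf

def render_and_save(tracks, effects, out_path, sr=44100, block=4096):
    """Apply per-track effects block by block, mix the tracks, and encode.

    `tracks` is a list of float arrays; `effects[i]` is a callable applied to
    each block of track i, or None for no effect.
    """
    n = min(len(t) for t in tracks)
    mixed = np.zeros(n, dtype=np.float32)
    for track, fx in zip(tracks, effects):
        fx = fx or (lambda b: b)
        for start in range(0, n, block):                # block-wise effect processing
            seg = track[start:min(start + block, n)]
            mixed[start:start + len(seg)] += fx(seg)
    peak = float(np.max(np.abs(mixed)))
    if peak > 1.0:
        mixed /= peak                                   # avoid clipping before encoding
    sf.write(out_path, mixed, sr)                       # encode, e.g. to WAV/FLAC
```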
For the specific implementation of each step in this embodiment, reference can be made to the descriptions in the above embodiments, which are not repeated here.
As can be seen from the contents described in the above embodiments, the audio processing method provided by the embodiments of the present disclosure opens up and outputs the accompaniment separation result to the user, which satisfies users' diverse needs; provides an interface for jumping from the accompaniment separation function to track processing, which not only saves the interface jump path but also provides the possibility of continuing to edit and create based on the accompaniment separation result; and provides a new saving method, i.e., supporting saving to a file and saving to the photo album, as well as changing the file's cover, which improves the intelligence of the applications to which the audio processing method is applicable and improves the user experience.
The following are apparatus embodiments of the present application, which can be used to execute the method embodiments of the present application. For details not disclosed in the apparatus embodiments of the present application, please refer to the method embodiments of the present application.
Fig. 11 is a schematic structural diagram of an audio processing apparatus provided by an embodiment of the present disclosure. The audio processing apparatus 1100 may be integrated in an electronic device or implemented by an electronic device. Referring to Fig. 11, the audio processing apparatus 1100 may include:
an acquisition module 1101, configured to obtain audio to be processed in response to an audio acquisition instruction;
a processing module 1102, configured to perform audio separation on the audio to be processed in response to an audio separation instruction for the audio to be processed, to obtain target audio, where the target audio is vocals and/or accompaniment separated from the audio to be processed;
a presentation module 1103, configured to present the target audio.
In an optional embodiment of the present disclosure, the acquisition module 1101 is specifically configured to obtain the audio to be processed in response to a touch operation on a first control on a first interface, where the first control is used to trigger loading of audio.
In an optional embodiment of the present disclosure, the processing module 1102 is specifically configured to perform audio separation on the audio to be processed in response to a touch operation on a second control on a second interface to obtain the target audio, where the second control is used to trigger audio separation.
In an optional embodiment of the present disclosure, the presentation module 1103 is specifically configured to display, on a third interface, an audio graphic corresponding to the target audio and/or a third control associated with the target audio, where the third control is used to trigger playback of the target audio.
In an optional embodiment of the present disclosure, the presentation module 1103 is specifically configured to display, on a third interface, a fourth control associated with the target audio, where the fourth control is used to trigger exporting data associated with the target audio to a target location; the target location includes a photo album or a file system.
In an optional embodiment of the present disclosure, the presentation module 1103 is specifically configured to display, on a third interface, a fifth control associated with the target audio, where the fifth control is used to trigger audio editing of the target audio.
In an optional embodiment of the present disclosure, the presentation module 1103 is further configured to present, in response to an audio processing instruction, one or more audio processing function controls, where the one or more audio processing function controls are used to trigger execution of corresponding audio processing functions;
the processing module 1102 is further configured to perform, in response to a touch operation on one of the one or more audio processing function controls, audio processing corresponding to the audio processing function control on the target audio, to obtain processed target audio.
In an optional embodiment of the present disclosure, the presentation module 1103 is specifically configured to present, in response to a touch operation on a sixth control on a fourth interface, the one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, where the seventh control is used to trigger presentation of the one or more audio processing function controls on a fifth interface.
In an optional embodiment of the present disclosure, the presentation module 1103 is specifically configured to present, in response to a sliding operation on the fourth interface, the one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, where the seventh control is used to trigger presentation of the one or more audio processing function controls on the fifth interface.
In an optional embodiment of the present disclosure, the audio processing function controls include:
an audio optimization control, used to trigger editing of audio to optimize the audio;
an accompaniment separation control, used to trigger separation of vocals and/or accompaniment from audio;
a style synthesis control, used to trigger separation of vocals from audio and mixing and editing of the separated vocals with a preset accompaniment;
an audio mashup control, used to trigger separation of vocals from a first audio and of accompaniment from a second audio, and mixing and editing of the separated vocals with the separated accompaniment.
In an optional embodiment of the present disclosure, the presentation module 1103 is further configured to display the processed target audio on a sixth interface, where the sixth interface includes an eighth control, the eighth control being used to trigger playback of the processed target audio.
In an optional embodiment of the present disclosure, the sixth interface further includes a ninth control, and the presentation module 1103 is further configured to display, in response to a touch operation on the ninth control on the sixth interface, a first window, where the first window includes a cover import control, one or more preset static cover controls, and one or more preset animation effect controls;
the processing module 1102 is further configured to obtain a target cover in response to a control selection operation on the first window;
the target cover is a static cover or a dynamic cover.
In an optional embodiment of the present disclosure, if the target cover is a dynamic cover, the processing module 1102 is specifically configured to:
obtain a static cover and an animation effect in response to the control selection operation on the first window;
generate, based on audio characteristics of the processed target audio, the static cover, and the animation effect, a dynamic cover that changes with the audio characteristics of the processed target audio;
where the audio characteristics include audio beat and/or volume.
In an optional embodiment of the present disclosure, the processing module 1102 is further configured to export, in response to an export instruction for a sixth interface, data associated with the processed target audio to a target location; the target location includes a photo album or a file system.
In an optional embodiment of the present disclosure, the processing module 1102 is further configured to share, in response to a sharing instruction for the sixth interface, data associated with the processed target audio to a target application.
In an optional embodiment of the present disclosure, the data associated with the processed target audio includes at least one of the following:
the processed target audio, the vocals, the accompaniment, a static cover of the processed target audio, and a dynamic cover of the processed target audio.
The audio processing apparatus provided in this embodiment can be used to execute the technical solutions of the above method embodiments; its implementation principles and technical effects are similar and are not repeated here.
Fig. 12 is a structural block diagram of an electronic device provided by an embodiment of the present disclosure. As shown in Fig. 12, the electronic device 1200 may be a terminal device or a server. The terminal device may include, but is not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, personal digital assistants (PDA), tablet computers (Portable Android Device, PAD), portable multimedia players (Portable Media Player, PMP), and vehicle-mounted terminals (e.g., vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in Fig. 12 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in Fig. 12, the electronic device 1200 may include a processing apparatus (e.g., a central processing unit, a graphics processing unit, etc.) 1201, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1202 or a program loaded from a storage apparatus 1208 into a random access memory (RAM) 1203. The RAM 1203 also stores various programs and data required for the operation of the electronic device 1200. The processing apparatus 1201, the ROM 1202, and the RAM 1203 are connected to one another through a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
Generally, the following apparatuses can be connected to the I/O interface 1205: an input apparatus 1206 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output apparatus 1207 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; a storage apparatus 1208 including, for example, a magnetic tape, hard disk, etc.; and a communication apparatus 1209. The communication apparatus 1209 can allow the electronic device 1200 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 12 shows the electronic device 1200 with various apparatuses, it should be understood that it is not required to implement or have all the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowcharts can be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowcharts. In such an embodiment, the computer program can be downloaded and installed from the network through the communication apparatus 1209, or installed from the storage apparatus 1208, or installed from the ROM 1202. When the computer program is executed by the processing apparatus 1201, the above functions defined in the methods of the embodiments of the present disclosure are executed.
It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any appropriate medium, including but not limited to: a wire, an optical cable, RF (radio frequency), etc., or any suitable combination of the above.
The computer-readable medium may be contained in the above electronic device; it may also exist separately without being assembled into the electronic device.
The computer-readable medium carries one or more programs; when the one or more programs are executed by the electronic device, the electronic device is caused to execute the methods shown in the above embodiments.
Computer program code for executing the operations of the present disclosure can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, program segment, or part of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented with a dedicated hardware-based system that performs the specified functions or operations, or can be implemented with a combination of dedicated hardware and computer instructions.
The apparatuses or modules involved in the embodiments of the present disclosure can be implemented in software or in hardware, and the name of an apparatus does not, in some cases, constitute a limitation on the apparatus or module itself.
The functions described above herein can be executed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include: field programmable gate arrays (FPGA), application specific integrated circuits (ASIC), application specific standard parts (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
In a first aspect, according to one or more embodiments of the present disclosure, an audio processing method is provided, including:
obtaining audio to be processed in response to an audio acquisition instruction;
performing audio separation on the audio to be processed in response to an audio separation instruction for the audio to be processed, to obtain target audio, where the target audio is vocals and/or accompaniment separated from the audio to be processed;
presenting the target audio.
According to one or more embodiments of the present disclosure, obtaining audio to be processed in response to an audio acquisition instruction includes:
obtaining the audio to be processed in response to a touch operation on a first control on a first interface, where the first control is used to trigger loading of audio.
According to one or more embodiments of the present disclosure, performing audio separation on the audio to be processed in response to an audio separation instruction for the audio to be processed, to obtain target audio, includes:
performing audio separation on the audio to be processed in response to a touch operation on a second control on a second interface, to obtain the target audio, where the second control is used to trigger audio separation.
According to one or more embodiments of the present disclosure, presenting the target audio includes:
displaying, on a third interface, an audio graphic corresponding to the target audio and/or a third control associated with the target audio, where the third control is used to trigger playback of the target audio.
According to one or more embodiments of the present disclosure, presenting the target audio includes:
displaying, on a third interface, a fourth control associated with the target audio, where the fourth control is used to trigger exporting data associated with the target audio to a target location; the target location includes a photo album or a file system.
According to one or more embodiments of the present disclosure, presenting the target audio includes:
displaying, on a third interface, a fifth control associated with the target audio, where the fifth control is used to trigger audio editing of the target audio.
According to one or more embodiments of the present disclosure, the audio editing of the target audio includes:
presenting, in response to an audio processing instruction, one or more audio processing function controls, where the one or more audio processing function controls are used to trigger execution of corresponding audio processing functions;
performing, in response to a touch operation on one of the one or more audio processing function controls, audio processing corresponding to the audio processing function control on the target audio, to obtain processed target audio.
According to one or more embodiments of the present disclosure, presenting one or more audio processing function controls in response to an audio processing instruction includes:
presenting, in response to a touch operation on a sixth control on a fourth interface, the one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, where the seventh control is used to trigger presentation of the one or more audio processing function controls on a fifth interface.
According to one or more embodiments of the present disclosure, presenting one or more audio processing function controls in response to an audio processing instruction includes:
presenting, in response to a sliding operation on the fourth interface, the one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, where the seventh control is used to trigger presentation of the one or more audio processing function controls on the fifth interface.
According to one or more embodiments of the present disclosure, the audio processing function controls include:
an audio optimization control, used to trigger editing of audio to optimize the audio;
an accompaniment separation control, used to trigger separation of vocals and/or accompaniment from audio;
a style synthesis control, used to trigger separation of vocals from audio and mixing and editing of the separated vocals with a preset accompaniment;
an audio mashup control, used to trigger separation of vocals from a first audio and of accompaniment from a second audio, and mixing and editing of the separated vocals with the separated accompaniment.
According to one or more embodiments of the present disclosure, the method further includes: displaying the processed target audio on a sixth interface, where the sixth interface includes an eighth control, the eighth control being used to trigger playback of the processed target audio.
According to one or more embodiments of the present disclosure, the sixth interface further includes a ninth control, and the method further includes:
displaying, in response to a touch operation on the ninth control on the sixth interface, a first window, where the first window includes a cover import control, one or more preset static cover controls, and one or more preset animation effect controls;
obtaining a target cover in response to a control selection operation on the first window;
the target cover being a static cover or a dynamic cover.
According to one or more embodiments of the present disclosure, if the target cover is a dynamic cover, obtaining a target cover in response to a control selection operation on the first window includes:
obtaining a static cover and an animation effect in response to the control selection operation on the first window;
generating, based on audio characteristics of the processed target audio, the static cover, and the animation effect, a dynamic cover that changes with the audio characteristics of the processed target audio;
where the audio characteristics include audio beat and/or volume.
According to one or more embodiments of the present disclosure, the method further includes:
exporting, in response to an export instruction for a sixth interface, data associated with the processed target audio to a target location; the target location includes a photo album or a file system.
According to one or more embodiments of the present disclosure, the method further includes:
sharing, in response to a sharing instruction for the sixth interface, data associated with the processed target audio to a target application.
According to one or more embodiments of the present disclosure, the data associated with the processed target audio includes at least one of the following:
the processed target audio, the vocals, the accompaniment, a static cover of the processed target audio, and a dynamic cover of the processed target audio.
In a second aspect, according to one or more embodiments of the present disclosure, an audio processing apparatus is provided, including:
an acquisition module, configured to obtain audio to be processed in response to an audio acquisition instruction;
a processing module, configured to perform audio separation on the audio to be processed in response to an audio separation instruction for the audio to be processed, to obtain target audio, where the target audio is vocals and/or accompaniment separated from the audio to be processed;
a presentation module, configured to present the target audio.
According to one or more embodiments of the present disclosure, the acquisition module is specifically configured to obtain the audio to be processed in response to a touch operation on a first control on a first interface, where the first control is used to trigger loading of audio.
According to one or more embodiments of the present disclosure, the processing module is specifically configured to perform audio separation on the audio to be processed in response to a touch operation on a second control on a second interface to obtain the target audio, where the second control is used to trigger audio separation.
According to one or more embodiments of the present disclosure, the presentation module is specifically configured to display, on a third interface, an audio graphic corresponding to the target audio and/or a third control associated with the target audio, where the third control is used to trigger playback of the target audio.
According to one or more embodiments of the present disclosure, the presentation module is specifically configured to display, on a third interface, a fourth control associated with the target audio, where the fourth control is used to trigger exporting data associated with the target audio to a target location; the target location includes a photo album or a file system.
According to one or more embodiments of the present disclosure, the presentation module is specifically configured to display, on a third interface, a fifth control associated with the target audio, where the fifth control is used to trigger audio editing of the target audio.
According to one or more embodiments of the present disclosure, the presentation module is further configured to present, in response to an audio processing instruction, one or more audio processing function controls, where the one or more audio processing function controls are used to trigger execution of corresponding audio processing functions;
the processing module is further configured to perform, in response to a touch operation on one of the one or more audio processing function controls, audio processing corresponding to the audio processing function control on the target audio, to obtain processed target audio.
According to one or more embodiments of the present disclosure, the presentation module is specifically configured to present, in response to a touch operation on a sixth control on a fourth interface, the one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, where the seventh control is used to trigger presentation of the one or more audio processing function controls on a fifth interface.
According to one or more embodiments of the present disclosure, the presentation module is specifically configured to present, in response to a sliding operation on the fourth interface, the one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, where the seventh control is used to trigger presentation of the one or more audio processing function controls on the fifth interface.
According to one or more embodiments of the present disclosure, the audio processing function controls include:
an audio optimization control, used to trigger editing of audio to optimize the audio;
an accompaniment separation control, used to trigger separation of vocals and/or accompaniment from audio;
a style synthesis control, used to trigger separation of vocals from audio and mixing and editing of the separated vocals with a preset accompaniment;
an audio mashup control, used to trigger separation of vocals from a first audio and of accompaniment from a second audio, and mixing and editing of the separated vocals with the separated accompaniment.
According to one or more embodiments of the present disclosure, the presentation module is further configured to display the processed target audio on a sixth interface, where the sixth interface includes an eighth control, the eighth control being used to trigger playback of the processed target audio.
According to one or more embodiments of the present disclosure, the sixth interface further includes a ninth control, and the presentation module is further configured to display, in response to a touch operation on the ninth control on the sixth interface, a first window, where the first window includes a cover import control, one or more preset static cover controls, and one or more preset animation effect controls;
the processing module is further configured to obtain a target cover in response to a control selection operation on the first window;
the target cover is a static cover or a dynamic cover.
According to one or more embodiments of the present disclosure, if the target cover is a dynamic cover, the processing module is specifically configured to:
obtain a static cover and an animation effect in response to the control selection operation on the first window;
generate, based on audio characteristics of the processed target audio, the static cover, and the animation effect, a dynamic cover that changes with the audio characteristics of the processed target audio;
where the audio characteristics include audio beat and/or volume.
According to one or more embodiments of the present disclosure, the processing module is further configured to export, in response to an export instruction for a sixth interface, data associated with the processed target audio to a target location; the target location includes a photo album or a file system.
According to one or more embodiments of the present disclosure, the processing module is further configured to share, in response to a sharing instruction for the sixth interface, data associated with the processed target audio to a target application.
According to one or more embodiments of the present disclosure, the data associated with the processed target audio includes at least one of the following:
the processed target audio, the vocals, the accompaniment, a static cover of the processed target audio, and a dynamic cover of the processed target audio.
In a third aspect, according to one or more embodiments of the present disclosure, an electronic device is provided, including: at least one processor and a memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor executes the audio processing method described in the above first aspect and various possible designs of the first aspect.
In a fourth aspect, according to one or more embodiments of the present disclosure, a computer-readable storage medium is provided, where computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the audio processing method described in the above first aspect and various possible designs of the first aspect is implemented.
In a fifth aspect, according to one or more embodiments of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the audio processing method described in the above first aspect and various possible designs of the first aspect.
In a sixth aspect, according to one or more embodiments of the present disclosure, a computer program is provided that, when executed by a processor, implements the audio processing method described in the above first aspect and various possible designs of the first aspect.
The above description is only a description of the preferred embodiments of the present disclosure and of the technical principles applied. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
In addition, although the operations are depicted in a specific order, this should not be understood as requiring that these operations be performed in the specific order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be interpreted as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or method logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely example forms of implementing the claims.

Claims (21)

  1. An audio processing method, comprising:
    obtaining audio to be processed in response to an audio acquisition instruction;
    performing audio separation on the audio to be processed in response to an audio separation instruction for the audio to be processed, to obtain target audio, wherein the target audio is vocals and/or accompaniment separated from the audio to be processed;
    presenting the target audio.
  2. The method according to claim 1, wherein obtaining audio to be processed in response to an audio acquisition instruction comprises:
    obtaining the audio to be processed in response to a touch operation on a first control on a first interface, wherein the first control is used to trigger loading of audio.
  3. The method according to claim 1 or 2, wherein performing audio separation on the audio to be processed in response to an audio separation instruction for the audio to be processed, to obtain target audio, comprises:
    performing audio separation on the audio to be processed in response to a touch operation on a second control on a second interface, to obtain the target audio, wherein the second control is used to trigger audio separation.
  4. The method according to any one of claims 1 to 3, wherein presenting the target audio comprises:
    displaying, on a third interface, an audio graphic corresponding to the target audio and/or a third control associated with the target audio, wherein the third control is used to trigger playback of the target audio.
  5. The method according to any one of claims 1 to 3, wherein presenting the target audio comprises:
    displaying, on a third interface, a fourth control associated with the target audio, wherein the fourth control is used to trigger exporting data associated with the target audio to a target location; the target location comprises a photo album or a file system.
  6. The method according to any one of claims 1 to 3, wherein presenting the target audio comprises:
    displaying, on a third interface, a fifth control associated with the target audio, wherein the fifth control is used to trigger audio editing of the target audio.
  7. The method according to claim 6, wherein the audio editing of the target audio comprises:
    presenting, in response to an audio processing instruction, one or more audio processing function controls, wherein the one or more audio processing function controls are used to trigger execution of corresponding audio processing functions;
    performing, in response to a touch operation on one of the one or more audio processing function controls, audio processing corresponding to the audio processing function control on the target audio, to obtain processed target audio.
  8. The method according to claim 7, wherein presenting one or more audio processing function controls in response to an audio processing instruction comprises:
    presenting, in response to a touch operation on a sixth control on a fourth interface, the one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, wherein the seventh control is used to trigger presentation of the one or more audio processing function controls on a fifth interface.
  9. The method according to claim 7, wherein presenting one or more audio processing function controls in response to an audio processing instruction comprises:
    presenting, in response to a sliding operation on the fourth interface, the one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, wherein the seventh control is used to trigger presentation of the one or more audio processing function controls on the fifth interface.
  10. The method according to any one of claims 7 to 9, wherein the audio processing function controls comprise:
    an audio optimization control, used to trigger editing of audio to optimize the audio;
    an accompaniment separation control, used to trigger separation of vocals and/or accompaniment from audio;
    a style synthesis control, used to trigger separation of vocals from audio and mixing and editing of the separated vocals with a preset accompaniment;
    an audio mashup control, used to trigger separation of vocals from a first audio and of accompaniment from a second audio, and mixing and editing of the separated vocals with the separated accompaniment.
  11. The method according to any one of claims 7 to 10, further comprising: displaying the processed target audio on a sixth interface, the sixth interface comprising an eighth control, the eighth control being used to trigger playback of the processed target audio.
  12. The method according to claim 11, wherein the sixth interface further comprises a ninth control, and the method further comprises:
    displaying, in response to a touch operation on the ninth control on the sixth interface, a first window, the first window comprising a cover import control, one or more preset static cover controls, and one or more preset animation effect controls;
    obtaining a target cover in response to a control selection operation on the first window;
    the target cover being a static cover or a dynamic cover.
  13. The method according to claim 12, wherein, if the target cover is a dynamic cover, obtaining a target cover in response to a control selection operation on the first window comprises:
    obtaining a static cover and an animation effect in response to the control selection operation on the first window;
    generating, based on audio characteristics of the processed target audio, the static cover, and the animation effect, a dynamic cover that changes with the audio characteristics of the processed target audio;
    wherein the audio characteristics comprise audio beat and/or volume.
  14. The method according to any one of claims 7 to 13, further comprising:
    exporting, in response to an export instruction for the sixth interface, data associated with the processed target audio to a target location; the target location comprises a photo album or a file system.
  15. The method according to any one of claims 7 to 14, further comprising:
    sharing, in response to a sharing instruction on the sixth interface, data associated with the processed target audio to a target application.
  16. The method according to claim 14 or 15, wherein the data associated with the processed target audio comprises at least one of the following:
    the processed target audio, the vocals, the accompaniment, a static cover of the processed target audio, and a dynamic cover of the processed target audio.
  17. An audio processing apparatus, comprising:
    an acquisition module, configured to obtain audio to be processed in response to an audio acquisition instruction;
    a processing module, configured to perform audio separation on the audio to be processed in response to an audio separation instruction for the audio to be processed, to obtain target audio, wherein the target audio is vocals and/or accompaniment separated from the audio to be processed;
    a presentation module, configured to present the target audio.
  18. An electronic device, comprising: a processor and a memory;
    wherein the memory stores computer-executable instructions;
    and the processor executes the computer-executable instructions stored in the memory, so that the processor executes the audio processing method according to any one of claims 1 to 16.
  19. A computer-readable storage medium, wherein computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the audio processing method according to any one of claims 1 to 16 is implemented.
  20. A computer program product, comprising a computer program that, when executed by a processor, implements the method according to any one of claims 1 to 16.
  21. A computer program that, when executed by a processor, implements the method according to any one of claims 1 to 16.
PCT/CN2023/092363 2022-05-07 2023-05-05 Audio processing method, apparatus, device and storage medium WO2023216999A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210495460.0 2022-05-07
CN202210495460.0A CN117059121A (zh) 2022-05-07 2022-05-07 Audio processing method, apparatus, device and storage medium

Publications (1)

Publication Number Publication Date
WO2023216999A1 true WO2023216999A1 (zh) 2023-11-16

Family

ID=88652386

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/092363 WO2023216999A1 (zh) 2022-05-07 2023-05-05 Audio processing method, apparatus, device and storage medium

Country Status (2)

Country Link
CN (1) CN117059121A (zh)
WO (1) WO2023216999A1 (zh)

Citations (3)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111916039A (zh) * 2019-05-08 2020-11-10 北京字节跳动网络技术有限公司 Music file processing method, apparatus, terminal and storage medium
CN112885318A (zh) * 2019-11-29 2021-06-01 阿里巴巴集团控股有限公司 Multimedia data generation method, apparatus, electronic device and computer storage medium
CN113411516A (zh) * 2021-05-14 2021-09-17 北京达佳互联信息技术有限公司 Video processing method, apparatus, electronic device and storage medium

Also Published As

Publication number Publication date
CN117059121A (zh) 2023-11-14

Similar Documents

Publication Publication Date Title
WO2021073315A1 (zh) Video file generation method and apparatus, terminal, and storage medium
WO2020113733A1 (zh) Animation generation method and apparatus, electronic device, and computer-readable storage medium
CN113365134B (zh) Audio sharing method, apparatus, device, and medium
WO2022042035A1 (zh) Video production method, apparatus, device, and storage medium
CN109495790A (zh) Editor-based sticker adding method and apparatus, electronic device, and readable medium
JP2019015951A (ja) Wake-up method, apparatus, device, and computer-readable storage medium for an electronic device
WO2023051293A1 (zh) Audio processing method and apparatus, electronic device, and storage medium
US11934632B2 (en) Music playing method and apparatus
WO2023051246A1 (zh) Video recording method, apparatus, device, and storage medium
WO2022160603A1 (zh) Song recommendation method and apparatus, electronic device, and storage medium
US20240103802A1 (en) Method, apparatus, device and medium for multimedia processing
US20200413003A1 (en) Method and device for processing multimedia information, electronic equipment and computer-readable storage medium
WO2024078293A1 (zh) Image processing method and apparatus, electronic device, and storage medium
WO2023216999A1 (zh) Audio processing method, apparatus, device, and storage medium
WO2024032635A1 (zh) Media content acquisition method, apparatus, device, readable storage medium, and product
US9705953B2 (en) Local control of digital signal processing
WO2024077498A1 (zh) Playback interface display method, apparatus, device, and readable storage medium
WO2023217002A1 (zh) Audio processing method, apparatus, device, and storage medium
WO2022237463A1 (zh) Live-streaming background sound processing method, apparatus, device, medium, and program product
WO2023217003A1 (zh) Audio processing method, apparatus, device, and storage medium
JP2012058877A (ja) Playlist creation apparatus
WO2024012257A1 (zh) Audio processing method and apparatus, and electronic device
WO2022252916A1 (zh) Special-effect configuration file generation method, apparatus, device, and medium
JP7277635B2 (ja) Method and system for generating video content based on speech synthesis for images
EP4365888A1 (en) Method and apparatus for processing audio data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23802771

Country of ref document: EP

Kind code of ref document: A1