CN111435600B - Method and apparatus for processing audio

Method and apparatus for processing audio

Info

Publication number
CN111435600B
Authority
CN
China
Prior art keywords
audio
dubbing
track
volume
signal
Prior art date
Legal status
Active
Application number
CN201910037108.0A
Other languages
Chinese (zh)
Other versions
CN111435600A (en)
Inventor
思磊
Current Assignee
Douyin Vision Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201910037108.0A
Priority to PCT/CN2019/127603 (published as WO2020147522A1)
Publication of CN111435600A
Application granted
Publication of CN111435600B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/02 - Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 - Voice editing, e.g. manipulating the voice of the synthesiser
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00 - Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10 - Digital recording or reproducing
    • G11B20/10527 - Audio or video recording; Data buffering arrangements
    • G11B2020/10537 - Audio or video recording
    • G11B2020/10546 - Audio or video recording specifically adapted for audio data
    • G11B2020/10555 - Audio or video recording specifically adapted for audio data wherein the frequency, the amplitude, or other characteristics of the audio signal is taken into account
    • G11B2020/10574 - Audio or video recording specifically adapted for audio data wherein the frequency, the amplitude, or other characteristics of the audio signal is taken into account, volume or amplitude

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)

Abstract

Embodiments of the present disclosure disclose methods and apparatus for processing audio. One embodiment of the method comprises: in response to detecting a start dubbing signal triggered by a user, adjusting the volume of a dubbing audio track on audio to be dubbed to a first target volume, and adjusting the volume of the other audio tracks of the audio to be dubbed to a second target volume; acquiring an audio signal to be recorded, and recording the audio signal to be recorded on the dubbing audio track; and in response to detecting an end dubbing signal, saving the audio signal recorded on the dubbing audio track during the dubbing time period at the first target volume, and saving the audio signal on the other audio tracks of the audio to be dubbed during the dubbing time period at the second target volume. The method and apparatus can dub without modifying the original audio to be dubbed, which facilitates flexible dubbing of the audio to be dubbed and later modification of the dubbing.

Description

Method and apparatus for processing audio
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and apparatus for processing audio.
Background
With the development of internet technology, people can use electronic devices such as mobile phones and tablet computers to obtain video and audio content from internet resources, and can also record and dub such videos and audios. When dubbing over original audio, the prior art generally either mixes the user's voice directly into the original audio or replaces a segment of the original audio with the user's voice.
Disclosure of Invention
Embodiments of the present disclosure propose methods and apparatuses for processing audio.
In a first aspect, an embodiment of the present disclosure provides a method for processing audio, the method including: in response to detecting a start dubbing signal triggered by a user, adjusting the volume of a dubbing audio track on audio to be dubbed to a first target volume, and adjusting the volume of the other audio tracks of the audio to be dubbed to a second target volume, where the dubbing audio track is an audio track with a preset volume that is added to the audio to be dubbed in advance; acquiring an audio signal to be recorded, and recording the audio signal to be recorded on the dubbing audio track; and in response to detecting an end dubbing signal, saving the audio signal recorded on the dubbing audio track during the dubbing time period at the first target volume, and saving the audio signal on the other audio tracks of the audio to be dubbed during the dubbing time period at the second target volume.
In some embodiments, after the audio signal on the audio tracks of the audio to be dubbed other than the dubbing audio track is saved for the dubbing time period at the second target volume, the method further includes: adjusting the volume of the dubbing audio track to the preset volume, and adjusting the volume of the other audio tracks of the audio to be dubbed to the initial volume.
In some embodiments, after the audio signal on the audio tracks of the audio to be dubbed other than the dubbing audio track is saved for the dubbing time period at the second target volume, the method further includes: in response to detecting a modification dubbing signal triggered by the user, presenting an interface for performing a modification operation on the audio signal recorded on the dubbing audio track; and in response to detecting a user-triggered end modification dubbing signal, saving the modified audio signal on the dubbing audio track.
In some embodiments, the modifying operation comprises at least one of: deleting operation, cutting operation and re-recording operation.
In some embodiments, the first target volume and the second target volume are each a preset volume or a volume adjusted by a user.
In a second aspect, an embodiment of the present disclosure provides an apparatus for processing audio, the apparatus including: an adjusting unit configured to, in response to detecting a start dubbing signal triggered by a user, adjust the volume of a dubbing audio track on audio to be dubbed to a first target volume and adjust the volume of the other audio tracks of the audio to be dubbed to a second target volume, where the dubbing audio track is an audio track with a preset volume that is added to the audio to be dubbed in advance; a recording unit configured to acquire an audio signal to be recorded and record the audio signal to be recorded on the dubbing audio track; and a saving unit configured to, in response to detecting an end dubbing signal, save the audio signal recorded on the dubbing audio track during the dubbing time period at the first target volume and save the audio signal on the other audio tracks of the audio to be dubbed during the dubbing time period at the second target volume.
In some embodiments, the saving unit is further configured to adjust the volume of the dubbing audio track to the preset volume, and adjust the volume of the other audio tracks of the audio to be dubbed to the initial volume.
In some embodiments, the saving unit comprises: a presentation module configured to, in response to detecting a modification dubbing signal triggered by the user, present an interface for performing a modification operation on the audio signal recorded on the dubbing audio track; and a saving module configured to save the modified audio signal on the dubbing audio track in response to detecting a user-triggered end modification dubbing signal.
In some embodiments, the modifying operation comprises at least one of: deleting operation, cutting operation and re-recording operation.
In some embodiments, the first target volume and the second target volume are each a preset volume or a volume adjusted by a user.
In a third aspect, an embodiment of the present disclosure provides a terminal device, including: one or more processors; a storage device having one or more programs stored thereon; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, which computer program, when executed by a processor, implements the method as described in any of the implementations of the first aspect.
The method and apparatus for processing audio provided by the embodiments of the present disclosure adjust the volume of the dubbing audio track on the audio to be dubbed to a first target volume and the volume of the other audio tracks of the audio to be dubbed to a second target volume in response to detecting a user-triggered start dubbing signal, then acquire an audio signal to be recorded and record it on the dubbing audio track, and, in response to detecting an end dubbing signal, save the audio signal recorded on the dubbing audio track during the dubbing time period at the first target volume and save the audio signal on the other audio tracks of the audio to be dubbed during the dubbing time period at the second target volume. By adding an audio track to the audio to be dubbed, dubbing can be performed without modifying the original audio to be dubbed. Setting the first target volume and the second target volume allows the recorded audio signal to blend better with the original audio to be dubbed, which facilitates flexible dubbing of the audio to be dubbed and later modification of the dubbing.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for processing audio, according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of one application scenario of a method for processing audio according to an embodiment of the present disclosure;
FIG. 4 is a flow diagram of yet another embodiment of a method for processing audio according to an embodiment of the present disclosure;
FIG. 5 is a schematic block diagram illustrating one embodiment of an apparatus for processing audio in accordance with the present disclosure;
fig. 6 is a schematic structural diagram of a terminal device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant disclosure and are not limiting of the disclosure. It should be noted that, for the convenience of description, only the parts relevant to the related disclosure are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 of a method for processing audio or an apparatus for processing audio to which embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as an audio player application, a video player application, a web browser application, social platform software, and the like.
The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices described above, and may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. This is not specifically limited herein.
The server 105 may be a server providing various services, such as a background audio and video resource server providing support for audio and video played on the terminal devices 101, 102, 103. The background audio and video resource server can send audio and video to the terminal equipment and can also receive the audio and video sent by the terminal equipment.
It should be noted that the method for processing audio provided by the embodiment of the present disclosure is generally performed by the terminal devices 101, 102, 103, and accordingly, the apparatus for processing audio is generally disposed in the terminal devices 101, 102, 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services), or as a single piece of software or software module. This is not specifically limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In the case that the audio to be dubbed does not need to be acquired from a remote place, the system architecture may not include a network and a server, but only include a terminal device.
With continued reference to fig. 2, a flow 200 of one embodiment of a method for processing audio in accordance with the present disclosure is shown. The method for processing audio comprises the following steps:
Step 201, in response to detecting a start dubbing signal triggered by a user, adjusting the volume of a dubbing track on the audio to be dubbed to a first target volume, and adjusting the volume of the other tracks of the audio to be dubbed, excluding the dubbing track, to a second target volume.
In this embodiment, the execution body of the method for processing audio (e.g., the terminal device shown in fig. 1) may, in response to detecting a start dubbing signal triggered by a user, adjust the volume of a dubbing track on the audio to be dubbed to a first target volume and adjust the volume of the other tracks of the audio to be dubbed to a second target volume. The dubbing track is a track with a preset volume that is added to the audio to be dubbed in advance. For example, the audio to be dubbed may be audio selected by the user from a preset audio set (for example, an audio set stored locally on the execution body); when the execution body detects that this audio is to be dubbed (for example, the user selects the audio and clicks a "dubbing" button), a new track is added to the audio to be dubbed as the dubbing track. In general, the preset volume may be set to 0.
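For concreteness, the following is a minimal Python sketch of the multi-track model implied above; the Track and AudioProject classes and the add_dubbing_track helper are illustrative assumptions introduced here for explanation, not part of the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Track:
    """One audio track: a gain (0.0 = silent, 1.0 = original level) plus raw samples."""
    name: str
    volume: float
    samples: List[float] = field(default_factory=list)


@dataclass
class AudioProject:
    """The audio to be dubbed, represented as a set of tracks."""
    tracks: List[Track] = field(default_factory=list)

    def add_dubbing_track(self, preset_volume: float = 0.0) -> Track:
        # Before dubbing starts, a new track with a preset volume
        # (typically 0) is appended to the audio to be dubbed.
        dubbing = Track(name="dubbing", volume=preset_volume)
        self.tracks.append(dubbing)
        return dubbing
```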
The audio to be dubbed may be audio that the execution body acquired in advance from a remote source through a wired or wireless connection, or audio acquired locally. It should be understood that the audio to be dubbed may be a separate audio file or the audio component of a video file.
The dubbing start signal may be a signal that is triggered by the user and indicates the start of dubbing the audio to be dubbed. As an example, when the user clicks a start dubbing button displayed on the screen of the execution main body described above, a start dubbing signal is generated.
The first target volume and the second target volume may be the volumes at which the dubbed audio is played back after dubbing is finished.
In some optional implementations of this embodiment, the first target volume and the second target volume may each be a preset volume or a volume adjusted by the user. As an example, the execution body may present an interface for adjusting the volume to the user, and the user may adjust the first target volume and the second target volume used during dubbing through that interface.
It should be noted that the first target volume and the second target volume may be set fixed volumes or volumes determined according to set percentages. For example, assuming that the original volume of the audio to be dubbed is 100%, the first target volume may be set to 80% of the original volume, and the second target volume may be set to 20% of the original volume. By setting the first target volume and the second target volume, the user can more flexibly fuse the dubbing audio track and the audio to be dubbed, thereby being beneficial to improving the dubbing effect.
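Continuing the hypothetical sketch above, percentage-based targets might be applied roughly as follows when the start dubbing signal is detected; the 80%/20% split echoes the example figures given here, and the on_start_dubbing name and initial_volume bookkeeping are assumptions, not a required configuration.

```python
def on_start_dubbing(project: AudioProject,
                     first_target: float = 0.80,    # e.g. 80% of the original volume
                     second_target: float = 0.20):  # e.g. 20% of the original volume
    """Adjust track volumes when a user-triggered start dubbing signal is detected."""
    for track in project.tracks:
        if track.name == "dubbing":
            track.volume = first_target
        else:
            # Remember the initial volume so it can be restored after dubbing.
            track.initial_volume = track.volume
            track.volume = track.volume * second_target
```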
Step 202, obtaining an audio signal to be recorded, and recording the audio signal to be recorded on a dubbing audio track.
In this embodiment, the execution main body (for example, the terminal device shown in fig. 1) may acquire an audio signal to be recorded, and record the audio signal to be recorded on the dubbing track. The audio signal to be recorded may be an audio signal for recording in a dubbing track. As an example, the audio signal to be recorded may be a pre-stored audio signal acquired by the execution main body from a remote place or from a local place as described above. Alternatively, the audio signal to be recorded may be an audio signal collected by the target sound collecting apparatus in real time. The target sound collection device may be a device (e.g., a microphone) included in the execution body, or may be a device communicatively connected to the execution body.
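A rough sketch of the recording step, against the same hypothetical model: capture_block stands in for whatever microphone or pre-stored source supplies the audio signal to be recorded, and the sample-index bookkeeping is only one assumed way of tracking the dubbing time period.

```python
def record_onto_dubbing_track(dubbing: Track, capture_block, stop_requested,
                              start_offset: int = 0, block_size: int = 4096):
    """Append blocks of the audio signal to be recorded onto the dubbing track
    until stop_requested() reports that an end dubbing signal has arrived.

    start_offset is the playback position (in samples) of the audio to be dubbed
    at the moment the start dubbing signal was detected; the returned pair of
    sample indices is one way to represent the dubbing time period."""
    # Pad with silence up to the point where dubbing begins.
    if len(dubbing.samples) < start_offset:
        dubbing.samples.extend([0.0] * (start_offset - len(dubbing.samples)))
    while not stop_requested():
        dubbing.samples.extend(capture_block(block_size))
    return start_offset, len(dubbing.samples)
```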
It should be noted that the adding of the dubbing track to the audio to be dubbed, which is described in step 201, and the recording of the audio signal to be recorded on the dubbing track, which is described in step 202, are well-known technologies that are widely researched and applied at present, and are not described herein again.
Step 203, in response to detecting the end dubbing signal, saving the audio signal recorded on the dubbing track during the dubbing time period at the first target volume, and saving the audio signal on the other tracks of the audio to be dubbed during the dubbing time period at the second target volume.
In this embodiment, in response to detecting the end dubbing signal, the execution body may save the audio signal recorded on the dubbing track during the dubbing time period at the first target volume, and save the audio signal on the other tracks of the audio to be dubbed during the dubbing time period at the second target volume. The dubbing time period is the time period occupied by the audio signal recorded on the dubbing track during playback. As an example, when the audio signal to be recorded is collected by the target sound collection device in real time, the dubbing time period may run from the moment the start dubbing signal is detected to the moment the end dubbing signal is detected. When the audio signal to be recorded is a pre-stored audio signal, the dubbing time period may start at the moment the start dubbing signal is detected and last for the playing time of the audio signal to be recorded.
The end dubbing signal may be a signal indicating an end-of-dubbing operation triggered by the user, or a signal indicating an end-of-dubbing operation generated automatically by the execution body. As an example, if the audio signal to be recorded is collected by the target sound collection device in real time, an end dubbing signal is generated when the user clicks an end dubbing button displayed on the screen of the execution body. Alternatively, a start dubbing signal may be generated when the user's finger is detected pressing the start dubbing button displayed on the screen of the execution body, and an end dubbing signal may be generated when the finger is detected leaving the screen. As another example, if the audio signal to be recorded is a pre-stored audio signal, an end dubbing signal is generated when the execution body detects that the audio signal to be recorded has been completely recorded on the dubbing track.
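Saving at the two target volumes could, under the same assumptions as the sketches above, be rendered as a naive offline mix: inside the dubbing time period the dubbing track is kept at the first target volume and the other tracks at the second target volume, while outside it the other tracks keep their initial volume. This is only one possible reading of the saving step, not the patent's actual persistence format.

```python
def on_end_dubbing(project: AudioProject, period_start: int, period_end: int):
    """Mix down all tracks when the end dubbing signal is detected."""
    length = max((len(t.samples) for t in project.tracks), default=0)
    mixed = [0.0] * length
    for track in project.tracks:
        for i, sample in enumerate(track.samples):
            in_period = period_start <= i < period_end
            # Inside the dubbing period the adjusted volumes apply; outside it,
            # fall back to the initial volume remembered in on_start_dubbing.
            gain = track.volume if in_period else getattr(track, "initial_volume", track.volume)
            mixed[i] += gain * sample
    return mixed
```

Restoring the dubbing track to its preset volume and the other tracks to their initial volume afterwards, as described in the optional implementation below, would simply reverse on_start_dubbing.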
In some optional implementation manners of this embodiment, after step 203, the executing main body may further adjust the volume of the dubbing track to a preset volume, and adjust the volume of other tracks, except for the dubbing track, included in the audio to be dubbed to an initial volume. The initial volume is the volume of the audio to be dubbed before the audio is recorded on the dubbing track. Therefore, after the dubbing is finished, when the audio to be dubbed is played, the volume of the dubbing audio track does not influence the playing of the audio.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for processing audio according to the present embodiment. In the application scenario of fig. 3, the terminal device 301 is playing the audio 302 to be dubbed, and the user wants to dub it. To enable dubbing, the terminal device 301 adds a dubbing track 3021 with a volume of zero to the audio 302 to be dubbed in advance. When the user presses a start dubbing button 303 on the screen of the terminal device 301, the terminal device 301 generates a start dubbing signal. On detecting the start dubbing signal, the terminal device 301 adjusts the volume of the dubbing track to a first target volume preset by the user (for example, 80% of the original volume of the audio to be dubbed) and adjusts the volume of the other tracks of the audio to be dubbed to a second target volume (for example, 20% of the original volume). Then, the microphone on the terminal device 301 collects the user's voice, generates an audio signal 304 to be recorded, and records the audio signal 304 on the dubbing track 3021. When the user's finger lifts from the start dubbing button 303, the terminal device 301 generates an end dubbing signal, saves the audio signal recorded on the dubbing track during the dubbing time period at the first target volume, and saves the audio signal on the other tracks of the audio to be dubbed during the dubbing time period at the second target volume, thereby obtaining the dubbed audio 305.
The method provided by the above embodiment of the present disclosure adjusts the volume of the dubbing track on the audio to be dubbed to a first target volume and the volume of the other tracks of the audio to be dubbed to a second target volume in response to detecting a user-triggered start dubbing signal, acquires an audio signal to be recorded and records it on the dubbing track, and, in response to detecting an end dubbing signal, saves the audio signal recorded on the dubbing track during the dubbing time period at the first target volume and saves the audio signal on the other tracks of the audio to be dubbed during the dubbing time period at the second target volume. By adding a track to the audio to be dubbed, dubbing can be performed without modifying the original audio to be dubbed. Setting the first target volume and the second target volume allows the recorded audio signal to blend better with the original audio to be dubbed, which facilitates flexible dubbing of the audio to be dubbed and later modification of the dubbing.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for processing audio is shown. The flow 400 of the method for processing audio includes the steps of:
Step 401, in response to detecting a start dubbing signal triggered by a user, adjusting the volume of a dubbing track on the audio to be dubbed to a first target volume, and adjusting the volume of the other tracks of the audio to be dubbed, excluding the dubbing track, to a second target volume.
In this embodiment, step 401 is substantially the same as step 201 in the corresponding embodiment of fig. 2, and is not described here again.
Step 402, obtaining an audio signal to be recorded, and recording the audio signal to be recorded on a dubbing track.
In this embodiment, step 402 is substantially the same as step 202 in the corresponding embodiment of fig. 2, and is not described herein again.
Step 403, in response to detecting the ending dubbing signal, saving the audio signal to be recorded on the dubbing track in the dubbing time period at the first target volume, and saving the audio signal to be recorded on other tracks except the dubbing track included in the audio to be dubbed in the dubbing time period at the second target volume.
In this embodiment, step 403 is substantially the same as step 203 in the corresponding embodiment of fig. 2, and is not described herein again.
Step 404, in response to detecting a modification dubbing signal triggered by the user, presenting an interface for performing a modification operation on the audio signal recorded on the dubbing track.
In this embodiment, an executing body (for example, the terminal device shown in fig. 1) of the method for processing audio may present an interface for performing a modification operation on an audio signal to be recorded on a dubbing track in response to detecting a modification dubbing signal triggered by a user.
The modification dubbing signal may be a user-triggered signal indicating that the user wants to modify the audio signal saved on the dubbing track. As an example, a modification dubbing signal is generated when the user clicks a modify dubbing button displayed on the screen of the execution body. An interface for performing modification operations on the audio signal recorded on the dubbing track is then displayed on the screen, and the user can use this interface to control the execution body to modify the audio signal on the dubbing track.
In some optional implementations of this embodiment, the modifying operation may include at least one of: deleting operation, cutting operation and re-recording operation. Wherein the deletion operation may be used to delete the audio signal to be recorded on the dubbing track. The cropping operation may be used to delete portions of the audio signal to be recorded on the dubbing track. The re-recording operation may be used to replace the audio signal to be recorded on the dubbing track with a re-recorded audio signal to be recorded.
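Against the same hypothetical Track sketch used earlier, the three modification operations might look as follows; the function names and the sample-index based cropping are assumptions for illustration only.

```python
def delete_dubbing(dubbing: Track) -> None:
    """Deletion operation: remove the audio signal recorded on the dubbing track."""
    dubbing.samples.clear()


def crop_dubbing(dubbing: Track, keep_start: int, keep_end: int) -> None:
    """Cropping operation: keep only part of the recorded signal (sample indices)."""
    dubbing.samples = dubbing.samples[keep_start:keep_end]


def rerecord_dubbing(dubbing: Track, new_samples) -> None:
    """Re-recording operation: replace the recorded signal with a newly recorded one."""
    dubbing.samples = list(new_samples)
```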
Step 405, in response to detecting a user-triggered end modification dubbing signal, saving the modified audio signal on the dubbing track.
In this embodiment, the execution body may save the modified audio signal on the dubbing track in response to detecting a user-triggered end modification dubbing signal. The end modification dubbing signal may be a user-triggered signal indicating that the user has finished modifying the audio signal saved on the dubbing track. As an example, an end modification dubbing signal is generated when the user clicks an end modification dubbing button displayed on the screen of the execution body. The execution body then saves the dubbing track on which the modification operation was performed.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for processing audio in this embodiment highlights the step of modifying the dubbing track. The scheme described in this embodiment can therefore flexibly modify the dubbed audio without affecting the original audio to be dubbed, further improving the flexibility of dubbing.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for processing audio, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable in various electronic devices.
As shown in fig. 5, the apparatus 500 for processing audio of the present embodiment includes: an adjusting unit 501 configured to, in response to detecting a start dubbing signal triggered by a user, adjust the volume of a dubbing track on the audio to be dubbed to a first target volume and adjust the volume of the other tracks of the audio to be dubbed to a second target volume, where the dubbing track is a track with a preset volume that is added to the audio to be dubbed in advance; a recording unit 502 configured to acquire an audio signal to be recorded and record the audio signal to be recorded on the dubbing track; and a saving unit 503 configured to, in response to detecting an end dubbing signal, save the audio signal recorded on the dubbing track during the dubbing time period at the first target volume and save the audio signal on the other tracks of the audio to be dubbed during the dubbing time period at the second target volume.
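Read literally, the three units could be composed roughly as below, reusing the hypothetical helpers sketched in the method embodiment; the class and method names simply mirror the unit names and are not defined by the patent.

```python
class AudioDubbingApparatus:
    """Hypothetical composition of the adjusting, recording and saving units."""

    def __init__(self, project: AudioProject):
        self.project = project
        # A dubbing track with a preset volume of 0 is added in advance.
        self.dubbing = project.add_dubbing_track(preset_volume=0.0)

    # adjusting unit 501
    def on_start_dubbing_signal(self, first_target: float = 0.80, second_target: float = 0.20):
        on_start_dubbing(self.project, first_target, second_target)

    # recording unit 502
    def record(self, capture_block, stop_requested, start_offset: int = 0):
        return record_onto_dubbing_track(self.dubbing, capture_block,
                                         stop_requested, start_offset)

    # saving unit 503
    def on_end_dubbing_signal(self, period_start: int, period_end: int):
        return on_end_dubbing(self.project, period_start, period_end)
```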
In this embodiment, the adjusting unit 501 may adjust the volume of a dubbing track on the audio to be dubbed to a first target volume and adjust the volume of other tracks included in the audio to be dubbed except the dubbing track to a second target volume in response to detection of a start dubbing signal triggered by the user. The dubbing audio track is an audio track with preset volume, which is added to the audio to be dubbed in advance. In general, the preset volume may be set to 0. The audio to be dubbed may be the audio previously acquired by the apparatus 500 from a remote place by a wired connection or a wireless connection, or the audio acquired from a local place.
It should be understood that the audio to be dubbed may be a separate audio file or may be an audio component included in a video file.
The dubbing start signal may be a signal that is triggered by the user and indicates the start of dubbing the audio to be dubbed. As an example, when the user clicks a start dubbing button displayed on the screen of the above-described apparatus 500, a start dubbing signal is generated.
The first target volume and the second target volume may be volumes when the dubbed audio to be dubbed is played after the dubbing is finished.
In this embodiment, the recording unit 502 may obtain an audio signal to be recorded and record the audio signal to be recorded on the dubbing track. The audio signal to be recorded may be an audio signal for recording in a dubbing track. As an example, the audio signal to be recorded may be a pre-stored audio signal acquired by the apparatus 500 from a remote location or from a local location. Alternatively, the audio signal to be recorded may be an audio signal collected by the target sound collecting apparatus in real time. The target sound collection device may be a device (e.g., a microphone) included in the device 500, or may be a device communicatively connected to the device 500.
In this embodiment, the saving unit 503 may, in response to detecting an end dubbing signal, save the audio signal recorded on the dubbing track during the dubbing time period at the first target volume and save the audio signal on the other tracks of the audio to be dubbed during the dubbing time period at the second target volume.
The dubbing time period is a time period of an audio signal to be recorded on a dubbing audio track during playing. As an example, when the audio signal to be recorded is an audio signal collected by the target sound collecting apparatus in real time, the dubbing period may be a period from the time when the start dubbing signal is detected to the time when the end dubbing signal is detected. When the audio signal to be recorded is a pre-stored audio signal, the dubbing time period may be a time period starting from the time when the dubbing start signal is detected and having a duration of the play time of the audio signal to be recorded.
The dubbing end signal may be a signal for instructing an operation of ending dubbing triggered by a user, or may be a signal for instructing an operation of ending dubbing automatically generated by the apparatus 500. As an example, if the audio signal to be recorded is an audio signal collected by the target sound collection apparatus in real time, when the user clicks an end dubbing button displayed on the screen of the apparatus 500 described above, an end dubbing signal is generated. Alternatively, the start dubbing signal may be generated when it is detected that the user's finger has pressed a start dubbing button displayed on the screen of the apparatus 500, and the end dubbing signal may be generated when it is detected that the user's finger has left the screen. As another example, if the audio signal to be recorded is a pre-stored audio signal, when the apparatus 500 detects that the audio signal to be recorded is completely recorded on the dubbing track, a dubbing ending signal is generated.
In some optional implementations of this embodiment, the saving unit 503 may be further configured to adjust the volume of the dubbing track to the preset volume, and adjust the volume of the other tracks of the audio to be dubbed to the initial volume.
In some optional implementations of this embodiment, the saving unit 503 may include: a presentation module (not shown in the figures) configured to present, in response to detecting a user-triggered modifying dubbing signal, an interface for performing a modifying operation on an audio signal to be recorded on a dubbing track; a saving module (not shown in the figures) configured to save the modified audio signal to be recorded on the dubbing track in response to detecting the user-triggered end modification of the dubbing signal.
In some optional implementations of this embodiment, the modifying operation includes at least one of: deleting operation, cutting operation and re-recording operation.
In some optional implementations of this embodiment, the first target volume and the second target volume are respectively preset volumes or respectively volumes adjusted by a user.
The apparatus provided by the foregoing embodiment of the present disclosure adjusts the volume of the dubbing track on the audio to be dubbed to a first target volume and the volume of the other tracks of the audio to be dubbed to a second target volume in response to detecting a user-triggered start dubbing signal, acquires an audio signal to be recorded and records it on the dubbing track, and, in response to detecting an end dubbing signal, saves the audio signal recorded on the dubbing track during the dubbing time period at the first target volume and saves the audio signal on the other tracks of the audio to be dubbed during the dubbing time period at the second target volume. By adding a track to the audio to be dubbed, dubbing can be performed without modifying the original audio to be dubbed, which facilitates flexible dubbing of the audio to be dubbed and later modification of the dubbing.
Referring now to fig. 6, shown is a schematic block diagram of a terminal device 600 suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The terminal device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the use range of the embodiments of the present disclosure.
As shown in fig. 6, the terminal device 600 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage means 608 into a random access memory (RAM) 603. Various programs and data necessary for the operation of the terminal device 600 are also stored in the RAM 603. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the terminal device 600 to perform wireless or wired communication with other devices to exchange data. While fig. 6 illustrates a terminal apparatus 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be included in the terminal device; or may exist separately without being assembled into the terminal device. The computer readable medium carries one or more programs which, when executed by the terminal device, cause the terminal device to: in response to the detection of a dubbing starting signal triggered by a user, adjusting the volume of a dubbing audio track on the audio to be dubbed to a first target volume, and adjusting the volume of other audio tracks except the dubbing audio track included in the audio to be dubbed to a second target volume, wherein the dubbing audio track is an audio track which is added to the audio to be dubbed in advance and has a preset volume; acquiring an audio signal to be recorded, and recording the audio signal to be recorded on a dubbing audio track; and in response to the detection of the ending dubbing signal, storing the audio signal to be recorded on the dubbing audio track in the dubbing time period at the first target volume, and storing the audio signal to be recorded on other audio tracks except the dubbing audio track, which are included in the audio to be dubbed in the dubbing time period, at the second target volume.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself; for example, the recording unit may also be described as "a unit that records an audio signal to be recorded on the dubbing track".
The foregoing description is merely a description of preferred embodiments of the present disclosure and of the technical principles employed. Those skilled in the art should understand that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.

Claims (12)

1. A method for processing audio, comprising:
in response to the detection of a starting dubbing signal triggered by a user, adjusting the volume of a dubbing audio track on the audio to be dubbed to a first target volume, and adjusting the volume of other audio tracks except the dubbing audio track included in the audio to be dubbed to a second target volume, wherein the dubbing audio track is an audio track which is added to the audio to be dubbed in advance and has a preset volume;
acquiring an audio signal to be recorded, and recording the audio signal to be recorded on the dubbing audio track;
in response to detecting an end dubbing signal, saving the audio signal to be recorded on the dubbing audio track during the dubbing time period at the first target volume, and saving the audio signal on the other audio tracks, except the dubbing audio track, included in the audio to be dubbed during the dubbing time period at the second target volume.
2. The method of claim 1, wherein after said saving the audio signal on the other audio tracks, except the dubbing audio track, included in the audio to be dubbed during the dubbing time period at the second target volume, the method further comprises:
and adjusting the volume of the dubbing audio track to the preset volume, and adjusting the volume of other audio tracks except the dubbing audio track included in the audio to be dubbed to the initial volume.
3. The method of claim 1, wherein after said saving the audio signal on the other audio tracks, except the dubbing audio track, included in the audio to be dubbed during the dubbing time period at the second target volume, the method further comprises:
in response to detecting a modification dubbing signal triggered by a user, displaying an interface for performing a modification operation on the audio signal recorded on the dubbing audio track;
and in response to detecting a user-triggered end modification dubbing signal, saving the modified audio signal on the dubbing audio track.
4. The method of claim 3, wherein the modifying operation comprises at least one of: deleting operation, cutting operation and re-recording operation.
5. The method according to one of claims 1 to 4, wherein the first target volume and the second target volume are each a preset volume or a volume adjusted by the user.
6. An apparatus for processing audio, comprising:
an adjusting unit configured to, in response to detecting a start dubbing signal triggered by a user, adjust the volume of a dubbing audio track on the audio to be dubbed to a first target volume, and adjust the volume of the other audio tracks, except the dubbing audio track, included in the audio to be dubbed to a second target volume, wherein the dubbing audio track is an audio track which is added to the audio to be dubbed in advance and has a preset volume;
a recording unit configured to acquire an audio signal to be recorded, and record the audio signal to be recorded on the dubbing audio track;
a saving unit configured to, in response to detecting an end dubbing signal, save the audio signal recorded on the dubbing audio track during a dubbing time period at the first target volume, and save the audio signal on the other audio tracks, except the dubbing audio track, included in the audio to be dubbed during the dubbing time period at the second target volume.
7. The apparatus of claim 6, wherein the saving unit is further configured to:
and adjusting the volume of the dubbing audio track to the preset volume, and adjusting the volume of other audio tracks except the dubbing audio track included in the audio to be dubbed to the initial volume.
8. The apparatus of claim 6, wherein the saving unit comprises:
a presentation module configured to present, in response to detecting a user-triggered modification dubbing signal, an interface for performing a modification operation on an audio signal to be recorded on the dubbing audio track;
a saving module configured to save the modified audio signal to be recorded on the dubbing track in response to detecting a user-triggered end modification dubbing signal.
9. The apparatus of claim 8, wherein the modification operation comprises at least one of: deleting operation, cutting operation and re-recording operation.
10. The apparatus according to one of claims 6 to 9, wherein the first target volume and the second target volume are respectively a preset volume or respectively a volume adjusted by the user.
11. A terminal device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-5.
12. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201910037108.0A 2019-01-15 2019-01-15 Method and apparatus for processing audio Active CN111435600B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910037108.0A CN111435600B (en) 2019-01-15 2019-01-15 Method and apparatus for processing audio
PCT/CN2019/127603 WO2020147522A1 (en) 2019-01-15 2019-12-23 Method and device for processing audio

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910037108.0A CN111435600B (en) 2019-01-15 2019-01-15 Method and apparatus for processing audio

Publications (2)

Publication Number Publication Date
CN111435600A CN111435600A (en) 2020-07-21
CN111435600B true CN111435600B (en) 2021-05-18

Family

ID=71580079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910037108.0A Active CN111435600B (en) 2019-01-15 2019-01-15 Method and apparatus for processing audio

Country Status (2)

Country Link
CN (1) CN111435600B (en)
WO (1) WO2020147522A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112000308B (en) * 2020-09-10 2023-04-18 成都拟合未来科技有限公司 Double-track audio playing control method, system, terminal and medium
CN112954390B (en) * 2021-01-26 2023-05-09 北京有竹居网络技术有限公司 Video processing method, device, storage medium and equipment
CN113421577A (en) * 2021-05-10 2021-09-21 北京达佳互联信息技术有限公司 Video dubbing method and device, electronic equipment and storage medium
CN116737104A (en) * 2022-09-16 2023-09-12 荣耀终端有限公司 Volume adjusting method and related device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103474055A (en) * 2012-08-06 2013-12-25 苏州沃通信息科技有限公司 Mobile phone KTV solution
CN104657074A (en) * 2015-01-27 2015-05-27 中兴通讯股份有限公司 Method, device and mobile terminal for realizing sound recording
CN204795456U (en) * 2015-07-29 2015-11-18 王泰来 Dual track audio playback device
CN105336348A (en) * 2015-11-16 2016-02-17 合一网络技术(北京)有限公司 Processing system and method for multiple audio tracks in video editing
CN105359214A (en) * 2013-05-03 2016-02-24 石哲 Method for producing media contents in duet mode and apparatus used therein
CN106952642A (en) * 2016-01-06 2017-07-14 广州酷狗计算机科技有限公司 The method and apparatus of audio synthesis

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7734364B2 (en) * 2005-03-08 2010-06-08 Lolo, Llc Mixing media files
US20100319518A1 (en) * 2009-06-23 2010-12-23 Virendra Kumar Mehta Systems and methods for collaborative music generation
US9300268B2 (en) * 2013-10-18 2016-03-29 Apple Inc. Content aware audio ducking


Also Published As

Publication number Publication date
WO2020147522A1 (en) 2020-07-23
CN111435600A (en) 2020-07-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: Tiktok vision (Beijing) Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.

CP01 Change in the name or title of a patent holder