WO2020147522A1

WO2020147522A1 - Method and device for processing audio

Info

Publication number: WO2020147522A1
Application number: PCT/CN2019/127603
Authority: WO
Inventors: 思磊
Original assignee: 北京字节跳动网络技术有限公司
Priority date: 2019-01-15
Filing date: 2019-12-23
Publication date: 2020-07-23
Also published as: CN111435600B; CN111435600A

Abstract

Disclosed are a method and device for processing audio. A specific embodiment of the method comprises: in response to a dubbing start signal triggered by a user being detected, adjusting the volume of a dubbing track on audio to be dubbed to a first target volume, and adjusting the volume of audio tracks, other than the dubbing track, comprised in the audio to be dubbed to a second target volume; acquiring an audio signal to be recorded, and recording the audio signal to be recorded on the dubbing track; and in response to a dubbing end signal being detected, storing, at the first target volume, the audio signal recorded on the dubbing track within a dubbing time period, and storing, at the second target volume, audio signals to be recorded on the audio tracks, other than the dubbing track, comprised in the audio to be dubbed within the dubbing time period. According to the embodiment, dubbing can be performed without modifying original audio to be dubbed, thereby facilitating the flexible dubbing of audio to be dubbed and modifying dubbing.

Description

Method and device for processing audio

Cross-reference of related applications

This application is based on, and claims priority to, Chinese patent application No. 201910037108.0, filed on January 15, 2019 and titled "Method and Apparatus for Audio Processing". The entire content of that Chinese patent application is hereby incorporated into this application by reference.

Technical field

The embodiments of the present disclosure relate to the field of computer technology, in particular to methods and devices for processing audio.

Background art

With the development of Internet technology, people can use electronic devices such as mobile phones and tablet computers to obtain video, audio, and other content from Internet resources, as well as to record video and audio and to dub video and audio. When people dub original audio, the prior art usually either mixes the user's voice directly into the original audio or replaces a certain segment of the original audio with the user's voice.

Summary of the invention

The embodiments of the present disclosure propose methods and apparatuses for processing audio.

In a first aspect, an embodiment of the present disclosure provides a method for processing audio. The method includes: in response to detecting a user-triggered dubbing start signal, adjusting the volume of the dubbing track on the audio to be dubbed to a first target volume, and adjusting the volume of the audio tracks included in the audio to be dubbed, other than the dubbing track, to a second target volume, where the dubbing track is an audio track with a preset volume added to the audio to be dubbed in advance; acquiring an audio signal to be recorded, and recording the audio signal to be recorded on the dubbing track; and in response to detecting a dubbing end signal, saving, at the first target volume, the audio signal recorded on the dubbing track within the dubbing time period, and saving, at the second target volume, the audio signals to be recorded on the audio tracks, other than the dubbing track, included in the audio to be dubbed within the dubbing time period.

In some embodiments, after saving, at the second target volume, the audio signals to be recorded on the audio tracks, other than the dubbing track, included in the audio to be dubbed within the dubbing time period, the method further includes: adjusting the volume of the dubbing track to the preset volume, and adjusting the volume of the audio tracks included in the audio to be dubbed, other than the dubbing track, to an initial volume.

In some embodiments, after saving, at the second target volume, the audio signals to be recorded on the audio tracks, other than the dubbing track, included in the audio to be dubbed within the dubbing time period, the method further includes: in response to detecting a user-triggered dubbing modification signal, displaying an interface for modifying the audio signal recorded on the dubbing track; and in response to detecting a user-triggered end dubbing modification signal, saving the modified audio signal on the dubbing track.

In some embodiments, the modification operation includes at least one of the following: a deletion operation, a cropping operation, and a re-recording operation.

In some embodiments, the first target volume and the second target volume are each a preset volume or a volume adjusted by the user.

In a second aspect, an embodiment of the present disclosure provides an apparatus for processing audio. The apparatus includes: an adjustment unit configured to, in response to detecting a user-triggered dubbing start signal, adjust the volume of the dubbing track on the audio to be dubbed to a first target volume, and adjust the volume of the audio tracks included in the audio to be dubbed, other than the dubbing track, to a second target volume, where the dubbing track is an audio track with a preset volume added to the audio to be dubbed in advance; a recording unit configured to acquire an audio signal to be recorded and record the audio signal to be recorded on the dubbing track; and a saving unit configured to, in response to detecting a dubbing end signal, save, at the first target volume, the audio signal recorded on the dubbing track within the dubbing time period, and save, at the second target volume, the audio signals to be recorded on the audio tracks, other than the dubbing track, included in the audio to be dubbed within the dubbing time period.

In some embodiments, the saving unit is further configured to: adjust the volume of the dubbing track to the preset volume, and adjust the volume of the audio tracks included in the audio to be dubbed, other than the dubbing track, to the initial volume.

In some embodiments, the saving unit includes: a display module configured to, in response to detecting a user-triggered dubbing modification signal, display an interface for modifying the audio signal recorded on the dubbing track; and a saving module configured to, in response to detecting a user-triggered end dubbing modification signal, save the modified audio signal on the dubbing track.

In a third aspect, an embodiment of the present disclosure provides a terminal device. The terminal device includes: one or more processors; and a storage device on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements the method described in any implementation of the first aspect.

The method and apparatus for processing audio provided by the embodiments of the present disclosure, in response to detecting a user-triggered dubbing start signal, adjust the volume of the dubbing track on the audio to be dubbed to the first target volume and adjust the volume of the audio tracks included in the audio to be dubbed, other than the dubbing track, to the second target volume; acquire the audio signal to be recorded and record it on the dubbing track; and, in response to detecting a dubbing end signal, save, at the first target volume, the audio signal recorded on the dubbing track within the dubbing time period, and save, at the second target volume, the audio signals to be recorded on the audio tracks, other than the dubbing track, included in the audio to be dubbed within the dubbing time period. Because an audio track is added to the audio to be dubbed, dubbing can be performed without modifying the original audio to be dubbed. Setting the first target volume and the second target volume helps the recorded audio signal blend better with the original audio to be dubbed, which facilitates flexible dubbing of the audio and modification of the dubbing.

Brief description of the drawings

By reading the detailed description of the non-limiting embodiments with reference to the following drawings, other features, purposes, and advantages of the present disclosure will become more apparent:

Fig. 1 is an exemplary system architecture diagram in which an embodiment of the present disclosure can be applied;

Fig. 2 is a flowchart of one embodiment of a method for processing audio according to an embodiment of the present disclosure;

Fig. 3 is a schematic diagram of an application scenario of the method for processing audio according to an embodiment of the present disclosure;

Fig. 4 is a flowchart of another embodiment of a method for processing audio according to an embodiment of the present disclosure;

Fig. 5 is a schematic structural diagram of an embodiment of an apparatus for processing audio according to an embodiment of the present disclosure;

Fig. 6 is a schematic structural diagram of a terminal device suitable for implementing embodiments of the present disclosure.

Detailed description

The disclosure will be further described in detail below with reference to the drawings and embodiments. It can be understood that the specific embodiments described herein are only used to explain the relevant disclosure, rather than limiting the disclosure. It should also be noted that, for ease of description, only the parts related to the relevant disclosure are shown in the drawings.

It should be noted that the embodiments in the present disclosure and the features in the embodiments can be combined with each other without conflict. The disclosure will be described in detail below with reference to the drawings and in conjunction with the embodiments.

FIG. 1 shows an exemplary system architecture 100 of a method for processing audio or an apparatus for processing audio to which an embodiment of the present disclosure can be applied.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is a medium used to provide a communication link between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

The user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages, and so on. Various communication client applications, such as audio player applications, video player applications, web browser applications, and social platform software, may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices. When the terminal devices 101, 102, 103 are software, they can be installed in the aforementioned electronic devices, and can be implemented as multiple pieces of software or software modules (for example, software or software modules used to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.

The server 105 may be a server that provides various services, such as a background audio and video resource server that provides support for audio and video played on the terminal devices 101, 102, 103. The background audio and video resource server can send audio and video to the terminal device, and can also receive audio and video sent by the terminal device.

It should be noted that the method for processing audio provided by the embodiments of the present disclosure is generally executed by the terminal devices 101, 102, 103, and correspondingly, the apparatus for processing audio is generally set in the terminal devices 101, 102, 103.

It should be noted that the server can be hardware or software. When the server is hardware, it can be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it can be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services), or as a single piece of software or software module. No specific limitation is made here.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are only schematic. According to the implementation needs, there can be any number of terminal devices, networks and servers. In the case that the audio to be dubbed does not need to be obtained remotely, the above system architecture may not include the network and the server, but only the terminal device.

With continued reference to FIG. 2, there is shown a flow 200 of an embodiment of the method for processing audio according to the present disclosure. The method for processing audio includes the following steps:

Step 201: In response to detecting a user-triggered dubbing start signal, adjust the volume of the dubbing track on the audio to be dubbed to the first target volume, and adjust the volume of the audio tracks included in the audio to be dubbed, other than the dubbing track, to the second target volume.

In this embodiment, the execution subject of the method for processing audio (for example, the terminal device shown in FIG. 1) may, in response to detecting a user-triggered dubbing start signal, adjust the volume of the dubbing track on the audio to be dubbed to the first target volume, and adjust the volume of the audio tracks included in the audio to be dubbed, other than the dubbing track, to the second target volume. Here, the dubbing track is an audio track with a preset volume that is added to the audio to be dubbed in advance. As an example, the audio to be dubbed may be audio selected by the user from a preset audio set (for example, an audio set stored locally on the execution subject). When the execution subject detects that the user intends to dub the audio to be dubbed (for example, the user selects the audio to be dubbed and clicks a "dubbing" button), it adds a new audio track to the audio to be dubbed as the dubbing track. Usually, the preset volume can be set to 0.
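The idea of adding a silent dubbing track in advance can be pictured with a minimal sketch. The `Track` and `Audio` classes, the `add_dubbing_track` helper, and the use of plain sample lists are illustrative assumptions, not structures named in the disclosure.

```python
from dataclasses import dataclass, field
from typing import List

PRESET_VOLUME = 0.0  # the text notes the preset volume can usually be set to 0


@dataclass
class Track:
    """Hypothetical track model: a name, a volume (gain), and mono samples."""
    name: str
    volume: float = 1.0
    samples: List[float] = field(default_factory=list)


@dataclass
class Audio:
    """Hypothetical model of the audio to be dubbed as a list of tracks."""
    tracks: List[Track] = field(default_factory=list)


def add_dubbing_track(audio: Audio, length: int) -> Track:
    """Append a silent dubbing track; existing tracks are left untouched."""
    dubbing = Track(name="dubbing", volume=PRESET_VOLUME, samples=[0.0] * length)
    audio.tracks.append(dubbing)
    return dubbing


original = Track(name="original", volume=1.0, samples=[0.1, 0.2, 0.3, 0.4])
audio = Audio(tracks=[original])
dubbing = add_dubbing_track(audio, length=len(original.samples))
```

Because the dubbing lives on its own track, the samples of the original track are never rewritten.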

The audio to be dubbed may be audio obtained in advance by the execution subject from a remote location through a wired or wireless connection, or audio obtained locally. It should be understood that the audio to be dubbed may be a standalone audio file or an audio component included in a video file.

The dubbing start signal may be a signal triggered by the user to indicate the start of dubbing of the audio to be dubbed. As an example, when the user clicks a start-dubbing button displayed on the screen of the execution subject, a dubbing start signal is generated.

The first target volume and the second target volume may be the volumes at which the dubbed audio is played after dubbing ends.

In some optional implementations of this embodiment, the first target volume and the second target volume may each be a preset volume or a volume adjusted by the user. As an example, the execution subject may show the user an interface for adjusting the volume, and the user may adjust the first target volume and the second target volume used during dubbing on that interface.

It should be noted that the first target volume and the second target volume may be a set fixed volume, or may be a volume determined according to a set percentage. For example, assuming that the original volume of the audio to be dubbed is 100%, the first target volume may be set to 80% of the foregoing original volume, and the second target volume may be set to 20% of the foregoing original volume. By setting the first target volume and the second target volume, the user can more flexibly blend the dubbed audio track with the audio to be dubbed, thereby helping to improve the effect of dubbing.
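The percentage-based case can be worked through in a few lines. This is a sketch only: the function name `target_volumes` and the convention that a volume of 1.0 means 100% are assumptions made for illustration.

```python
from typing import Tuple


def target_volumes(original_volume: float,
                   dubbing_percent: float = 80.0,
                   others_percent: float = 20.0) -> Tuple[float, float]:
    """Return (first_target, second_target) as percentages of the original volume."""
    for percent in (dubbing_percent, others_percent):
        if not 0.0 <= percent <= 100.0:
            raise ValueError("percentages must be between 0 and 100")
    first_target = original_volume * dubbing_percent / 100.0
    second_target = original_volume * others_percent / 100.0
    return first_target, second_target


# With the example figures from the text: 80% for the dubbing track,
# 20% for the other tracks, relative to an original volume of 1.0.
first_target, second_target = target_volumes(1.0)
```

A fixed volume, the other option mentioned in the text, would simply bypass this calculation and use the stored value directly.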

Step 202: Obtain the audio signal to be recorded, and record the audio signal to be recorded on the dubbing audio track.

In this embodiment, the execution subject (for example, the terminal device shown in FIG. 1) can acquire the audio signal to be recorded and record it on the dubbing track. Here, the audio signal to be recorded is an audio signal intended for recording on the dubbing track. As an example, the audio signal to be recorded may be a pre-stored audio signal obtained remotely or locally by the execution subject. Alternatively, the audio signal to be recorded may be an audio signal collected in real time by a target sound collection device. The target sound collection device may be a device (such as a microphone) included in the execution subject, or a device communicatively connected with the execution subject.

It should be noted that the method of adding a dubbing track to the audio to be dubbed described in step 201 and the method of recording the audio signal to be recorded on the dubbing track described in step 202 are well-known technologies that are widely studied and applied, and are not repeated here.
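Although the disclosure treats recording onto a track as known technology, a toy version helps fix ideas. The sketch below assumes a track is a plain list of samples and that audio arrives in chunks; `record_on_track` is a hypothetical helper, not an API of any real audio library.

```python
from typing import List


def record_on_track(track_samples: List[float], start: int,
                    incoming: List[float]) -> int:
    """Write incoming samples onto the track from index `start`.

    The track is padded with silence if the recording runs past its end.
    Returns the index just after the last sample written.
    """
    if start < 0:
        raise ValueError("start must be non-negative")
    end = start + len(incoming)
    if end > len(track_samples):
        track_samples.extend([0.0] * (end - len(track_samples)))
    track_samples[start:end] = incoming
    return end


# Simulated microphone chunks recorded back to back on a silent dubbing track.
dubbing_samples = [0.0, 0.0, 0.0, 0.0]
position = 2
for chunk in ([0.5, 0.6], [0.7]):
    position = record_on_track(dubbing_samples, position, chunk)
```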

Step 203: In response to detecting a dubbing end signal, save, at the first target volume, the audio signal recorded on the dubbing track within the dubbing time period, and save, at the second target volume, the audio signals to be recorded on the audio tracks, other than the dubbing track, included in the audio to be dubbed within the dubbing time period.

In this embodiment, in response to detecting the dubbing end signal, the execution subject may save, at the first target volume, the audio signal recorded on the dubbing track within the dubbing time period, and save, at the second target volume, the audio signals to be recorded on the audio tracks, other than the dubbing track, included in the audio to be dubbed within the dubbing time period. Here, the dubbing time period is the time period during which the audio signal recorded on the dubbing track is played. As an example, when the audio signal to be recorded is collected in real time by the target sound collection device, the dubbing time period may run from the time the dubbing start signal is detected to the time the dubbing end signal is detected. When the audio signal to be recorded is a pre-stored audio signal, the dubbing time period may start when the dubbing start signal is detected and last for the playing duration of the audio signal to be recorded.
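The two ways of determining the dubbing time period can be written as one small function. This is an illustrative sketch; the name `dubbing_period` and the use of seconds as the time unit are assumptions.

```python
from typing import Optional, Tuple


def dubbing_period(start_time: float,
                   end_time: Optional[float] = None,
                   prestored_duration: Optional[float] = None) -> Tuple[float, float]:
    """Return (start, end) of the dubbing time period in seconds.

    Pre-stored signal: the period lasts for the signal's playing duration.
    Real-time capture: the period ends when the end signal is detected.
    """
    if prestored_duration is not None:
        if prestored_duration < 0:
            raise ValueError("duration must be non-negative")
        return start_time, start_time + prestored_duration
    if end_time is None or end_time < start_time:
        raise ValueError("a real-time period needs an end time after the start")
    return start_time, end_time


live = dubbing_period(2.0, end_time=5.5)              # microphone capture
stored = dubbing_period(2.0, prestored_duration=3.0)  # pre-stored clip
```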

The dubbing end signal may be a signal triggered by the user to indicate the end of the dubbing operation, or a signal automatically generated by the execution subject to indicate the end of the dubbing operation. As an example, if the audio signal to be recorded is collected in real time by the target sound collection device, a dubbing end signal is generated when the user clicks an end-dubbing button displayed on the screen of the execution subject. Alternatively, a dubbing start signal is generated when it is detected that the user's finger presses the start-dubbing button displayed on the screen of the execution subject, and a dubbing end signal is generated when it is detected that the user's finger leaves the screen. As another example, if the audio signal to be recorded is a pre-stored audio signal, the execution subject generates a dubbing end signal when it detects that the audio signal to be recorded has been completely recorded on the dubbing track.
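The press-and-hold variant amounts to a two-state machine. The sketch below is an assumption-laden illustration: `DubbingControl` and the signal names `"start_dubbing"` and `"end_dubbing"` are invented for the example, and real touch handling would come from the platform's UI toolkit.

```python
from typing import List


class DubbingControl:
    """Hypothetical press-and-hold control that emits start/end signals."""

    def __init__(self) -> None:
        self.recording = False
        self.signals: List[str] = []

    def press(self) -> None:
        # Finger pressed on the start-dubbing button: emit the start signal once.
        if not self.recording:
            self.recording = True
            self.signals.append("start_dubbing")

    def release(self) -> None:
        # Finger left the screen: emit the end signal only if dubbing is active.
        if self.recording:
            self.recording = False
            self.signals.append("end_dubbing")

    def prestored_finished(self) -> None:
        # Pre-stored signal fully recorded: the end signal is generated automatically.
        self.release()


control = DubbingControl()
control.press()
control.press()    # ignored, dubbing is already in progress
control.release()
control.release()  # ignored, dubbing has already ended
```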

In some optional implementations of this embodiment, after step 203, the execution subject may also adjust the volume of the dubbing track to the preset volume, and adjust the volume of the audio tracks included in the audio to be dubbed, other than the dubbing track, to the initial volume. Here, the initial volume is the volume of the audio to be dubbed before audio was recorded on the dubbing track. Therefore, after dubbing is finished, when the audio to be dubbed is played, the volume of the dubbing track does not affect the playback of the audio.
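Restoring the volumes after step 203 can be sketched as a snapshot-and-restore pair. The dictionary of per-track volumes and the helper names `start_dubbing` and `finish_dubbing` are assumptions made for illustration.

```python
from typing import Dict


def start_dubbing(volumes: Dict[str, float], dubbing: str,
                  first_target: float, second_target: float) -> Dict[str, float]:
    """Apply the target volumes in place and return a snapshot of the initial ones."""
    initial = dict(volumes)
    for name in volumes:
        volumes[name] = first_target if name == dubbing else second_target
    return initial


def finish_dubbing(volumes: Dict[str, float], initial: Dict[str, float],
                   dubbing: str, preset_volume: float = 0.0) -> None:
    """Set the dubbing track back to the preset volume and the rest to their initial volume."""
    for name in volumes:
        volumes[name] = preset_volume if name == dubbing else initial[name]


volumes = {"original": 1.0, "music": 0.7, "dubbing": 0.0}
initial = start_dubbing(volumes, "dubbing", first_target=0.8, second_target=0.2)
during = dict(volumes)  # volumes in effect while dubbing
finish_dubbing(volumes, initial, "dubbing")
```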

With continued reference to FIG. 3, a schematic diagram of an application scenario of the method for processing audio according to this embodiment is shown. In the application scenario of FIG. 3, the terminal device 301 is playing the audio 302 to be dubbed, and the user wants to dub the audio 302. To allow this, the terminal device 301 adds a dubbing track 3021 with a volume of zero to the audio 302 to be dubbed in advance. The user then presses the start-dubbing button 303 on the screen of the terminal device 301, and the terminal device 301 generates a dubbing start signal. When the terminal device 301 detects the dubbing start signal, it adjusts the volume of the dubbing track to the first target volume and the volume of the audio tracks included in the audio to be dubbed, other than the dubbing track, to the second target volume, according to the first target volume (for example, 80% of the original volume of the audio to be dubbed) and the second target volume (for example, 20% of the original volume of the audio to be dubbed) preset by the user. Then, the microphone on the terminal device 301 collects the user's voice, generates an audio signal 304 to be recorded, and records the audio signal 304 on the dubbing track 3021. When the user's finger is lifted from the start-dubbing button 303, the terminal device 301 generates a dubbing end signal. The terminal device 301 then saves, at the first target volume, the audio signal recorded on the dubbing track within the dubbing time period, and saves, at the second target volume, the audio signals to be recorded on the audio tracks, other than the dubbing track, included in the audio to be dubbed within the dubbing time period, thereby obtaining the dubbed audio 305.
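One way to picture what the dubbed audio sounds like in this scenario is to render both tracks with the volumes that apply at each sample. The disclosure does not specify a mixdown step; the `render` function below is a hypothetical illustration that assumes equal-length sample lists, a dubbing time period given as sample indices, and playback at the initial and preset volumes outside that period.

```python
from typing import List, Tuple


def render(original: List[float], dubbing: List[float],
           period: Tuple[int, int],
           first_target: float, second_target: float,
           initial_volume: float = 1.0,
           preset_volume: float = 0.0) -> List[float]:
    """Mix two tracks sample by sample without modifying either input."""
    start, end = period
    mixed = []
    for i, (o, d) in enumerate(zip(original, dubbing)):
        inside = start <= i < end  # is this sample within the dubbing time period?
        o_gain = second_target if inside else initial_volume
        d_gain = first_target if inside else preset_volume
        mixed.append(o * o_gain + d * d_gain)
    return mixed


original = [1.0, 1.0, 1.0, 1.0]
dubbing = [0.0, 0.5, 0.5, 0.0]
mixed = render(original, dubbing, period=(1, 3),
               first_target=0.8, second_target=0.2)
```

Inside the period the original is attenuated and the voice dominates; outside it the original plays at its initial volume, and the original samples themselves are never changed.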

The method provided by the above embodiment of the present disclosure, in response to detecting a user-triggered dubbing start signal, adjusts the volume of the dubbing track on the audio to be dubbed to the first target volume and adjusts the volume of the audio tracks included in the audio to be dubbed, other than the dubbing track, to the second target volume; then acquires the audio signal to be recorded and records it on the dubbing track; and, in response to detecting a dubbing end signal, saves, at the first target volume, the audio signal recorded on the dubbing track within the dubbing time period, and saves, at the second target volume, the audio signals to be recorded on the audio tracks, other than the dubbing track, included in the audio to be dubbed within the dubbing time period. Because an audio track is added to the audio to be dubbed, dubbing can be performed without modifying the original audio to be dubbed. Setting the first target volume and the second target volume helps the recorded audio signal blend better with the original audio to be dubbed, which facilitates flexible dubbing of the audio and modification of the dubbing.

With further reference to FIG. 4, it shows a flow 400 of still another embodiment of a method for processing audio. The process 400 of the method for processing audio includes the following steps:

Step 401: In response to detecting a user-triggered dubbing start signal, adjust the volume of the dubbing track on the audio to be dubbed to the first target volume, and adjust the volume of the audio tracks included in the audio to be dubbed, other than the dubbing track, to the second target volume.

In this embodiment, step 401 is basically the same as step 201 in the embodiment corresponding to FIG. 2, and will not be repeated here.

Step 402: Obtain the audio signal to be recorded, and record the audio signal to be recorded on the dubbing audio track.

In this embodiment, step 402 is basically the same as step 202 in the embodiment corresponding to FIG. 2, and will not be repeated here.

Step 403: In response to detecting a dubbing end signal, save, at the first target volume, the audio signal recorded on the dubbing track within the dubbing time period, and save, at the second target volume, the audio signals to be recorded on the audio tracks, other than the dubbing track, included in the audio to be dubbed within the dubbing time period.

In this embodiment, step 403 is basically the same as step 203 in the embodiment corresponding to FIG. 2, and will not be repeated here.

Step 404: In response to detecting a user-triggered dubbing modification signal, display an interface for modifying the audio signal recorded on the dubbing track.

In this embodiment, the execution subject of the method for processing audio (for example, the terminal device shown in FIG. 1) may, in response to detecting a user-triggered dubbing modification signal, display an interface for modifying the audio signal recorded on the dubbing track.

The dubbing modification signal may be a signal triggered by the user to indicate that the user wants to modify the audio signal saved on the dubbing track. As an example, when the user clicks a modify-dubbing button displayed on the screen of the execution subject, a dubbing modification signal is generated. An interface for modifying the audio signal recorded on the dubbing track is then displayed on the screen, and the user can use this interface to control the execution subject to modify the audio signal on the dubbing track.

In some optional implementations of this embodiment, the modification operation may include at least one of the following: a deletion operation, a cropping operation, and a re-recording operation. The deletion operation can be used to delete the audio signal recorded on the dubbing track. The cropping operation can be used to delete part of the audio signal recorded on the dubbing track. The re-recording operation can be used to replace the audio signal on the dubbing track with a newly recorded audio signal.
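The three modification operations can be sketched on a list of samples. The function names are hypothetical, and representing a deleted region as silence (so the track stays aligned with the other tracks) is an assumption of this example rather than something the disclosure states.

```python
from typing import List


def delete_dub(samples: List[float]) -> List[float]:
    """Deletion: remove the whole recording, leaving a silent track of equal length."""
    return [0.0] * len(samples)


def crop_dub(samples: List[float], start: int, end: int) -> List[float]:
    """Cropping: remove the part of the recording in [start, end)."""
    if not 0 <= start <= end <= len(samples):
        raise ValueError("invalid crop range")
    return samples[:start] + [0.0] * (end - start) + samples[end:]


def rerecord_dub(samples: List[float], start: int,
                 new_samples: List[float]) -> List[float]:
    """Re-recording: overwrite the recording from `start` with new samples."""
    if not 0 <= start <= len(samples):
        raise ValueError("invalid start position")
    out = list(samples)
    out[start:start + len(new_samples)] = new_samples
    return out


recorded = [0.1, 0.2, 0.3, 0.4]
deleted = delete_dub(recorded)
cropped = crop_dub(recorded, 1, 3)
rerecorded = rerecord_dub(recorded, 1, [0.9, 0.8])
```

Each function returns a new list, so the saved recording is only replaced when the caller decides to keep the result.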

Step 405: In response to detecting a user-triggered end dubbing modification signal, save the modified audio signal on the dubbing track.

In this embodiment, the execution subject may, in response to detecting a user-triggered end dubbing modification signal, save the modified audio signal on the dubbing track. The end dubbing modification signal may be a signal triggered by the user to indicate that the user has finished modifying the audio signal saved on the dubbing track. As an example, when the user clicks an end-modification button displayed on the screen of the execution subject, an end dubbing modification signal is generated. The execution subject then saves the dubbing track on which the modification operation has been performed.
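Steps 404 and 405 together resemble an edit session in which changes are committed only when the end signal arrives. The `ModifySession` class below is a hypothetical sketch of that pattern; the method names are invented for illustration.

```python
from typing import Callable, List, Optional


class ModifySession:
    """Hypothetical edit session over the saved dubbing track samples."""

    def __init__(self, saved: List[float]) -> None:
        self.saved = list(saved)
        self.working: Optional[List[float]] = None

    def begin(self) -> None:
        # Dubbing modification signal detected: edit a working copy.
        self.working = list(self.saved)

    def apply(self, operation: Callable[[List[float]], List[float]]) -> None:
        # Apply one modification operation chosen on the interface.
        if self.working is None:
            raise RuntimeError("no modification in progress")
        self.working = operation(self.working)

    def end(self) -> List[float]:
        # End dubbing modification signal detected: save the modified track.
        if self.working is None:
            raise RuntimeError("no modification in progress")
        self.saved = self.working
        self.working = None
        return self.saved


session = ModifySession([0.5, 0.6, 0.7])
session.begin()
session.apply(lambda samples: samples[:2])  # e.g. drop the last sample
before_commit = list(session.saved)
after_commit = session.end()
```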

It can be seen from FIG. 4 that, compared with the embodiment corresponding to FIG. 2, the process 400 of the method for processing audio in this embodiment highlights the step of modifying the dubbing track. Therefore, the solution described in this embodiment allows the dubbing to be modified flexibly without affecting the original audio to be dubbed, further improving the flexibility of dubbing.

With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a device for processing audio. The device embodiment corresponds to the method embodiment shown in FIG. The device can be applied to various electronic devices.

As shown in FIG. 5, the apparatus 500 for processing audio in this embodiment includes: an adjustment unit 501 configured to, in response to detecting a user-triggered dubbing start signal, adjust the volume of the dubbing track on the audio to be dubbed to the first target volume, and adjust the volume of the audio tracks included in the audio to be dubbed, other than the dubbing track, to the second target volume, where the dubbing track is an audio track with a preset volume added to the audio to be dubbed in advance; a recording unit 502 configured to acquire the audio signal to be recorded and record it on the dubbing track; and a saving unit 503 configured to, in response to detecting a dubbing end signal, save, at the first target volume, the audio signal recorded on the dubbing track within the dubbing time period, and save, at the second target volume, the audio signals to be recorded on the audio tracks, other than the dubbing track, included in the audio to be dubbed within the dubbing time period.
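The division into three units can be mirrored by three small classes wired together. This is a structural sketch only: the class and method names, the dictionaries of samples and volumes, and the tuple format used for saved data are all assumptions, and the actual units 501 to 503 may be realized quite differently.

```python
from typing import Dict, List, Tuple


class AdjustmentUnit:
    """Counterpart of adjustment unit 501: sets per-track volumes."""

    def adjust(self, volumes: Dict[str, float], dubbing: str,
               first_target: float, second_target: float) -> None:
        for name in volumes:
            volumes[name] = first_target if name == dubbing else second_target


class RecordingUnit:
    """Counterpart of recording unit 502: records onto the dubbing track."""

    def record(self, dubbing_samples: List[float], incoming: List[float]) -> None:
        dubbing_samples.extend(incoming)


class SavingUnit:
    """Counterpart of saving unit 503: stores samples with the volume in effect."""

    def __init__(self) -> None:
        self.saved: Dict[str, Tuple[float, List[float]]] = {}

    def save(self, tracks: Dict[str, List[float]], volumes: Dict[str, float]) -> None:
        for name, samples in tracks.items():
            self.saved[name] = (volumes[name], list(samples))


class DubbingApparatus:
    def __init__(self, tracks: Dict[str, List[float]],
                 volumes: Dict[str, float], dubbing: str = "dubbing") -> None:
        self.tracks, self.volumes, self.dubbing = tracks, volumes, dubbing
        self.adjustment_unit = AdjustmentUnit()
        self.recording_unit = RecordingUnit()
        self.saving_unit = SavingUnit()

    def on_start_signal(self, first_target: float, second_target: float) -> None:
        self.adjustment_unit.adjust(self.volumes, self.dubbing,
                                    first_target, second_target)

    def on_audio(self, incoming: List[float]) -> None:
        self.recording_unit.record(self.tracks[self.dubbing], incoming)

    def on_end_signal(self) -> Dict[str, Tuple[float, List[float]]]:
        self.saving_unit.save(self.tracks, self.volumes)
        return self.saving_unit.saved


apparatus = DubbingApparatus(tracks={"original": [1.0, 1.0], "dubbing": []},
                             volumes={"original": 1.0, "dubbing": 0.0})
apparatus.on_start_signal(first_target=0.8, second_target=0.2)
apparatus.on_audio([0.5, 0.5])
saved = apparatus.on_end_signal()
```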

In this embodiment, the adjustment unit 501 may, in response to detecting a user-triggered dubbing start signal, adjust the volume of the dubbing track on the audio to be dubbed to the first target volume, and adjust the volume of the audio tracks included in the audio to be dubbed, other than the dubbing track, to the second target volume. Here, the dubbing track is an audio track with a preset volume that is added to the audio to be dubbed in advance. Usually, the preset volume can be set to 0. The audio to be dubbed may be audio obtained in advance by the apparatus 500 from a remote location through a wired or wireless connection, or audio obtained locally.

It should be understood that the audio to be dubbed may be a separate audio file or an audio component included in the video file.

The dubbing start signal may be a signal triggered by the user to indicate the start of dubbing of the audio to be dubbed. As an example, when the user clicks a start-dubbing button displayed on the screen of the apparatus 500, a dubbing start signal is generated.

In this embodiment, the recording unit 502 can acquire the audio signal to be recorded and record it on the dubbing track. Here, the audio signal to be recorded is an audio signal intended for recording on the dubbing track. As an example, the audio signal to be recorded may be a pre-stored audio signal obtained remotely or locally by the apparatus 500. Alternatively, the audio signal to be recorded may be an audio signal collected in real time by a target sound collection device. The target sound collection device may be a device (such as a microphone) included in the apparatus 500, or a device communicatively connected to the apparatus 500.

In this embodiment, the saving unit 503 may, in response to detecting the end dubbing signal, save at the first target volume the audio signal recorded on the dubbing track during the dubbing time period, and save at the second target volume the audio signals, within the dubbing time period, on the other audio tracks included in the audio to be dubbed.
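One way to picture the saving step is shown below. The disclosure does not state whether the target volume is applied to the samples or stored alongside them; this sketch assumes the latter, keeping the volume as metadata next to a copy of the samples so that the original tracks stay untouched. The function name `save_dubbing_period` and the dictionary layout are hypothetical.

```python
from typing import Dict, List


def save_dubbing_period(
    tracks: Dict[str, List[float]],
    dubbing_name: str,
    first_target: float,
    second_target: float,
    start: int,
    end: int,
) -> Dict[str, dict]:
    """Save each track's samples in [start, end) together with its volume."""
    saved = {}
    for name, samples in tracks.items():
        # The dubbing track is saved at the first target volume,
        # all other tracks at the second target volume.
        volume = first_target if name == dubbing_name else second_target
        # Copy the slice so the original track data is not modified.
        saved[name] = {"volume": volume, "samples": list(samples[start:end])}
    return saved
```

Here `start` and `end` are sample indices that delimit the dubbing time period; a real implementation would derive them from timestamps and the sample rate.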

Here, the dubbing time period is the time period during which the audio signal recorded on the dubbing track is played. As an example, when the audio signal to be recorded is collected in real time by the target sound collection device, the dubbing time period may run from the moment the start dubbing signal is detected to the moment the end dubbing signal is detected. When the audio signal to be recorded is a pre-stored audio signal, the dubbing time period may begin at the moment the start dubbing signal is detected and last for the playing duration of the audio signal to be recorded.
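The two cases for the dubbing time period can be expressed as a small helper. This is a sketch under assumed conventions: times are in seconds, and the function name `dubbing_period` and its parameters are illustrative rather than taken from the disclosure.

```python
from typing import Optional, Tuple


def dubbing_period(
    start_time: float,
    end_time: Optional[float] = None,
    prestored_duration: Optional[float] = None,
) -> Tuple[float, float]:
    """Return (start, end) of the dubbing time period in seconds."""
    if prestored_duration is not None:
        # Pre-stored signal: the period lasts as long as the signal plays.
        return (start_time, start_time + prestored_duration)
    if end_time is None:
        raise ValueError("real-time recording needs the end signal time")
    # Real-time capture: from start signal detection to end signal detection.
    return (start_time, end_time)
```

For example, `dubbing_period(2.0, end_time=5.5)` covers a live recording, while `dubbing_period(2.0, prestored_duration=3.0)` covers a three-second pre-stored clip placed at the two-second mark.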

The above-mentioned end dubbing signal may be a signal triggered by the user to indicate the end of the dubbing operation, or it may be a signal automatically generated by the above-mentioned device 500 to indicate the end of the dubbing operation. As an example, if the audio signal to be recorded is collected in real time by the target sound collection device, an end dubbing signal is generated when the user clicks the end dubbing button displayed on the screen of the device 500. Alternatively, a start dubbing signal is generated when it is detected that the user's finger presses the start dubbing button displayed on the screen of the device 500, and an end dubbing signal is generated when it is detected that the user's finger leaves the screen. As another example, if the audio signal to be recorded is a pre-stored audio signal, the device 500 generates an end dubbing signal when it detects that the audio signal to be recorded has been completely recorded on the dubbing track.
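The press-and-hold variant described above behaves like a two-state machine. The sketch below illustrates it with a hypothetical `HoldToDubButton` class; the method names and the string signal labels are assumptions, and no real UI toolkit is involved.

```python
class HoldToDubButton:
    """Emits a start signal on press and an end signal on release."""

    def __init__(self) -> None:
        self.pressed = False
        self.signals = []  # record of emitted signals, in order

    def touch_down(self, on_button: bool) -> None:
        # Only a press that lands on the start dubbing button starts dubbing.
        if on_button and not self.pressed:
            self.pressed = True
            self.signals.append("start_dubbing")

    def touch_up(self) -> None:
        # The finger leaving the screen ends dubbing, but only if it had started.
        if self.pressed:
            self.pressed = False
            self.signals.append("end_dubbing")
```

Guarding `touch_up` with the `pressed` flag prevents a stray release event from producing an end dubbing signal when no dubbing is in progress.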

In some optional implementations of this embodiment, the saving unit 503 may be further configured to: adjust the volume of the dubbing track to the preset volume, and adjust the volume of the other audio tracks included in the audio to be dubbed, other than the dubbing track, to their initial volume.
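Restoring the volumes after saving might look like the following. The sketch assumes that the initial volumes of the other tracks were captured before dubbing started; the function name `restore_volumes` and the dictionary-based representation are hypothetical.

```python
from typing import Dict


def restore_volumes(
    current: Dict[str, float],
    dubbing_name: str,
    preset_volume: float,
    initial: Dict[str, float],
) -> Dict[str, float]:
    """Return volumes with the dubbing track at preset and others at initial."""
    restored = {}
    for name in current:
        if name == dubbing_name:
            restored[name] = preset_volume
        else:
            # Assumes the initial volume of every other track was recorded.
            restored[name] = initial[name]
    return restored
```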

In some optional implementations of this embodiment, the saving unit 503 may include: a display module (not shown in the figure), configured to, in response to detecting a modify dubbing signal triggered by the user, display an interface for modifying the audio signal recorded on the dubbing track; and a saving module (not shown in the figure), configured to, in response to detecting an end modifying dubbing signal triggered by the user, save the modified audio signal on the dubbing track.

In some optional implementation manners of this embodiment, the modification operation includes at least one of the following: a deletion operation, a cropping operation, and a re-recording operation.
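The three modification operations can be illustrated on a plain list of samples. The disclosure does not define their exact semantics, so the interpretations below are assumptions: deletion removes a range, cropping keeps only a range, and re-recording overwrites samples starting at a position. The function names are hypothetical.

```python
from typing import List


def delete_segment(samples: List[float], start: int, end: int) -> List[float]:
    """Remove samples in [start, end) and close the gap."""
    return samples[:start] + samples[end:]


def crop_segment(samples: List[float], start: int, end: int) -> List[float]:
    """Keep only the samples in [start, end)."""
    return samples[start:end]


def rerecord_segment(
    samples: List[float], start: int, new_samples: List[float]
) -> List[float]:
    """Overwrite samples beginning at start with newly recorded ones."""
    return samples[:start] + new_samples + samples[start + len(new_samples):]
```

Because each function returns a new list, the dubbing track can be edited repeatedly without touching the other tracks of the audio to be dubbed.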

In some optional implementations of this embodiment, the first target volume and the second target volume are each either a preset volume or a volume adjusted by the user.

The device provided in the above-mentioned embodiment of the present disclosure, in response to detecting a user-triggered start dubbing signal, adjusts the volume of the dubbing track on the audio to be dubbed to the first target volume and adjusts the volume of the other audio tracks included in the audio to be dubbed, other than the dubbing track, to the second target volume. It then acquires the audio signal to be recorded and records it on the dubbing track. In response to detecting the end dubbing signal, it saves at the first target volume the audio signal recorded on the dubbing track during the dubbing time period, and saves at the second target volume the audio signals, within the dubbing time period, on the other audio tracks included in the audio to be dubbed. By adding an audio track to the audio to be dubbed in this way, dubbing can be performed without modifying the original audio to be dubbed, which helps to dub the audio flexibly and to modify the dubbing.

Reference is now made to FIG. 6, which shows a schematic structural diagram of a terminal device 600 suitable for implementing embodiments of the present disclosure. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (for example, car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The terminal device shown in FIG. 6 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.

As shown in FIG. 6, the terminal device 600 may include a processing device (such as a central processing unit, a graphics processor, etc.) 601, which may execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the terminal device 600. The processing device 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

Generally, the following devices can be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 607 including, for example, a liquid crystal display (LCD), speaker, and vibrator; storage devices 608 including, for example, a magnetic tape and hard disk; and a communication device 609. The communication device 609 may allow the terminal device 600 to perform wireless or wired communication with other devices to exchange data. Although FIG. 6 shows a terminal device 600 having various devices, it should be understood that it is not required to implement or have all of the illustrated devices. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the process described above with reference to the flowchart can be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from the network through the communication device 609, or from the storage device 608, or from the ROM 602. When the computer program is executed by the processing device 601, the above-described functions defined in the method of the embodiments of the present disclosure are executed.

It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing. In the present disclosure, the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, the computer-readable signal medium may include a data signal that is propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable signal medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: electric wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

The above-mentioned computer-readable medium may be included in the above-mentioned terminal device, or it may exist alone without being assembled into the terminal device. The above-mentioned computer-readable medium carries one or more programs. When the one or more programs are executed by the terminal device, they cause the terminal device to: in response to detecting a start dubbing signal triggered by the user, adjust the volume of the dubbing track on the audio to be dubbed to a first target volume, and adjust the volume of the audio tracks included in the audio to be dubbed, other than the dubbing track, to a second target volume, where the dubbing track is an audio track with a preset volume added to the audio to be dubbed in advance; obtain the audio signal to be recorded and record it on the dubbing track; and in response to detecting an end dubbing signal, save at the first target volume the audio signal recorded on the dubbing track during the dubbing time period, and save at the second target volume the audio signals, within the dubbing time period, on the other audio tracks included in the audio to be dubbed.

The computer program code used to perform the operations of the present disclosure can be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In situations involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in a different order from the order marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented with a dedicated hardware-based system that performs the specified functions or operations, or can be implemented with a combination of dedicated hardware and computer instructions.

The units involved in the embodiments described in the present disclosure can be implemented in software or hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself. For example, the recording unit can also be described as "a unit that records the audio signal to be recorded".

The above description is only a preferred embodiment of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Claims

A method for processing audio, including:

In response to detecting a user-triggered dubbing start signal, adjusting the volume of the dubbing track on the audio to be dubbed to a first target volume, and adjusting the volume of the audio tracks included in the audio to be dubbed, other than the dubbing track, to a second target volume, wherein the dubbing track is an audio track with a preset volume added to the audio to be dubbed in advance;

Acquiring an audio signal to be recorded, and recording the audio signal to be recorded on the dubbing track;

In response to detecting an end dubbing signal, saving at the first target volume the audio signal recorded on the dubbing track during the dubbing time period, and saving at the second target volume the audio signals, within the dubbing time period, on the audio tracks included in the audio to be dubbed other than the dubbing track.
The method according to claim 1, wherein, after saving at the second target volume the audio signals, within the dubbing time period, on the audio tracks included in the audio to be dubbed other than the dubbing track, the method further includes:

The volume of the dubbing audio track is adjusted to the preset volume, and the volume of the audio tracks other than the dubbing audio track included in the audio to be dubbed is adjusted to the initial volume.
The method according to claim 1, wherein, after saving at the second target volume the audio signals, within the dubbing time period, on the audio tracks included in the audio to be dubbed other than the dubbing track, the method further includes:

In response to detecting a modify dubbing signal triggered by the user, displaying an interface for modifying the audio signal recorded on the dubbing track;

In response to detecting an end modifying dubbing signal triggered by the user, saving the modified audio signal on the dubbing track.
The method according to claim 3, wherein the modification operation includes at least one of the following: a deletion operation, a cropping operation, and a re-recording operation.
The method according to any one of claims 1-4, wherein the first target volume and the second target volume are each either a preset volume or a volume adjusted by the user.
A device for processing audio, including:

The adjustment unit is configured to, in response to detecting a user-triggered dubbing start signal, adjust the volume of the dubbing track on the audio to be dubbed to a first target volume, and adjust the volume of the audio tracks included in the audio to be dubbed, other than the dubbing track, to a second target volume, wherein the dubbing track is an audio track with a preset volume added to the audio to be dubbed in advance;

The recording unit is configured to acquire an audio signal to be recorded, and record the audio signal to be recorded on the dubbing track;

The saving unit is configured to, in response to detecting an end dubbing signal, save at the first target volume the audio signal recorded on the dubbing track during the dubbing time period, and save at the second target volume the audio signals, within the dubbing time period, on the audio tracks included in the audio to be dubbed other than the dubbing track.
The device according to claim 6, wherein the saving unit is further configured to:

The volume of the dubbing audio track is adjusted to the preset volume, and the volume of the audio tracks other than the dubbing audio track included in the audio to be dubbed is adjusted to the initial volume.
The device according to claim 6, wherein the saving unit comprises:

The display module is configured to, in response to detecting a modify dubbing signal triggered by the user, display an interface for modifying the audio signal recorded on the dubbing track;

The saving module is configured to, in response to detecting an end modifying dubbing signal triggered by the user, save the modified audio signal on the dubbing track.
The device according to claim 8, wherein the modification operation includes at least one of the following: a deletion operation, a cropping operation, and a re-recording operation.
The device according to any one of claims 6-9, wherein the first target volume and the second target volume are each either a preset volume or a volume adjusted by the user.
A terminal device, including:

One or more processors;

A storage device on which one or more programs are stored,

Wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-5.
A computer-readable medium with a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-5.