CN117037835A - Audio data processing method, electronic device and medium - Google Patents


Info

Publication number
CN117037835A
Authority
CN
China
Prior art keywords
target
audio data
data
target audio
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311042957.8A
Other languages
Chinese (zh)
Inventor
郑国炳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202311042957.8A priority Critical patent/CN117037835A/en
Publication of CN117037835A publication Critical patent/CN117037835A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72448 User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
    • H04M1/72454 User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to context-related or environment-related conditions

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Environmental & Geological Engineering (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

An embodiment of the application discloses an audio data processing method, an electronic device, and a medium. The audio data processing method comprises the following steps: collecting sound signals to obtain first audio data; separating, from the first audio data, second audio data including a sound of a first target sound-emitting object and third audio data including a sound of a second target sound-emitting object, the second audio data including a first target audio segment and the third audio data including a second target audio segment; acquiring a first data amount of the first target audio segment under a target timestamp and a second data amount of the second target audio segment under the target timestamp, where the target timestamp is a timestamp at which the first target audio segment and the second target audio segment overlap; and, in the case that the first data amount equals the second data amount, splicing the first target audio data in the first target audio segment with the second target audio data in the second target audio segment to obtain target audio data.

Description

Audio data processing method, electronic device and medium
This application is a divisional of Chinese patent application No. 202010131305.1, filed on February 28, 2020 by Vivo Mobile Communication Co., Ltd. and entitled "Audio data processing method, electronic device and medium".
Technical Field
The embodiment of the application relates to the technical field of Internet, in particular to a processing method of audio data, electronic equipment and a medium.
Background
With the continuous development of electronic devices, users can record surrounding sounds at any time. However, many different sounds are usually present in the surrounding environment, so the recorded audio file often contains sounds beyond the one the user intends to capture.
At present, to ensure a better pickup effect, noise cancellation is generally performed during pickup: an algorithm completely eliminates the environmental sound and retains only the target voice.
However, the audio information a user wants to retain differs across application scenarios. For example, when picking up sound outdoors, the user may want to retain only the ambient sound and not the human voice.
Therefore, prior-art audio data processing cannot adapt to users' personalized requirements, making the processing mode inflexible.
Disclosure of Invention
An embodiment of the application provides an audio data processing method, an electronic device, and a medium that can process audio data according to users' personalized requirements, obtain audio data meeting those requirements, and improve the user experience.
In a first aspect, an embodiment of the present application provides a method for processing audio data, which is applied to an electronic device, and includes:
collecting sound signals in a preset range to obtain first audio data;
separating second audio data including a sound of a first target sound-emitting object and third audio data including a sound of a second target sound-emitting object from the first audio data;
and carrying out audio processing on the second audio data and the third audio data according to a preset mode based on a preset gain to obtain target audio data.
In a second aspect, an embodiment of the present application provides an electronic device, including:
the acquisition module is used for acquiring sound signals in a preset range to obtain first audio data;
a separation module for separating second audio data including a sound of a first target sound object and third audio data including a sound of a second target sound object from the first audio data;
the processing module is used for carrying out audio processing on the second audio data and the third audio data according to a preset mode based on a preset gain to obtain target audio data.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, and a computer program stored on the memory and executable on the processor, where the processor executes instructions of the computer program to implement the steps of the method for processing audio data according to the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the method for processing audio data according to the first aspect.
In this embodiment of the application, the user sets different preset gains, so the electronic device can perform audio processing on the two audio data of different sound-emitting objects according to the gains the user set and finally obtain target audio data corresponding to those gains. The resulting target audio data can therefore meet the user's personalized requirements, improving the user experience.
Drawings
The application will be better understood from the following description of specific embodiments thereof taken in conjunction with the accompanying drawings in which like or similar reference characters designate like or similar features.
Fig. 1 is a flow chart illustrating a method for processing audio data according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a display interface of an electronic device according to an embodiment of the present application;
fig. 3 is a flowchart of an audio data processing method according to another embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 5 is a schematic diagram of a hardware structure of an electronic device implementing various embodiments of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The user can record the target sound which the user wants to save through pickup. However, due to uncertainty of the sound pick-up environment, the user cannot ensure that only the target sound exists in the surrounding environment during the sound pick-up process, and therefore, the audio file obtained by the method often includes not only the target sound but also other sounds.
Therefore, in the prior art, audio processing is generally performed on the picked-up audio file, so that the ambient sound is eliminated, and only the target sound is reserved.
However, this single prior-art processing method only meets the user's need for the target sound; it cannot meet application scenarios in which the user wants to keep sounds other than the target sound, so the user experience is poor.
In order to solve the above-mentioned problems, the embodiments of the present application provide a processing method, an electronic device, and a medium capable of obtaining audio data meeting the personalized needs of users.
Fig. 1 is a flowchart of a processing method of audio data according to an embodiment of the present application.
As shown in fig. 1, the method for processing audio data applied to an electronic device includes:
s101, collecting sound signals in a preset range to obtain first audio data.
Alternatively, in some embodiments of the present application, the preset range may be a sound collection range of the sound pickup apparatus, and the first audio data may include all sounds within the preset range.
For example, when a user collects a sound signal in a preset range outdoors by using the collection device, the obtained first audio data may include: car sounds, animal sounds, and human sounds.
Alternatively, in some embodiments of the present application, sound signals within a preset range may be collected by using sound pickup devices located at different positions to obtain the first audio data.
For example, a microphone built into the electronic device and an earphone microphone connected to it collect sound signals within the preset range, yielding first audio data containing two channels of audio.
In this embodiment, collecting the sound signals with pickup devices at different positions makes full use of the fact that devices at different positions perceive the same target sound with different intensities, yielding first audio data that facilitates accurate sound separation.
S102, separating second audio data comprising the sound of the first target sound production object and third audio data comprising the sound of the second target sound production object from the first audio data.
The sound-emitting object may be any object that emits sound; for example, if the environment contains a dog bark and a car sound, the sound-emitting objects include the dog and the car.
Optionally, in an embodiment of the application, if the first audio data includes only one channel of audio data, a trained neural network may be used to separate the second audio data and the third audio data from the first audio data.
For example, if the first audio data is a single channel of audio containing a human voice and a dog bark, the neural network can separate from it second audio data including the sound of the first target sound-emitting object (the human voice) and third audio data including the sound of the second target sound-emitting object (the dog bark).
Optionally, in an embodiment of the application, if the first audio data includes two channels of audio data, the second audio data and the third audio data may be separated from the first audio data not only with a neural network but also with a prior-art noise cancellation algorithm.
In this embodiment, separating the first audio data acquired by pickup devices at different positions makes full use of the fact that the same sound signal is acquired with different intensities at different positions, so the second audio data containing the sound of the first target sound-emitting object and the third audio data containing the sound of the second target sound-emitting object are obtained more accurately.
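As a rough illustration of the separation in S102, the sketch below stands in for the trained separation network with a fixed frequency-band mask. A real system would predict per-source spectral masks with a neural network; the function name, cutoff, and test tones here are illustrative assumptions, not the patent's method.

```python
import numpy as np

def separate_sources(first_audio, sample_rate, cutoff_hz=400.0):
    """Stand-in for the trained separation network of S102: split one
    mixed signal into two bands with a fixed frequency mask.  A real
    system would predict per-source spectral masks with a neural
    network; the fixed cutoff keeps this sketch runnable."""
    spectrum = np.fft.rfft(first_audio)
    freqs = np.fft.rfftfreq(len(first_audio), d=1.0 / sample_rate)
    low_mask = freqs < cutoff_hz                         # e.g. car rumble
    second_audio = np.fft.irfft(spectrum * low_mask, n=len(first_audio))
    third_audio = np.fft.irfft(spectrum * ~low_mask, n=len(first_audio))
    return second_audio, third_audio

# Mix a 100 Hz "car" tone with a 1 kHz "voice" tone, then separate.
sr = 8000
t = np.arange(sr) / sr
mixed = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 1000 * t)
low, high = separate_sources(mixed, sr)
```

Because the two tones fall on opposite sides of the cutoff, the two outputs sum back to the original mixture, mirroring how the second and third audio data together cover the first audio data.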
And S103, performing audio processing on the second audio data and the third audio data according to a preset mode based on a preset gain to obtain target audio data.
Alternatively, in some embodiments of the application, the preset manner may be mixing, setting a binaural mode, or the like. The preset gain may include a first gain value corresponding to the second audio data and a second gain value corresponding to the third audio data.
After obtaining the second audio data and the third audio data, the electronic device can apply the first gain value to the second audio data and the second gain value to the third audio data, process them in the preset manner, and finally obtain the target audio data.
The preset gain may be any value no greater than zero. The smaller the gain value, the further the current sound intensity falls below the original sound intensity. For example, if the first gain value in the preset gain is 0 dB (decibels), the second audio data keeps its original sound intensity; if the second gain value is -1000 dB, the third audio data is 1000 dB below its original sound intensity, i.e. effectively silent.
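The gain arithmetic above can be sketched as follows. The function names are illustrative; the conversion 10^(dB/20) is the standard dB-to-amplitude formula, under which 0 dB leaves a track untouched and -1000 dB effectively mutes it.

```python
import numpy as np

def db_to_linear(gain_db):
    """Standard conversion from a gain in dB to a linear amplitude
    factor: 0 dB keeps the original intensity, and a very large
    negative value such as -1000 dB effectively mutes the track."""
    return 10.0 ** (gain_db / 20.0)

def apply_gains_and_mix(second_audio, third_audio,
                        first_gain_db, second_gain_db):
    """Scale each separated track by its preset gain, then mix them --
    one possible reading of the preset-manner processing in S103."""
    return (second_audio * db_to_linear(first_gain_db)
            + third_audio * db_to_linear(second_gain_db))
```

With `first_gain_db=0.0` and `second_gain_db=-1000.0`, the mix is indistinguishable from the second audio data alone, matching the "eliminate one sound, keep the other" behavior described for the scene presets.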
In this embodiment of the application, the user sets different preset gains, so the electronic device can perform audio processing on the two audio data of different sound-emitting objects according to the gains the user set and finally obtain target audio data corresponding to those gains. The resulting target audio data can therefore meet the user's personalized requirements, improving the user experience.
Optionally, in some embodiments of the present application, after the target audio file is obtained, the target audio file may be further saved according to the requirement of the user, so as to be played later. For example, the target audio file may be saved as a mix file, a left and right channel stereo file, or a binaural source file, or the like.
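One of the storage options above, a left/right-channel stereo file, can be sketched with Python's standard `wave` module. The function name and 16-bit PCM layout are illustrative assumptions, not the patent's format.

```python
import wave

import numpy as np

def save_stereo_wav(path, left, right, sample_rate=44100):
    """Save two mono tracks (floats in [-1, 1]) as a 16-bit PCM
    left/right-channel stereo file, one of the storage options the
    text mentions for the target audio data."""
    n = max(len(left), len(right))
    frames = np.zeros((n, 2), dtype=np.int16)
    frames[:len(left), 0] = (np.clip(left, -1.0, 1.0) * 32767).astype(np.int16)
    frames[:len(right), 1] = (np.clip(right, -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(path, "wb") as f:
        f.setnchannels(2)       # left and right channels
        f.setsampwidth(2)       # 2 bytes per sample -> 16-bit PCM
        f.setframerate(sample_rate)
        f.writeframes(frames.tobytes())
```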
Optionally, to better meet the user's personalized requirements, in some embodiments of the application the user may also determine the preset gain autonomously before S103.
The method of determining the preset gain is described in detail below by some embodiments.
Optionally, in some embodiments of the present application, as shown in fig. 2, fig. 2 is a schematic diagram of a display interface of an electronic device according to an embodiment of the present application.
As shown in fig. 2, two first controls, namely a first control 10 and a first control 20, are displayed on a display interface of the electronic device. The user can adjust the preset gain by making a second input to the first control 10 and the first control 20. The second input may be a click input or a slide input.
For example, the user adjusts the first gain value in the preset gain (for the second audio data) with the first control 10, and the second gain value (for the third audio data) with the first control 20.
After receiving the second input of the user to the first control 10 and/or the first control 20, the electronic device may respond to the second input, and perform audio processing on the second audio data and the third audio data according to a preset manner based on a preset gain associated with the second input, so as to obtain target audio data.
As an example, if through second inputs to the first control 10 and the first control 20 the user sets the first gain value (for the second audio data) to -500 dB and the second gain value (for the third audio data) to 0 dB, the electronic device responds to the second inputs by processing, in the preset manner, the second audio data at 500 dB below its original intensity together with the third audio data at its original intensity, obtaining the target audio data.
In this embodiment, configuring first controls bound to the preset gain on the electronic device lets the user adjust the preset gain at any time according to personal requirements. The electronic device then performs audio processing with the gains the user chose, finally obtaining target audio data that meets the user's requirements and improving the user experience.
To make operation more convenient, multiple items of pickup scene information are provided, so that by selecting target scene information the user automatically applies the preset gain associated with that scene. Users without professional tuning skills can thus have the electronic device process audio with a well-chosen preset gain simply by selecting a target scene, obtaining audio data with excellent sound quality.
The audio data processing method of adding scene information is described in detail below with reference to fig. 3.
Fig. 3 is a flow chart of an audio data processing method according to another embodiment of the application. The method comprises the following steps:
s301, acquiring sound signals in a preset range to obtain first audio data.
S302, second audio data including the sound of the first target sound object and third audio data including the sound of the second target sound object are separated from the first audio data.
S301-S302 are the same as S101-S102 and are not repeated here.
S303, receiving a first input in which the user selects target scene information.
The target scene information may be a video scene or a dialogue scene, etc. The video scene can be further divided into a rear camera video scene and a front camera video scene.
S304, performing audio processing on the second audio data and the third audio data according to a preset mode based on a preset gain associated with the target scene information to obtain target audio data.
After receiving the user's first input for the target scene information, the electronic device can perform audio processing on the second audio data and the third audio data in the preset manner, based on the preset gain associated with the target scene information, to obtain the target audio data.
Next, the details of S304 will be described with respect to the front camera video scene, the rear camera video scene, and the dialogue scene, respectively.
Optionally, in some embodiments of the application, the electronic device receives a first input from the user for a front-camera video scene (the target scene information). When the target scene information is a front-camera video scene, the environmental sound is generally eliminated and the human voice retained; that is, the gain value corresponding to the environmental sound in the preset gain is adjusted to the minimum value, and the gain value corresponding to the human voice is adjusted to 0 dB.
Optionally, in some embodiments of the application, the electronic device receives a first input from the user for a rear-camera video scene (the target scene information). When the target scene information is a rear-camera video scene, the human voice is generally eliminated and the environmental sound retained; that is, the gain value corresponding to the human voice in the preset gain is adjusted to the minimum value, and the gain value corresponding to the environmental sound is adjusted to 0 dB.
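The scene-to-gain association described above might be sketched as a simple lookup table. The scene keys, field names, and the use of -1000 dB as the "minimum value" are illustrative assumptions drawn from the examples in this text.

```python
# Hypothetical lookup table associating target scene information with
# preset gains (in dB).  -1000 dB stands in for the "minimum value",
# i.e. an effectively silent track.
MUTE_DB = -1000.0

SCENE_PRESETS = {
    "front_camera_video": {"voice_db": 0.0,     "ambient_db": MUTE_DB},
    "rear_camera_video":  {"voice_db": MUTE_DB, "ambient_db": 0.0},
    "dialogue":           {"voice_db": 0.0,     "ambient_db": 0.0},
}

def preset_for(scene):
    """Return the preset gain associated with the selected scene."""
    return SCENE_PRESETS[scene]
```

Selecting a scene with the first input would then amount to looking up its preset and feeding the two gain values into the preset-manner processing of S304.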
Alternatively, in some embodiments of the application, the electronic device receives a first input from the user for a dialogue scene (the target scene information). In a dialogue scene it is quite likely that two persons (the first target sound-emitting object and the second target sound-emitting object) speak at the same time.
Therefore, so that the sounds do not overlap when the second audio data and the third audio data are audio-processed, the two may be set to a binaural mode according to the preset gain associated with the dialogue scene information, in which the gain value corresponding to the second audio data equals the gain value corresponding to the third audio data.
Setting the second audio data and the third audio data to a binaural mode makes the resulting target audio data binaural, so that during later playback the target audio data is played through the left and right channels. This prevents the second and third audio data from overlapping during playback, yields a target audio file with higher sound quality, and improves the user experience.
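A minimal sketch of the binaural-mode routing, assuming each separated track maps to one channel of a two-channel array; the specific channel assignment is an illustrative choice, not something the text specifies.

```python
import numpy as np

def to_binaural(second_audio, third_audio):
    """Route the two separated tracks to separate channels so the two
    speakers never mix: second_audio -> left, third_audio -> right.
    Returns an array of shape (n_samples, 2)."""
    n = max(len(second_audio), len(third_audio))
    stereo = np.zeros((n, 2))
    stereo[:len(second_audio), 0] = second_audio   # left channel
    stereo[:len(third_audio), 1] = third_audio     # right channel
    return stereo
```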
In addition, to keep the sounds from overlapping during audio processing, in some embodiments of the application the second audio data and the third audio data may instead be mixed with a time offset (staggered mixing) according to the preset gain associated with the dialogue scene information, where again the gain values corresponding to the second and third audio data are equal.
The process of time-staggered mixing of the second audio data and the third audio data is described in detail below with reference to some embodiments.
Optionally, in some embodiments of the application, the second audio data comprises a first target audio piece and the third audio data comprises a second target audio piece, wherein the first target audio piece and the second target audio piece have overlapping time stamps. For example, the second audio data includes time stamps of 1s to 10s, and the third audio data includes time stamps of 1s to 10s. The first target audio segment may include a time stamp of 1s-3s and the second target audio segment may include a time stamp of 2s-4s. Thus, the first target audio piece and the second target audio piece have overlapping time stamps of 2s-3s.
Second, the electronic device needs to obtain a first data amount of the first target audio segment under the target timestamp and a second data amount of the second target audio segment under the target timestamp.
For example, the amount of data at each timestamp in 1s-3s for the first target audio segment is, in order: 1 byte (B), 1B, 0B; the data amount of the second target audio segment under each timestamp in 2s-4s is as follows in sequence: 0B, 1B.
Next, taking the target timestamp as an example: the first data amount of the first target audio segment under the target timestamp is 1B, and the second data amount of the second target audio segment under the target timestamp is also 1B, i.e. the first data amount equals the second data amount.
It then remains to determine whether the data amount of the first target audio segment or of the second target audio segment increases after the target timestamp.
Continuing the example where the data amounts of the first target audio segment at each timestamp in 1s-3s are, in order, 1 byte (B), 1B, 0B, and those of the second target audio segment in 2s-4s are, in order, 0B, 1B, 1B, take 2s as the target timestamp.
After 2s, the data amount of the first target audio segment is still 2B, the same as within 1s-2s, so it is determined that the data amount of the first target audio segment does not increase after the target timestamp. The first target audio segment may therefore be determined to be the second target audio data.
After 2s, the data amount of the second target audio segment grows to 2B, an increase over its data amount within 1s-2s, so it is determined that the data amount of the second target audio segment increases after the target timestamp. The second target audio segment may therefore be determined to be the first target audio data.
The first target audio data can then be spliced after the second target audio data to obtain the target audio data.
In this embodiment, judging whether each segment's data amount increases after the target timestamp accurately reveals which sound-emitting object continues to speak after that timestamp. Splicing the continuing speaker's audio data after the audio data of the speaker who spoke first effectively prevents the second and third audio data from overlapping during playback, yielding a target audio file with higher sound quality and improving the user experience.
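The data-amount test and splicing rule from the worked example can be sketched as follows, using the byte counts given in the text; the dict representation and variable names are illustrative assumptions.

```python
# Per-timestamp data amounts (bytes) from the worked example.
first_segment = {1: 1, 2: 1, 3: 0}    # first target audio segment, 1s-3s
second_segment = {2: 0, 3: 1, 4: 1}   # second target audio segment, 2s-4s
target_ts = 2                         # overlap timestamp used in the text

def grows_after(segment, ts):
    """True if the segment's data amount increases after timestamp ts,
    i.e. that sound-emitting object keeps speaking."""
    return any(n > 0 for t, n in segment.items() if t > ts)

# The segment that stops producing data is played first (the "second
# target audio data"); the one that keeps growing is spliced after it
# (the "first target audio data").
if grows_after(second_segment, target_ts) and not grows_after(first_segment, target_ts):
    order = ["first_segment", "second_segment"]
else:
    order = ["second_segment", "first_segment"]
```

With the example byte counts, the first segment stops at the target timestamp while the second keeps growing, so the second segment is appended after the first, matching the splice described above.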
Based on the specific implementation manner of the audio data processing method provided by the embodiment, correspondingly, the application further provides a specific implementation manner of the audio data processing device. Please refer to fig. 4.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 4, the electronic device includes:
the acquisition module 410 is configured to acquire a sound signal within a preset range to obtain first audio data;
a separation module 420 for separating second audio data including a sound of the first target sound-emitting object and third audio data including a sound of the second target sound-emitting object from the first audio data;
the processing module 430 is configured to perform audio processing on the second audio data and the third audio data according to a preset manner based on a preset gain, so as to obtain target audio data.
In this embodiment of the application, the user sets different preset gains, so the electronic device can perform audio processing on the two audio data of different sound-emitting objects according to the gains the user set and finally obtain target audio data corresponding to those gains. The resulting target audio data can therefore meet the user's personalized requirements, improving the user experience.
Optionally, in some embodiments of the present application, the acquisition module 410 is specifically configured to:
and collecting sound signals in a preset range by using sound pickup equipment positioned at different positions to obtain first audio data.
Optionally, in some embodiments of the present application, the electronic device further includes:
the receiving module is used for receiving a first input of a user aiming at target scene information;
the processing module 430 is specifically further configured to perform audio processing on the second audio data and the third audio data according to a preset manner based on a preset gain associated with the target scene information, so as to obtain target audio data.
Optionally, in some embodiments of the present application, the receiving module is further configured to:
receiving a second input of a user to the first control;
the processing module 430 is specifically further configured to perform audio processing on the second audio data and the third audio data according to a preset manner based on a preset gain associated with the second input in response to the second input, so as to obtain target audio data.
Optionally, in some embodiments of the present application, in a case where the target scene information is a dialogue scene, the processing module 430 is specifically further configured to:
set the second audio data and the third audio data to a binaural mode based on a preset gain associated with the dialogue scene information.
Optionally, in some embodiments of the present application, the second audio data comprises a first target audio segment and the third audio data comprises a second target audio segment; wherein the first target audio segment and the second target audio segment have overlapping time stamps;
in the case where the target scene information is a dialogue scene, the processing module 430 further includes:
the acquisition sub-module is used for acquiring a first data volume of the first target audio segment under the target time stamp and a second data volume of the second target audio segment under the target time stamp;
a determining sub-module for determining first target audio data and second target audio data in the first target audio segment and the second target audio segment if the first data amount is equal to the second data amount;
the splicing module is used for splicing the first target audio data after the second target audio data;
wherein the first target audio data is audio data whose data amount increases after the target time stamp, and the second target audio data is audio data whose data amount does not increase after the target time stamp.
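The splicing step described by the acquisition, determining, and splicing sub-modules above can be illustrated with a short sketch. This is a hypothetical simplification, not the patented implementation: the byte-buffer representation and the `a_grows_after` / `b_grows_after` flags (standing in for the "data amount increases after the target time stamp" check) are assumptions for illustration.

```python
# Hypothetical sketch: when two separated audio segments overlap at a
# target timestamp and carry equal data amounts, the segment whose data
# amount keeps increasing after the timestamp (the "first target audio
# data") is appended after the segment whose data amount does not
# increase (the "second target audio data").

def splice_segments(seg_a: bytes, seg_b: bytes,
                    a_grows_after: bool, b_grows_after: bool) -> bytes:
    """Return the second-target data followed by the first-target data.

    `a_grows_after` / `b_grows_after` flag whether each segment's data
    amount increases after the target timestamp (assumed to be known
    from an upstream measurement step).
    """
    if len(seg_a) != len(seg_b):
        # The description only splices when the two data amounts are equal.
        raise ValueError("data amounts differ; splicing condition not met")
    first_target = seg_a if a_grows_after else seg_b
    second_target = seg_b if a_grows_after else seg_a
    # First target audio data is spliced after the second target audio data.
    return second_target + first_target
```

The caller would first measure each segment's data amount at the overlapping timestamp, then call `splice_segments` only when the equality condition holds.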
Optionally, in some embodiments of the present application, the preset manner includes at least one of the following: mixing processing and binaural mode processing.
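The two preset manners named above can be sketched as follows. This is a minimal illustration under assumed conventions (mono NumPy sample arrays, per-source scalar gains); the actual gain ranges and channel layout used by the electronic device are not specified in the source.

```python
import numpy as np

def mix(second: np.ndarray, third: np.ndarray,
        gain2: float = 1.0, gain3: float = 1.0) -> np.ndarray:
    """Mixing processing: gain-weighted sum of the two separated streams."""
    return gain2 * second + gain3 * third

def binaural(second: np.ndarray, third: np.ndarray,
             gain2: float = 1.0, gain3: float = 1.0) -> np.ndarray:
    """Binaural mode processing: route one sound-emitting object to each
    ear, i.e. a stereo array with the second audio data on the left
    channel and the third audio data on the right channel."""
    return np.stack([gain2 * second, gain3 * third], axis=-1)
```

Mixing produces a single mono stream containing both speakers, while the binaural mode keeps the two sound-emitting objects spatially separated, which matches the dialogue-scene behavior described above.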
Fig. 5 is a schematic diagram of a hardware structure of an electronic device implementing various embodiments of the present application.
The electronic device 600 includes, but is not limited to: radio frequency unit 601, network module 602, audio output unit 603, input unit 604, sensor 605, display unit 606, user input unit 607, interface unit 608, memory 609, processor 610, and power supply 611. Those skilled in the art will appreciate that the electronic device structure shown in fig. 5 does not constitute a limitation on the electronic device, and that the electronic device may include more or fewer components than shown, or combine certain components, or arrange the components differently. In the embodiment of the present application, the electronic device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.
The input unit 604 is configured to collect a sound signal within a preset range, so as to obtain first audio data; the processor 610 is configured to separate, from the first audio data, second audio data including a sound of the first target sound-emitting object and third audio data including a sound of the second target sound-emitting object, and to perform audio processing on the second audio data and the third audio data according to a preset manner based on a preset gain, so as to obtain target audio data.
In the embodiment of the present application, the user sets different preset gains, so that the electronic device can perform audio processing on the two pieces of audio data from different sound-emitting objects according to the different preset gains set by the user, finally obtaining target audio data corresponding to the different gains. The finally obtained target audio data can therefore meet the personalized requirements of the user, improving the user experience.
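The gain selection described above — a preset gain associated with target scene information, applied to the two separated streams — can be sketched briefly. The scene names and gain values here are illustrative assumptions only; the source does not enumerate concrete scenes or numeric gains.

```python
# Hypothetical mapping from target scene information to a pair of preset
# gains (one per separated sound-emitting object). Names and values are
# illustrative, not taken from the patent.
SCENE_GAINS = {
    "dialogue": (1.0, 1.0),   # dialogue scene: keep both speakers balanced
    "interview": (1.5, 0.8),  # emphasize the first speaker, assumed example
}

def apply_scene_gains(second, third, scene):
    """Scale each separated stream by the gain associated with `scene`.

    Falls back to unity gain when the scene is not recognized.
    """
    g2, g3 = SCENE_GAINS.get(scene, (1.0, 1.0))
    return [g2 * s for s in second], [g3 * t for t in third]
```

A first input selecting the scene (or a second input on the first control) would pick the gain pair; the scaled streams would then be mixed or set to the binaural mode as the preset manner dictates.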
It should be understood that, in the embodiment of the present application, the radio frequency unit 601 may be used to receive and send signals during information transmission or a call; specifically, after receiving downlink data from a base station, it forwards the downlink data to the processor 610 for processing, and it also transmits uplink data to the base station. Typically, the radio frequency unit 601 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 601 may also communicate with networks and other devices through a wireless communication system.
The electronic device provides wireless broadband internet access to the user via the network module 602, such as helping the user to send and receive e-mail, browse web pages, and access streaming media, etc.
The audio output unit 603 may convert audio data received by the radio frequency unit 601 or the network module 602 or stored in the memory 609 into an audio signal and output as sound. Also, the audio output unit 603 may also provide audio output (e.g., a call signal reception sound, a message reception sound, etc.) related to a specific function performed by the electronic device 600. The audio output unit 603 includes a speaker, a buzzer, a receiver, and the like.
The input unit 604 is used for receiving audio or video signals. The input unit 604 may include a graphics processor (Graphics Processing Unit, GPU) 6041 and a microphone 6042. The graphics processor 6041 processes image data of still pictures or video obtained by an image capturing apparatus (such as a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 606. The image frames processed by the graphics processor 6041 may be stored in the memory 609 (or other storage medium) or transmitted via the radio frequency unit 601 or the network module 602. The microphone 6042 may receive sound and process it into audio data. In the telephone call mode, the processed audio data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 601, and then output.
The electronic device 600 also includes at least one sensor 605, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor and a proximity sensor; the ambient light sensor can adjust the brightness of the display panel 6061 according to the brightness of ambient light, and the proximity sensor can turn off the display panel 6061 and/or the backlight when the electronic device 600 is moved to the ear. As one type of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in all directions (generally three axes), and can detect the magnitude and direction of gravity when stationary; it can be used for recognizing the attitude of the electronic device (such as landscape/portrait switching, related games, and magnetometer attitude calibration) and for vibration-recognition-related functions (such as a pedometer and tapping). The sensor 605 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and the like, which are not described herein.
The display unit 606 is used to display information input by a user or information provided to the user. The display unit 606 may include a display panel 6061, and the display panel 6061 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 607 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. Specifically, the user input unit 607 includes a touch panel 6071 and other input devices 6072. The touch panel 6071, also referred to as a touch screen, may collect touch operations by a user on or near it (e.g., operations of the user on or near the touch panel 6071 using any suitable object or accessory such as a finger or a stylus). The touch panel 6071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into touch point coordinates, and sends these to the processor 610; it also receives and executes commands sent by the processor 610. In addition, the touch panel 6071 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 6071, the user input unit 607 may include other input devices 6072. Specifically, the other input devices 6072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a track ball, a mouse, and a joystick, which are not described herein.
Further, the touch panel 6071 may be overlaid on the display panel 6061, and when the touch panel 6071 detects a touch operation thereon or thereabout, the touch operation is transmitted to the processor 610 to determine a type of a touch event, and then the processor 610 provides a corresponding visual output on the display panel 6061 according to the type of the touch event. Although in fig. 5, the touch panel 6071 and the display panel 6061 are two independent components for implementing the input and output functions of the electronic device, in some embodiments, the touch panel 6071 and the display panel 6061 may be integrated to implement the input and output functions of the electronic device, which is not limited herein.
The interface unit 608 is an interface to which an external device is connected to the electronic apparatus 600. For example, the external devices may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 608 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the electronic apparatus 600 or may be used to transmit data between the electronic apparatus 600 and an external device.
The memory 609 may be used to store software programs as well as various data. The memory 609 may mainly include a storage program area and a storage data area; the storage program area may store an operating system and an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the mobile phone. In addition, the memory 609 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The processor 610 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 609, and calling data stored in the memory 609, thereby performing overall monitoring of the electronic device. The processor 610 may include one or more processing units; preferably, the processor 610 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 610.
The electronic device 600 may also include a power supply 611 (e.g., a battery) for powering the various components, and preferably the power supply 611 may be logically coupled to the processor 610 via a power management system that performs functions such as managing charging, discharging, and power consumption.
In addition, the electronic device 600 includes some functional modules, which are not shown, and will not be described herein.
Preferably, the embodiment of the present application further provides an electronic device, including a processor 610, a memory 609, and a computer program stored in the memory 609 and executable on the processor 610, where the computer program, when executed by the processor 610, implements each process of the above audio data processing method embodiment and can achieve the same technical effects; to avoid repetition, details are not described herein again.
The embodiment of the present application also provides a computer readable storage medium, on which a computer program is stored; the computer program, when executed by a processor, implements the processes of the above audio data processing method embodiment and can achieve the same technical effects; to avoid repetition, details are not described herein again. The computer readable storage medium may be, for example, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive. In light of the above teachings, those of ordinary skill in the art may make many other forms without departing from the spirit of the present application and the scope of the claims, all of which fall within the protection of the present application.

Claims (10)

1. A method for processing audio data, applied to an electronic device, comprising:
collecting sound signals to obtain first audio data;
separating second audio data including a sound of a first target sound-emitting object and third audio data including a sound of a second target sound-emitting object from the first audio data; wherein the second audio data comprises a first target audio segment and the third audio data comprises a second target audio segment, the first and second target audio segments having overlapping time stamps;
acquiring a first data volume of the first target audio fragment under a target time stamp and a second data volume of the second target audio fragment under the target time stamp, wherein the target time stamp is a time stamp when the first target audio fragment and the second target audio fragment overlap;
under the condition that the first data amount is equal to the second data amount, splicing the first target audio data in the first target audio segment with the second target audio data of the second target audio segment to obtain target audio data; wherein the first target audio data is audio data whose data amount increases after the target time stamp, and the second target audio data is audio data whose data amount does not increase after the target time stamp.
2. The method of claim 1, wherein the capturing the sound signal to obtain the first audio data comprises:
and collecting the sound signals by using pickup devices positioned at different positions to obtain the first audio data.
3. The method of claim 1, wherein the obtaining a first amount of data of the first target audio segment at a target timestamp and a second amount of data of the second target audio segment at the target timestamp is preceded by the obtaining, the method further comprising:
receiving a first input of selecting target scene information by a user;
the obtaining the first data amount of the first target audio segment under the target timestamp and the second data amount of the second target audio segment under the target timestamp includes:
and under the condition that the target scene information is a dialogue scene, acquiring a first data volume of the first target audio fragment under a target time stamp and a second data volume of the second target audio fragment under the target time stamp.
4. The method of claim 1, wherein the obtaining a first amount of data of the first target audio segment at a target timestamp and a second amount of data of the second target audio segment at the target timestamp is preceded by the obtaining further comprises:
receiving a second input of a user to the first control;
the splicing the first target audio data in the first target audio segment with the second target audio data in the second target audio segment to obtain target audio data includes:
and responding to the second input, and based on a preset gain associated with the second input, splicing the first target audio data in the first target audio segment with the second target audio data of the second target audio segment to obtain target audio data.
5. An electronic device, comprising:
the acquisition module is used for acquiring sound signals to obtain first audio data;
a separation module for separating second audio data including a sound of a first target sound object and third audio data including a sound of a second target sound object from the first audio data; wherein the second audio data comprises a first target audio segment and the third audio data comprises a second target audio segment, the first and second target audio segments having overlapping time stamps;
the acquisition sub-module is used for acquiring a first data volume of the first target audio fragment under a target time stamp and a second data volume of the second target audio fragment under the target time stamp, wherein the target time stamp is a time stamp when the first target audio fragment and the second target audio fragment overlap;
the splicing module is used for splicing the first target audio data in the first target audio segment with the second target audio data of the second target audio segment under the condition that the first data volume is equal to the second data volume to obtain target audio data; wherein the first target audio data is audio data whose data amount increases after the target time stamp, and the second target audio data is audio data whose data amount does not increase after the target time stamp.
6. The electronic device of claim 5, wherein the acquisition module is specifically configured to:
and collecting the sound signals by using pickup devices positioned at different positions to obtain the first audio data.
7. The electronic device of claim 5, wherein the electronic device further comprises:
the receiving module is used for receiving a first input of target scene information selected by a user;
the obtaining sub-module is specifically configured to obtain, when the target scene information is a dialogue scene, a first data amount of the first target audio segment under a target timestamp and a second data amount of the second target audio segment under the target timestamp.
8. The electronic device of claim 5, wherein:
the receiving module is used for receiving a second input of the first control by the user;
the splicing module is further configured to, in response to the second input, splice the first target audio data in the first target audio segment to the second target audio data in the second target audio segment based on a preset gain associated with the second input, and obtain target audio data.
9. An electronic device comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method of processing audio data according to any one of claims 1 to 4.
10. A computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the steps of the method of processing audio data according to any one of claims 1 to 4.
CN202311042957.8A 2020-02-28 2020-02-28 Audio data processing method, electronic device and medium Pending CN117037835A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311042957.8A CN117037835A (en) 2020-02-28 2020-02-28 Audio data processing method, electronic device and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010131305.1A CN111370018B (en) 2020-02-28 2020-02-28 Audio data processing method, electronic device and medium
CN202311042957.8A CN117037835A (en) 2020-02-28 2020-02-28 Audio data processing method, electronic device and medium

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202010131305.1A Division CN111370018B (en) 2020-02-28 2020-02-28 Audio data processing method, electronic device and medium

Publications (1)

Publication Number Publication Date
CN117037835A true CN117037835A (en) 2023-11-10

Family

ID=71206453

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010131305.1A Active CN111370018B (en) 2020-02-28 2020-02-28 Audio data processing method, electronic device and medium
CN202311042957.8A Pending CN117037835A (en) 2020-02-28 2020-02-28 Audio data processing method, electronic device and medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202010131305.1A Active CN111370018B (en) 2020-02-28 2020-02-28 Audio data processing method, electronic device and medium

Country Status (1)

Country Link
CN (2) CN111370018B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112015365A (en) * 2020-08-27 2020-12-01 维沃移动通信有限公司 Volume adjustment method and device and electronic equipment
CN112309449A (en) * 2020-10-26 2021-02-02 维沃移动通信(深圳)有限公司 Audio recording method and device
CN112685000A (en) * 2020-12-30 2021-04-20 广州酷狗计算机科技有限公司 Audio processing method and device, computer equipment and storage medium
CN113068056B (en) * 2021-03-18 2023-08-22 广州虎牙科技有限公司 Audio playing method, device, electronic equipment and computer readable storage medium
CN114615534A (en) * 2022-01-27 2022-06-10 海信视像科技股份有限公司 Display device and audio processing method
CN118175379A (en) * 2022-01-27 2024-06-11 海信视像科技股份有限公司 Display device and audio processing method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4565162B2 (en) * 2006-03-03 2010-10-20 独立行政法人産業技術総合研究所 Speech event separation method, speech event separation system, and speech event separation program
CN104269169B (en) * 2014-09-09 2017-04-12 山东师范大学 Classifying method for aliasing audio events
CN108810860B (en) * 2018-05-22 2022-05-17 维沃移动通信有限公司 Audio transmission method, terminal equipment and main earphone
CN110797021B (en) * 2018-05-24 2022-06-07 腾讯科技(深圳)有限公司 Hybrid speech recognition network training method, hybrid speech recognition device and storage medium
CN110097872B (en) * 2019-04-30 2021-07-30 维沃移动通信有限公司 Audio processing method and electronic equipment

Also Published As

Publication number Publication date
CN111370018B (en) 2023-10-24
CN111370018A (en) 2020-07-03

Similar Documents

Publication Publication Date Title
CN111370018B (en) Audio data processing method, electronic device and medium
CN110109593B (en) Screen capturing method and terminal equipment
CN109040641B (en) Video data synthesis method and device
CN110097872B (en) Audio processing method and electronic equipment
CN107562406B (en) Volume adjusting method, mobile terminal and computer readable storage medium
CN111638779A (en) Audio playing control method and device, electronic equipment and readable storage medium
CN108668024B (en) Voice processing method and terminal
CN110012143B (en) Telephone receiver control method and terminal
CN108196815B (en) Method for adjusting call sound and mobile terminal
CN110602389B (en) Display method and electronic equipment
CN111010608B (en) Video playing method and electronic equipment
CN111401463B (en) Method for outputting detection result, electronic equipment and medium
CN111147919A (en) Play adjustment method, electronic equipment and computer readable storage medium
CN109618218B (en) Video processing method and mobile terminal
CN111601063B (en) Video processing method and electronic equipment
CN110505660B (en) Network rate adjusting method and terminal equipment
CN110062281B (en) Play progress adjusting method and terminal equipment thereof
CN109949809B (en) Voice control method and terminal equipment
CN108093119B (en) Strange incoming call number marking method and mobile terminal
CN108319440B (en) Audio output method and mobile terminal
WO2021098698A1 (en) Audio playback method and terminal device
CN111402157B (en) Image processing method and electronic equipment
CN110086999B (en) Image information feedback method and terminal equipment
CN111491058A (en) Method for controlling operation mode, electronic device, and storage medium
CN109453526B (en) Sound processing method, terminal and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination