WO2021027132A1 - Audio processing method and apparatus and computer storage medium

Info

Publication number: WO2021027132A1
Application number: PCT/CN2019/117172
Authority: WO
Inventors: 王涛
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-08-12
Filing date: 2019-11-11
Publication date: 2021-02-18
Also published as: CN110600022A; CN110600022B

Abstract

An audio processing method and apparatus and a computer storage medium, the method comprising: an electronic device acquires N audio signals, M noise signals, and P signal-to-noise ratios inputted by a user, N, M, and P all being positive integers (S201); the electronic device acquires the power of each audio signal amongst the N audio signals and the power of each noise signal amongst the M noise signals (S202); for a first audio signal amongst the N audio signals and a first signal-to-noise ratio amongst the P signal-to-noise ratios, on the basis of the power of the first audio signal and the first signal-to-noise ratio, the electronic device calculates the power of a noise signal to be added to the first audio signal (S203); on the basis of the power of the noise signal to be added to the first audio signal, the electronic device adjusts the power of the M noise signals (S204); and the electronic device performs signal mixing of the first audio signal and the M noise signals after power adjustment to obtain a signal with noise added corresponding to the first audio signal (S205). The present method can increase the processing efficiency of adding noise to audio.

Description

Audio processing method, device and computer storage medium

This application claims priority to Chinese patent application No. 201910748281.1, filed with the Chinese Patent Office on August 12, 2019 and entitled "An audio processing method, device and computer storage medium", the entire content of which is incorporated herein by reference.

Technical field

This application relates to the field of speech processing technology, and in particular to an audio processing method, device and computer storage medium.

Background art

With the development of the Internet, audio noise addition is in demand in many industries. Take the currently popular song-recognition feature as an example. Ideally, if the user records a piece of music without any interference, the music recognition system can correctly find the matching music as long as that music is stored in the music library. In practice, however, the music clips recorded by users contain obvious interference, including both system noise introduced by the playback and recording equipment and ambient noise around the recording. The music recognition system therefore needs to be trained in advance so that it can be applied in real environments, and this training requires noise-added audio. In the prior art, a noise-adding tool can add noise to audio, but only one type of noise at a time. When the user needs to add multiple different types of noise to an audio signal, the user has to run the tool multiple times, which makes the operation cumbersome, time-consuming, and inefficient.

Summary of the invention

The embodiments of the present application provide an audio processing method, device, and computer storage medium, which can improve the processing efficiency of adding noise to audio.

The embodiment of the application provides an audio processing method, which includes:

The electronic device acquires N audio signals, M noise signals, and P signal-to-noise ratios input by the user, where N, M, and P are all positive integers;

Acquiring, by the electronic device, the power of each of the N audio signals and the power of each of the M types of noise signals;

For a first audio signal among the N audio signals and a first signal-to-noise ratio among the P signal-to-noise ratios, the electronic device calculates, based on the power of the first audio signal and the first signal-to-noise ratio, the power of the noise signal to be added to the first audio signal;

The electronic device adjusts the power of the M types of noise signals according to the power of the noise signal to be added to the first audio signal;

The electronic device mixes the first audio signal and the M types of noise signals after power adjustment to obtain a noise-added signal corresponding to the first audio signal.

An embodiment of the present application also provides an audio processing device, including:

The first acquiring unit is configured to acquire N audio signals, M noise signals, and P signal-to-noise ratios input by the user, where N, M, and P are all positive integers;

A second acquiring unit, configured to acquire the power of each audio signal in the N audio signals and the power of each noise signal in the M types of noise signals;

The calculation unit is configured to, for a first audio signal among the N audio signals and a first signal-to-noise ratio among the P signal-to-noise ratios, calculate, according to the power of the first audio signal and the first signal-to-noise ratio, the power of the noise signal to be added to the first audio signal;

An adjusting unit, configured to adjust the power of the M types of noise signals according to the power of the noise signal to be added to the first audio signal;

The mixing unit is configured to mix the first audio signal and the M types of noise signals after power adjustment to obtain a noise-added signal corresponding to the first audio signal.

The embodiment of the present application also provides an electronic device, which includes a processor, an input device, an output device, and a memory, and the processor, the input device, the output device, and the memory are connected to each other. The electronic device may further include a communication interface used to communicate with other electronic devices. The memory is used to store the implementation code of the foregoing audio processing method, and the processor is used to execute the program code stored in the memory, that is, to execute the foregoing audio processing method.

The embodiments of the present application also provide a non-volatile computer-readable storage medium that stores instructions which, when run on a processor, cause the processor to execute the above audio processing method.

The embodiment of the present application also provides a computer program product containing instructions, which when running on a processor, causes the processor to execute the above audio processing method.

By implementing the embodiments of the present application, the electronic device can add noise to one or more audio signals at one time, can add one or more different types of noise to an audio signal, and can obtain, for one audio signal, multiple output signals with different signal-to-noise ratios at one time. The user does not need to perform multiple operations to add noise to multiple audio signals, to add multiple noise types to one audio signal, or to obtain multiple output signals with different signal-to-noise ratios for the same audio signal. This saves user operations, reduces operation time, improves the efficiency of adding noise to audio, and realizes batch audio processing.

The additional aspects and advantages of this application will be partly given in the following description, which will become obvious from the following description, or be understood through the practice of this application.

Description of the drawings

The above and/or additional aspects and advantages of the present application will become obvious and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:

FIG. 1 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the application;

FIG. 2 is a schematic flowchart of an audio processing method provided by an embodiment of this application;

FIG. 3 is a schematic diagram of a user input interface provided by an embodiment of the application;

FIG. 4 is a schematic diagram of parameters of an audio signal provided by an embodiment of the application;

FIG. 5A is a schematic diagram of another user input interface provided by an embodiment of the application;

FIG. 5B is a schematic diagram of another user input interface provided by an embodiment of the application;

FIG. 6 is a schematic structural diagram of an audio processing device provided by an embodiment of the application.

Detailed description

In order to make the purpose, technical solutions and advantages of the application clearer, the application is described in further detail below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative work shall fall within the protection scope of this application.

The terms "first", "second", etc. in the specification and claims of this application and the above-mentioned drawings are used to distinguish different objects, rather than to describe a specific sequence. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes unlisted steps or units, or optionally also includes other steps or units inherent in these processes, methods, products, or devices.

Reference to "embodiments" herein means that a specific feature, structure, or characteristic described in conjunction with the embodiments may be included in at least one embodiment of the present application. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor does it refer to an independent or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, both explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.

The electronic devices involved in the embodiments of this application may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to wireless modems, as well as various forms of user equipment (User Equipment, UE), mobile station (Mobile Station, MS), terminal device (terminal device), etc. For example, it can be a mobile terminal such as a smart phone, a tablet computer, or other terminals, and there is no limitation here. For ease of description, the devices mentioned above are collectively referred to as electronic devices. The embodiments of the present application are described below in conjunction with the drawings.

Please refer to FIG. 1, which is a schematic structural diagram of an electronic device provided by an embodiment of the present application. As shown in FIG. 1, the electronic device 100 includes: at least one processor 101, at least one input device 102, at least one output device 103, a memory 104, and at least one bus 105. The bus 105 is used to implement connection and communication between these components.

In the embodiment of the present application, the processor 101 may be a central processing unit (CPU) or a graphics processing unit (GPU), and in some embodiments may also be referred to as an application processor (AP) to distinguish it from the baseband processor. The processor 101 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.

The input device 102 may include a touch panel, a fingerprint sensor (used to collect user fingerprint information and fingerprint orientation information), a camera, a microphone, etc., and the output device 103 may include a display (LCD, etc.), a speaker, etc.

The memory 104 may include a read-only memory and a random access memory, and provides instructions and data to the processor 101. The processor 101 can be used to read and execute computer readable instructions. Specifically, the processor 101 may be used to call data stored in the memory 104. A part of the memory 104 may also include a non-volatile random access memory.

In specific implementation, the processor 101, the input device 102, and the output device 103 described in the embodiments of the present application can execute part or all of the processes involved in the audio processing method shown in FIG. 2 below.

Optionally, the electronic device 100 may further include a communication interface. The communication interface may be a transceiver, a transceiver circuit, etc., where the communication interface is a general term and may include one or more interfaces, such as an interface between an electronic device and a server. The communication interface may include a wired interface and a wireless interface, such as a standard interface, Ethernet, and a multi-machine synchronization interface. Optionally, when the processor 101 receives any message or data, it specifically receives it by driving or controlling the communication interface. Therefore, the processor 101 can be regarded as a control center that performs sending or receiving, and the communication interface is a specific performer of sending and receiving operations.

In the embodiment of the present application, the electronic device 100 may be a terminal, a server, a computer, a video playback device, etc., capable of computing or processing.

Based on the structure of the electronic device shown in FIG. 1, FIG. 2 shows an audio processing method provided by an embodiment of the present application. The audio processing method includes but is not limited to the following steps S201-S205.

S201: The electronic device obtains N audio signals, M noise signals, and P signal-to-noise ratios input by the user, where N, M, and P are all positive integers.

The audio signal input by the user may be one audio signal or multiple audio signals. The noise signal input by the user may be one type of noise signal or multiple different types of noise signals. The signal-to-noise ratio input by the user can be one signal-to-noise ratio or multiple signal-to-noise ratios.

Optionally, the audio signal input by the user may be music, voice, and so on.

Optionally, the types of noise signal input by the user include noise that can be generated by a signal generating device, such as white noise, Gaussian noise, pink noise, or colored noise, and may also include real environmental noise recorded by the user, such as the sound of flowing water or the sound of birds.

The signal-to-noise ratio input by the user refers to the ratio of the signal power and the noise power of the desired audio after adding noise to the audio signal.

The user input interface is explained below in conjunction with FIG. 3. The user input interface may be, for example, but is not limited to, the one shown in FIG. 3. As shown in FIG. 3, the user input interface includes: an audio signal input box 301, a noise signal input box 302, a signal-to-noise ratio input box 303, and a confirm button 304. To input multiple audio signals, the user can click the "+" sign to the right of the audio signal input box 301. Similarly, to input multiple noise signals, the user can click the "+" sign to the right of the noise signal input box 302, and to input multiple signal-to-noise ratios, the user can click the "+" sign to the right of the signal-to-noise ratio input box 303.

Optionally, after receiving the user's instruction to click the audio signal input box 301, the electronic device may receive the audio signal input by the user, such as voice or music, through a voice input device of the electronic device, such as a microphone. Alternatively, after the electronic device receives the user's instruction to click the audio signal input box 301, it can display the files stored locally in the electronic device, and the user can select the audio signal from the files locally stored in the electronic device.

Similarly, after the electronic device receives the user's instruction to click the noise signal input box 302, it can receive the noise signal input by the user, such as the sound of flowing water or the sound of birds, through a voice input device of the electronic device, such as a microphone. Alternatively, after receiving the user's instruction to click the noise signal input box 302, the electronic device may display the available noise types, and the user may select the noise signal from among them.

The user can click the confirm button 304 after inputting the audio signals, the noise signals, and the signal-to-noise ratios. After receiving the user's operation of clicking the confirm button 304, the electronic device executes step S202. For example, the user inputs 2 audio signals (audio signal 1 and audio signal 2), 2 noise signals (noise signal 1 and noise signal 2), and 2 signal-to-noise ratios (signal-to-noise ratio 1 and signal-to-noise ratio 2).

S202: The electronic device obtains the power of each of the N audio signals and the power of each of the M types of noise signals.

Optionally, the electronic device obtains the power of each audio signal, including:

The electronic device extracts the amplitude of each audio signal, and obtains the power of each audio signal according to its amplitude. If the user inputs the audio signal through a microphone, the electronic device can calculate the power of the audio signal according to the amplitude of the audio signal input by the user. If the user selects an audio file from the files stored locally in the electronic device, the electronic device can use a voice analysis tool to convert the audio file into the audio signal shown in FIG. 4, where the horizontal axis is time and the vertical axis is amplitude, and can then calculate the power of the audio signal according to its amplitude.

The electronic device obtains the power of each noise signal, including:

The electronic device extracts the amplitude of each noise signal, and obtains the power of each noise signal according to its amplitude. If the user inputs a noise signal through a microphone, the electronic device can calculate the power of the noise signal according to the amplitude of the noise signal input by the user. If the user selects a noise file from the files stored locally in the electronic device, the electronic device can use a voice analysis tool to convert the noise file into a noise signal of the form shown in FIG. 4, where the horizontal axis is time and the vertical axis is amplitude, and can then calculate the power of the noise signal according to its amplitude.

For example, the power of audio signal 1 is 10000W, the power of noise signal 1 is 9W, and the power of noise signal 2 is 5W.
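The description does not state how power is derived from amplitude. A minimal sketch, assuming the common convention that power is the mean of the squared sample amplitudes, might look as follows. The function name `average_power` and the list-of-floats signal representation are illustrative assumptions, not part of the application.

```python
def average_power(samples):
    """Return the mean of the squared sample amplitudes.

    Assumption: "power" means mean-square amplitude; the source text only
    says that power is obtained from the amplitude.
    """
    if not samples:
        raise ValueError("signal must contain at least one sample")
    return sum(s * s for s in samples) / len(samples)


# A square wave alternating between +3 and -3 has mean-square power 9.
print(average_power([3.0, -3.0, 3.0, -3.0]))  # 9.0
```

Under this convention, scaling every sample by a factor g multiplies the power by g squared, which is what makes the later power adjustment a simple gain.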

S203: For the first audio signal among the N audio signals and the first signal-to-noise ratio among the P signal-to-noise ratios, the electronic device calculates the power of the noise signal to be added to the first audio signal according to the power of the first audio signal and the first signal-to-noise ratio.

Optionally, the electronic device calculating the power of the noise signal to be added to the first audio signal according to the power of the first audio signal and the first signal-to-noise ratio includes:

The electronic device calculates the power of the noise signal to be added to the first audio signal according to the Shannon formula, where the Shannon formula is signal-to-noise ratio (dB) = 10 × log₁₀(A/B), A is the power of the first audio signal, and B is the power of the noise signal to be added to the first audio signal.

For example, the first audio signal is audio signal 1, the first signal-to-noise ratio is signal-to-noise ratio 1, the power of audio signal 1 is 10000W, and the value of signal-to-noise ratio 1 is 30 dB. According to the Shannon formula, signal-to-noise ratio (dB) = 10 × log₁₀(A/B), so 30 = 10 × log₁₀(10000/B), which gives 10000/B = 1000 and therefore B = 10. The power of the noise signal to be added to audio signal 1 is thus 10W.
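The calculation in step S203 can be sketched by solving the formula for B, which gives B = A / 10^(SNR/10). The function name `required_noise_power` is a hypothetical choice for illustration.

```python
def required_noise_power(signal_power, snr_db):
    """Solve SNR(dB) = 10 * log10(A / B) for B, the noise power to add.

    signal_power is A; the return value is B in the same unit as A.
    """
    if signal_power <= 0:
        raise ValueError("signal power must be positive")
    return signal_power / (10 ** (snr_db / 10))


# Worked example from the text: A = 10000 W at 30 dB gives B = 10 W.
print(required_noise_power(10000, 30))  # 10.0
```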

Step S203 can be used to calculate the power of the noise signal that needs to be added for each audio signal.

For example, the power of audio signal 1 and signal-to-noise ratio 1 can be used to calculate the power of the noise signal to be added to audio signal 1 in one case, and the power of audio signal 1 and signal-to-noise ratio 2 can be used to calculate the power of the noise signal to be added to audio signal 1 in another case. Likewise, the power of audio signal 2 and signal-to-noise ratio 1 can be used to calculate the power of the noise signal to be added to audio signal 2 in one case, and the power of audio signal 2 and signal-to-noise ratio 2 can be used to calculate it in another case.

S204: The electronic device adjusts the power of the M types of noise signals according to the power of the noise signal to be added to the first audio signal.

In one possible case, the user inputs only one noise type. After obtaining the power of the noise signal to be added to the first audio signal in step S203, the electronic device can directly determine the adjusted power value of the noise signal input by the user. For example, if the noise selected by the user is white noise, then based on the foregoing example the adjusted power of the white noise signal is 10W.

In another possible case, the user inputs multiple noise types. In this case, the user also needs to input the weights of the multiple noise types in the user input interface. For example, see FIG. 5A, which is a schematic diagram of a user input interface. The user can click the weight input box 305 in the user input interface to input the weight of each noise signal. To input the weights of multiple noise signals, the user can click the "+" sign to the right of the weight input box 305. For example, referring to FIG. 5B, the noise types input by the user include white noise and pink noise, and the weights of white noise and pink noise are in the ratio 3:2. Then, after the electronic device determines the power of the noise signal to be added to the first audio signal according to the power of the first audio signal and the first signal-to-noise ratio, the method further includes: the electronic device obtains the weights of the multiple noise types and, according to those weights, determines the noise signal power corresponding to each of the multiple noise signals.

Taking FIG. 5B as an example, the noise types include white noise and pink noise, with weights in the ratio 3:2. Since the total noise power is 10W, allocating it according to the weights gives a white noise power of 10W × 3/5 = 6W and a pink noise power of 10W × 2/5 = 4W.

After the electronic device determines the power of the noise signal corresponding to each noise type, it adjusts the power of each noise signal. For example, if the noise input by the user is: white noise with a signal power of 9W and pink noise with a signal power of 5W, the electronic device adjusts the power of the white noise to 6W and the signal power of the pink noise to 4W.
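The allocation and adjustment in step S204 can be sketched as two small helpers. This is an illustration under stated assumptions: power is taken to be proportional to the square of the amplitude, so a power change is applied as an amplitude gain of sqrt(target / current). The names `allocate_noise_power` and `amplitude_gain` are hypothetical.

```python
import math


def allocate_noise_power(total_power, weights):
    """Split total_power across the noise signals in proportion to weights."""
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    return [total_power * w / weight_sum for w in weights]


def amplitude_gain(current_power, target_power):
    """Amplitude scale factor that moves a signal from current to target power.

    Assumption: power scales with the square of the amplitude.
    """
    if current_power <= 0:
        raise ValueError("current power must be positive")
    return math.sqrt(target_power / current_power)


# Example from the text: 10 W total, white : pink = 3 : 2.
targets = allocate_noise_power(10.0, [3, 2])
print(targets)  # [6.0, 4.0]

# White noise at 9 W and pink noise at 5 W are scaled down to 6 W and 4 W.
gains = [amplitude_gain(cur, tgt) for cur, tgt in zip([9.0, 5.0], targets)]
```

Both gains are below 1 here because each input noise is louder than its allocated share. Note that the per-noise powers add up to the total only if the noise signals are uncorrelated, which this sketch takes for granted.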

S205: The electronic device mixes the first audio signal and the power-adjusted M types of noise signals to obtain a noise-added signal corresponding to the first audio signal.

After adjusting the power of each noise signal, the electronic device mixes the noise signal with the audio signal to obtain a noise-added signal.

For example, the power of audio signal 1 and signal-to-noise ratio 1 can be used to calculate the power of the noise signal to be added to audio signal 1 in one case; the power of each noise signal is then adjusted according to that power, and finally the power-adjusted noise signals are mixed with audio signal 1 to obtain a noise-added output signal whose signal-to-noise ratio is signal-to-noise ratio 1. In the same way, the power of audio signal 1 and signal-to-noise ratio 2 yield another noise-added output signal whose signal-to-noise ratio is signal-to-noise ratio 2; the power of audio signal 2 and signal-to-noise ratio 1 yield a noise-added output signal whose signal-to-noise ratio is signal-to-noise ratio 1; and the power of audio signal 2 and signal-to-noise ratio 2 yield a noise-added output signal whose signal-to-noise ratio is signal-to-noise ratio 2. Thus, when the user inputs 2 audio signals and 2 signal-to-noise ratios, 4 noise-added signals can finally be output.
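Steps S202 to S205 can be put together in a short end-to-end sketch that produces one output per (audio signal, signal-to-noise ratio) pair. Several details are assumptions made for illustration and are not stated in the application: signals are lists of float samples, every noise signal is at least as long as the audio signal, power is mean-square amplitude, and mixing is sample-wise addition. All function names are hypothetical.

```python
import math


def power(samples):
    """Mean-square amplitude (assumed definition of power)."""
    return sum(s * s for s in samples) / len(samples)


def add_noise(audio, noises, weights, snr_db):
    """Mix audio with weighted, power-adjusted noises at the requested SNR."""
    if any(len(noise) < len(audio) for noise in noises):
        raise ValueError("each noise signal must cover the whole audio signal")
    total_noise_power = power(audio) / (10 ** (snr_db / 10))  # S203
    weight_sum = sum(weights)
    mixed = list(audio)
    for noise, weight in zip(noises, weights):
        segment = noise[:len(audio)]
        target = total_noise_power * weight / weight_sum       # S204: allocate
        gain = math.sqrt(target / power(segment))              # S204: adjust
        for i, sample in enumerate(segment):
            mixed[i] += gain * sample                          # S205: mix
    return mixed


def batch_add_noise(audios, noises, weights, snrs_db):
    """Return (audio index, SNR, noisy signal) for all N * P combinations."""
    return [
        (index, snr_db, add_noise(audio, noises, weights, snr_db))
        for index, audio in enumerate(audios)
        for snr_db in snrs_db
    ]


audios = [[1.0, -1.0, 1.0, -1.0], [2.0, 0.0, -2.0, 0.0]]
noises = [[1.0, 1.0, -1.0, -1.0], [0.5, -0.5, -0.5, 0.5]]
outputs = batch_add_noise(audios, noises, weights=[3, 2], snrs_db=[10, 20])
print(len(outputs))  # 4
```

With 2 audio signals and 2 signal-to-noise ratios the batch contains 4 outputs, matching the example in the text.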

Optionally, after the electronic device mixes the first audio signal and the power-adjusted M types of noise signals to obtain the noise-added signal corresponding to the first audio signal, the method further includes:

The electronic device applies feature marks to the noise-added audio signal, where the feature marks include the signal-to-noise ratio of the noise-added audio signal, the type of noise added to it, and the power of the noise added to it.

Specifically, the electronic device mixes noise of different types and in different proportions into multiple audio signals and, after obtaining multiple noise-added audio signals, applies feature marks to them. The feature marks indicate the noise types in the mixture and the signal-to-noise ratio of each noise-added audio signal, which makes the noise-added audio signals easy to distinguish. The noise-added audio storage table can be, for example, but is not limited to, as shown in Table 1:

Table 1

Take audio A as an example. The noise-added versions of audio A are: audio A1, with a signal-to-noise ratio of 10 dB after white noise and pink noise are added, and audio A2, with a signal-to-noise ratio of 20 dB after white noise and pink noise are added.
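One way to hold the feature marks described above is a small record per noise-added signal. The sketch below is only an illustration: the class name `NoiseLabel`, its field names, and the signal power of 10000 used for audio A are assumptions, since the text does not give the layout of the storage table or the power of audio A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoiseLabel:
    """Feature mark for one noise-added signal (field names are hypothetical)."""
    source_audio: str
    snr_db: float
    noise_types: tuple
    noise_power: float


def make_label(source_audio, signal_power, snr_db, noise_types):
    """Build a label, deriving the noise power from SNR(dB) = 10*log10(A/B)."""
    noise_power = signal_power / (10 ** (snr_db / 10))
    return NoiseLabel(source_audio, float(snr_db), tuple(noise_types), noise_power)


# Illustrative values only: the signal power of audio A is assumed.
table = {
    "A1": make_label("A", 10000.0, 10, ["white", "pink"]),
    "A2": make_label("A", 10000.0, 20, ["white", "pink"]),
}
for name, label in table.items():
    print(name, label.snr_db, label.noise_types)
```

Keeping the record immutable (`frozen=True`) is a design choice that prevents a label from drifting away from the signal it describes once it has been stored.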

It should be noted that the foregoing embodiments are all described using the example in which all the noise input by the user is used when noise is added. In practical applications, the electronic device may use different noise for different signals and need not use all of the noise input by the user. For example, the audio signals input by the user include audio signal 1 and audio signal 2, the noise signals input by the user include noise signal 1 and noise signal 2, and the signal-to-noise ratios input by the user include signal-to-noise ratio 1 and signal-to-noise ratio 2. When the electronic device adds noise to audio signal 1, it can select only one of noise signal 1 and noise signal 2. Similarly, when the electronic device adds noise to audio signal 2, it can also select only one of noise signal 1 and noise signal 2. The noise-added audio storage table can be, for example, but is not limited to, as shown in Table 2:

Table 2

Take audio A as an example. The noise-added versions of audio A are: audio A1 with a signal-to-noise ratio of 10 dB after mixing with white noise, audio A2 with a signal-to-noise ratio of 20 dB after mixing with white noise, audio A3 with a signal-to-noise ratio of 10 dB after mixing with pink noise, and audio A4 with a signal-to-noise ratio of 10 dB after mixing with pink noise.

After the electronic device mixes the first audio signal and the M types of noise signals after power adjustment to obtain the noise-added signal corresponding to the first audio signal, the method further includes:

The electronic device uses the noise-added signal corresponding to each of the N audio signals to train the music recognition system, so that the music recognition system can recognize noisy sounds in the real environment.

By implementing the embodiments of this application, the electronic device can add noise to one or more audio signals at the same time, can mix in multiple types of noise at one time, and can obtain noise-added signals with the signal-to-noise ratios actually required. This enables batch processing, simplifies the noise-adding operation, saves time, and allows the signal-to-noise ratio to be adjusted to meet the diverse needs of users.

Referring to FIG. 6, FIG. 6 shows a schematic structural diagram of an audio processing device. As shown in FIG. 6, the audio processing device 600 includes: a first acquiring unit 601, a second acquiring unit 602, a calculation unit 603, an adjusting unit 604, and a mixing unit 605.

The first acquiring unit 601 is configured to acquire N audio signals, M types of noise signals, and P signal-to-noise ratios input by the user, where N, M, and P are all positive integers;

The second acquiring unit 602 is configured to acquire the power of each audio signal in the N audio signals and the power of each noise signal in the M types of noise signals;

The calculation unit 603 is configured to, for the first audio signal among the N audio signals and the first signal-to-noise ratio among the P signal-to-noise ratios, calculate the power of the noise signal to be added to the first audio signal according to the power of the first audio signal and the first signal-to-noise ratio;

The adjusting unit 604 is configured to adjust the power of the M types of noise signals according to the power of the noise signal to be added by the first audio signal;

The mixing unit 605 is configured to mix the first audio signal and the M types of noise signals after power adjustment to obtain a noise-added signal corresponding to the first audio signal.
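The chain formed by the calculation, adjusting, and mixing units can be illustrated with a minimal sketch for a single noise signal. The function names `mean_power` and `add_noise`, the mean-square definition of power, and the toy input signals are assumptions made for illustration; the application does not prescribe them.

```python
import math

def mean_power(samples):
    """Mean-square power of a sequence of amplitudes (an assumed definition)."""
    return sum(s * s for s in samples) / len(samples)

def add_noise(audio, noise, snr_db):
    """Scale `noise` to the power implied by `snr_db`, then add it to `audio`."""
    if len(noise) < len(audio):
        raise ValueError("noise must be at least as long as the audio")
    noise = noise[:len(audio)]
    current = mean_power(noise)
    if current == 0:
        raise ValueError("noise signal has zero power")
    # Role of the calculation unit: noise power B from audio power A and the SNR.
    target = mean_power(audio) / (10 ** (snr_db / 10))
    # Role of the adjusting unit: amplitude gain that brings the noise to power B.
    gain = math.sqrt(target / current)
    # Role of the mixing unit: sample-wise sum of audio and scaled noise.
    return [a + gain * n for a, n in zip(audio, noise)]

# Toy inputs: a sine wave as "audio" and an alternating sequence as "noise".
audio = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
noise = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]

noisy = add_noise(audio, noise, snr_db=10)
residual = [y - a for y, a in zip(noisy, audio)]
achieved = 10 * math.log10(mean_power(audio) / mean_power(residual))
print(round(achieved, 6))
```

Measuring the power of the added component afterwards, as the last lines do, is a simple way to check that the requested signal-to-noise ratio was reached.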

In an implementation manner, the calculation unit 603 is specifically configured to:

Calculate the power of the noise signal to be added to the first audio signal according to Shannon’s formula, where Shannon’s formula is signal-to-noise ratio (dB) = 10*log10(A/B), A is the power of the first audio signal, and B is the power of the noise signal to be added to the first audio signal.
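Because A and the signal-to-noise ratio are known while B is the unknown, the formula is used in inverted form. The rearrangement below is a worked illustration; the symbol SNR_dB is introduced here as shorthand for the signal-to-noise ratio in dB.

```latex
\begin{aligned}
\mathrm{SNR}_{\mathrm{dB}} &= 10\log_{10}\!\left(\frac{A}{B}\right) \\
\frac{A}{B} &= 10^{\mathrm{SNR}_{\mathrm{dB}}/10} \\
B &= A \cdot 10^{-\mathrm{SNR}_{\mathrm{dB}}/10}
\end{aligned}
```

For example, with an illustrative audio power A = 0.5, a signal-to-noise ratio of 10 dB gives B = 0.05, and a signal-to-noise ratio of 20 dB gives B = 0.005.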

In an implementation manner, the M is an integer greater than or equal to 2, and the audio processing device further includes:

The third acquiring unit is configured to acquire the weights of the M types of noise signals input by the user;

The adjusting unit 604 includes:

An allocation unit, configured to allocate the power of the noise signal to be added to the first audio signal to each of the M types of noise signals according to the weights of the M types of noise signals;

The processing unit is configured to adjust the power of each noise signal according to the allocated power of each noise signal in the M noise signals.
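The allocation and adjustment described above can be sketched as follows. Proportional splitting by normalized weight is an assumption (the application does not state the allocation rule), and the names `allocate_noise_power` and `gains_for` are hypothetical. Note also that the powers of the scaled noises add up to the total only when the noise signals are mutually uncorrelated.

```python
import math

def allocate_noise_power(total_power, weights):
    """Split `total_power` across noise types in proportion to `weights`."""
    if not weights or any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    total_weight = sum(weights)
    return [total_power * w / total_weight for w in weights]

def gains_for(allocated, current_powers):
    """Amplitude gain that moves each noise from its current power to its share."""
    return [math.sqrt(p / c) for p, c in zip(allocated, current_powers)]

# Illustrative values: total noise power 0.05, white : pink weighted 3 : 1,
# with current mean-square powers of 1.0 and 0.25 respectively.
shares = allocate_noise_power(0.05, [3, 1])
gains = gains_for(shares, [1.0, 0.25])
print(shares, gains)
```

Each noise signal would then be multiplied by its gain before mixing, since scaling amplitudes by g scales mean-square power by g squared.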

In an implementation manner, the audio processing device 600 further includes:

The training unit is used to train the music recognition system by using the noise-added signal corresponding to each of the N audio signals.

In an implementation manner, the audio processing device 600 further includes the following unit, which operates before the training unit uses the noise-added signal corresponding to each of the N audio signals to train the music recognition system:

The marking unit is configured to perform feature marking on the noise-added signal corresponding to each of the N audio signals to obtain a marked noise-added signal corresponding to each of the N audio signals, where the feature marking includes the signal-to-noise ratio of the noise-added signal, the type of noise added to the noise-added signal, and the noise power added to the noise-added signal;

The training unit is specifically configured to train the music recognition system by using the marked noise-added signal corresponding to each of the N audio signals.
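One possible shape for the feature marking is a small record stored alongside each noise-added signal. The sketch below is only an illustration: the `NoiseLabel` class, the `mark` function, and the field names are hypothetical, and the application does not specify a storage format.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class NoiseLabel:
    snr_db: float        # signal-to-noise ratio of the noise-added signal
    noise_types: tuple   # types of noise that were added
    noise_power: float   # power of the added noise

def mark(samples, snr_db, noise_types, noise_power):
    """Pair a noise-added signal with its feature marks."""
    label = NoiseLabel(snr_db, tuple(noise_types), noise_power)
    return {"samples": samples, "label": label}

marked = mark([0.1, -0.2, 0.05], snr_db=10,
              noise_types=["white", "pink"], noise_power=0.05)
print(asdict(marked["label"]))
```

Keeping the marks with the samples lets a training pipeline filter or group examples by signal-to-noise ratio or noise type.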

In an implementation manner, the second acquiring unit 602 includes:

A first extraction unit, configured to extract the amplitude of each audio signal and obtain the power of each audio signal according to the amplitude of each audio signal; and a second extraction unit, configured to extract the amplitude of each noise signal and obtain the power of each noise signal according to the amplitude of each noise signal.

In an implementation manner, the audio signal includes an audio signal input to the electronic device by the user through a voice input device. For example, the voice input device may be a microphone.

In an implementation manner, the noise signal includes a noise signal input to the electronic device by the user through a voice input device. For example, the noise signal may be the sound of flowing water, birdsong, etc. recorded by the user. Optionally, the noise signal may also be white noise, pink noise, etc., and such noise may be generated by a signal generating device.

In an implementation manner, the second acquiring unit 602 is specifically configured to:

Extract the amplitude of each audio signal, and obtain the power of each audio signal according to the amplitude of each audio signal;

Extract the amplitude of each noise signal, and obtain the power of each noise signal according to the amplitude of each noise signal.
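The application does not state how power is derived from amplitude. A common choice, assumed here, is the mean of the squared sample amplitudes. The sketch below also assumes 16-bit PCM input that is normalized to the range [-1, 1) first; the function names are hypothetical.

```python
def pcm16_to_float(pcm_samples):
    """Normalize signed 16-bit integer samples to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in pcm_samples]

def signal_power(samples):
    """Mean-square power of a sequence of sample amplitudes."""
    if not samples:
        raise ValueError("cannot compute the power of an empty signal")
    return sum(s * s for s in samples) / len(samples)

half_scale = pcm16_to_float([16384, -16384, 16384, -16384])
print(signal_power(half_scale))
```

The same function applies to audio signals and noise signals alike, so both extraction units could share it.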

It should be noted that, for the functions and implementation of each unit in the audio processing device 600, reference may be made to the related description in the method embodiment shown in FIG. 2; details are not repeated here.

In another embodiment of the present application, a computer non-volatile readable storage medium is provided. The computer non-volatile readable storage medium stores a computer program, and the computer program includes program instructions that, when executed by a processor, implement the method described in the above embodiments.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer non-volatile readable storage medium, or transmitted from one computer non-volatile readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, wireless, or microwave). The computer non-volatile readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), etc.

The specific implementations described above explain the purpose, technical solutions, and beneficial effects of the embodiments of this application in further detail. It should be understood that the above descriptions are only specific implementations of the embodiments of this application and are not intended to limit the protection scope of the embodiments of this application. Any modification, equivalent replacement, improvement, etc. made on the basis of the technical solutions of the embodiments of this application shall be included in the protection scope of the embodiments of this application.

Claims

1. An audio processing method, characterized by comprising:

The electronic device acquires N audio signals, M noise signals, and P signal-to-noise ratios input by the user, where N, M, and P are all positive integers;

Acquiring, by the electronic device, the power of each of the N audio signals and the power of each of the M types of noise signals;

For a first audio signal among the N audio signals and a first signal-to-noise ratio among the P signal-to-noise ratios, the electronic device calculates the power of the noise signal to be added to the first audio signal according to the power of the first audio signal and the first signal-to-noise ratio;

The electronic device adjusts the power of the M types of noise signals according to the power of the noise signal to be added to the first audio signal;

The electronic device mixes the first audio signal and the M types of noise signals after power adjustment to obtain a noise-added signal corresponding to the first audio signal.
2. The method according to claim 1, wherein the electronic device calculating the power of the noise signal to be added to the first audio signal according to the power of the first audio signal and the first signal-to-noise ratio includes:

The electronic device calculates the power of the noise signal to be added to the first audio signal according to the Shannon formula, where the Shannon formula is signal-to-noise ratio (dB) = 10*log10(A/B), A is the power of the first audio signal, and B is the power of the noise signal to be added to the first audio signal.
3. The method according to claim 1 or 2, wherein M is an integer greater than or equal to 2, and the method further comprises:

Acquiring, by the electronic device, the weights of the M types of noise signals input by the user;

The electronic device adjusting the power of the M types of noise signals according to the power of the noise signal to be added to the first audio signal includes:

The electronic device allocates the power of the noise signal to be added to the first audio signal to each of the M types of noise signals according to the weights of the M types of noise signals;

The electronic device adjusts the power of each noise signal according to the allocated power of each noise signal in the M noise signals.
4. The method according to any one of claims 1 to 3, wherein after the electronic device mixes the first audio signal and the M types of noise signals after power adjustment to obtain the noise-added signal corresponding to the first audio signal, the method further comprises:

The electronic device uses the noise-added signal corresponding to each of the N audio signals to train the music recognition system.
5. The method according to claim 4, wherein before the electronic device uses the noise-added signal corresponding to each of the N audio signals to train the music recognition system, the method further comprises:

The electronic device marks the noise-added signal corresponding to each of the N audio signals to obtain a marked noise-added signal corresponding to each of the N audio signals, where the mark includes one or more of the signal-to-noise ratio of the noise-added signal, the type of noise added to the noise-added signal, and the noise power added to the noise-added signal;

The electronic device uses the noise-added signal corresponding to each of the N audio signals to train the music recognition system, including:

The electronic device trains the music recognition system by using the marked noise-added signal corresponding to each of the N audio signals.
6. The method according to any one of claims 1 to 5, wherein the electronic device acquiring the power of each of the N audio signals and the power of each of the M types of noise signals includes:

The electronic device extracts the amplitude of each audio signal, and obtains the power of each audio signal according to the amplitude of each audio signal;

The electronic device extracts the amplitude of each noise signal, and obtains the power of each noise signal according to the amplitude of each noise signal.
7. The method according to any one of claims 1 to 6, wherein the audio signal comprises an audio signal input to the electronic device by the user through a voice input device.
8. The method according to any one of claims 1 to 7, wherein the noise signal comprises a noise signal input to the electronic device by the user through a voice input device.
9. An audio processing device, characterized by comprising:

The first acquiring unit is configured to acquire N audio signals, M noise signals, and P signal-to-noise ratios input by the user, where N, M, and P are all positive integers;

A second acquiring unit, configured to acquire the power of each audio signal in the N audio signals and the power of each noise signal in the M types of noise signals;

A calculation unit, configured to, for a first audio signal among the N audio signals and a first signal-to-noise ratio among the P signal-to-noise ratios, calculate the power of the noise signal to be added to the first audio signal according to the power of the first audio signal and the first signal-to-noise ratio;

An adjusting unit, configured to adjust the power of the M types of noise signals according to the power of the noise signal to be added to the first audio signal;

The mixing unit is configured to mix the first audio signal and the M types of noise signals after power adjustment to obtain a noise-added signal corresponding to the first audio signal.
10. The device according to claim 9, wherein the calculation unit is specifically configured to:

Calculate the power of the noise signal to be added to the first audio signal according to Shannon’s formula, where Shannon’s formula is signal-to-noise ratio (dB) = 10*log10(A/B), A is the power of the first audio signal, and B is the power of the noise signal to be added to the first audio signal.
11. The device according to claim 9 or 10, wherein M is an integer greater than or equal to 2, and the device further comprises:

The third acquiring unit is configured to acquire the weights of the M types of noise signals input by the user;

The adjusting unit includes:

An allocation unit, configured to allocate the power of the noise signal to be added to the first audio signal to each of the M types of noise signals according to the weights of the M types of noise signals;

The noise adjustment unit is configured to adjust the power of each noise signal according to the allocated power of each noise signal in the M noise signals.
12. The device according to any one of claims 9 to 11, wherein the device further comprises the following unit, which operates after the mixing unit mixes the first audio signal and the M types of noise signals after power adjustment to obtain the noise-added signal corresponding to the first audio signal:

The training unit is used to train the music recognition system by using the noise-added signal corresponding to each of the N audio signals.
13. The device according to claim 12, wherein the device further comprises the following unit, which operates before the training unit uses the noise-added signal corresponding to each of the N audio signals to train the music recognition system:

The marking unit is configured to perform feature marking on the noise-added signal corresponding to each of the N audio signals to obtain a marked noise-added signal corresponding to each of the N audio signals, where the feature marking includes the signal-to-noise ratio of the noise-added signal, the type of noise added to the noise-added signal, and the noise power added to the noise-added signal;

The training unit is specifically used for:

Train the music recognition system by using the marked noise-added signal corresponding to each of the N audio signals.
14. The device according to any one of claims 9 to 13, wherein the second acquiring unit comprises:

The first extraction unit is configured to extract the amplitude of each audio signal, and obtain the power of each audio signal according to the amplitude of each audio signal;

The second extraction unit is configured to extract the amplitude of each noise signal, and obtain the power of each noise signal according to the amplitude of each noise signal.
15. The device according to any one of claims 9 to 14, wherein the audio signal comprises an audio signal input to the audio processing device by the user through a voice input device.
16. The device according to any one of claims 9 to 15, wherein the noise signal comprises a noise signal input to the audio processing device by the user through a voice input device.
17. An electronic device, characterized in that it comprises:

One or more processors;

A memory;

One or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more application programs are configured to execute the following steps:

Acquiring N audio signals, M noise signals, and P signal-to-noise ratios input by the user, where N, M, and P are all positive integers;

Acquiring the power of each audio signal in the N audio signals and the power of each noise signal in the M types of noise signals;

For a first audio signal among the N audio signals and a first signal-to-noise ratio among the P signal-to-noise ratios, calculating the power of the noise signal to be added to the first audio signal according to the power of the first audio signal and the first signal-to-noise ratio;

Adjusting the power of the M types of noise signals according to the power of the noise signal to be added to the first audio signal;

Mixing the first audio signal and the M types of noise signals after power adjustment to obtain a noise-added signal corresponding to the first audio signal.
18. The electronic device of claim 17, wherein the calculating the power of the noise signal to be added to the first audio signal according to the power of the first audio signal and the first signal-to-noise ratio comprises:

Calculating the power of the noise signal to be added to the first audio signal according to Shannon’s formula, where Shannon’s formula is signal-to-noise ratio (dB) = 10*log10(A/B), A is the power of the first audio signal, and B is the power of the noise signal to be added to the first audio signal.
19. The electronic device according to claim 17 or 18, wherein M is an integer greater than or equal to 2, and the steps further comprise:

Acquiring the weights of the M types of noise signals input by the user;

The adjusting the power of the M types of noise signals according to the power of the noise signal to be added to the first audio signal includes:

Allocating the power of the noise signal to be added to the first audio signal to each of the M types of noise signals according to the weights of the M types of noise signals;

Adjusting the power of each noise signal according to the allocated power of each noise signal in the M types of noise signals.
20. A computer non-volatile readable storage medium, wherein the computer non-volatile readable storage medium stores a computer program, and the computer program, when executed by a processor, implements the method according to any one of claims 1 to 8.