WO2023000778A9 - Audio signal processing method and related electronic device - Google Patents


Info

Publication number
WO2023000778A9
WO2023000778A9 (PCT/CN2022/092367; CN2022092367W)
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
audio
type
frequency domain
signal
Prior art date
Application number
PCT/CN2022/092367
Other languages
French (fr)
Chinese (zh)
Other versions
WO2023000778A1 (en)
Inventor
胡贝贝 (Hu Beibei)
许剑峰 (Xu Jianfeng)
Original Assignee
北京荣耀终端有限公司 (Beijing Honor Device Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京荣耀终端有限公司 (Beijing Honor Device Co., Ltd.)
Publication of WO2023000778A1 publication Critical patent/WO2023000778A1/en
Publication of WO2023000778A9 publication Critical patent/WO2023000778A9/en

Links

Images

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003: Changing voice quality, e.g. pitch or formants
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272: Voice signal separating
    • G10L21/0316: Speech enhancement by changing the amplitude
    • G10L21/0324: Details of processing therefor
    • G10L21/034: Automatic adjustment
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M1/00: Substation equipment, e.g. for use by subscribers
    • H04M1/72: Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724: User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403: User interfaces with means for local support of applications that increase the functionality
    • H04M1/7243: User interfaces with interactive means for internal management of messages
    • H04M1/72433: User interfaces for voice messaging, e.g. dictaphones
    • H04M1/72442: User interfaces for playing music files
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233: Processing of audio elementary streams

Definitions

  • the present application relates to the field of audio signal processing, in particular to an audio signal processing method and related electronic equipment.
  • For such audio signals, the frequency-point energy in the frequency domain is concentrated in a narrow bandwidth (for example, in piano music), the frequency-domain energy distribution is uneven, and the duration is long; subjectively, a buzzing noise can be heard, mainly because the excessively concentrated, long-lasting narrow-band energy causes the speaker to produce nonlinear distortion during electro-acoustic conversion.
  • the sound source corresponding to this type of audio signal is called the first type of sound source.
  • Traditional methods of processing the first type of sound source can falsely suppress other types of sound sources, such as human voices, which produces an audible subjective artifact when such a sound source transitions to or from the first type of sound source. Therefore, how to suppress the noise of the first type of sound source while avoiding false suppression of other sound sources is a problem of concern to technicians.
  • An embodiment of the present application provides an audio signal processing method, which solves the problem that, while suppressing audio signals prone to noise, other types of audio signals are wrongly suppressed and thereby distorted.
  • An embodiment of the present application provides a method for processing an audio signal, including: acquiring an audio signal; when the tonality value of the audio signal is greater than or equal to a first threshold and the sound source type of the audio signal is the first type of sound source, processing the audio signal using a first type of suppression strategy; otherwise, processing the audio signal using a second type of suppression strategy.
  • In this way, whether the audio signal is a signal prone to noise (the first type of sound source) is determined based on whether the sound source type of the audio signal is the first type of sound source.
  • the first type of suppression strategy is to suppress a single peak or multiple peaks in the frequency domain of the audio signal.
  • the second type of suppression strategy is to suppress a single peak or multiple peaks in the audio signal; or not to suppress the audio signal.
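The strategy selection described above can be sketched as follows; the threshold value and the string labels are illustrative assumptions, not values from the application:

```python
FIRST_THRESHOLD = 0.85  # hypothetical tonality threshold

def choose_strategy(tonality_value, source_type):
    """Pick a suppression strategy for one audio frame.

    A frame is treated as noise-prone only when it is both highly
    tonal and classified as the first type of sound source.
    """
    if tonality_value >= FIRST_THRESHOLD and source_type == "first":
        return "first_strategy"   # suppress single/multiple frequency-domain peaks
    return "second_strategy"      # milder suppression, or pass-through
```

Both conditions must hold: a highly tonal human voice, for example, still takes the second strategy because its source type is not the first type.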
  • the method includes: performing tonality calculation on the audio signal to obtain a tonality value of the audio signal.
  • In this way, it is beneficial for the electronic device to adopt different suppression strategies for the audio signal based on the tonality value and the sound source type of the audio signal.
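The application does not spell out the tonality formula here. As a hedged illustration, spectral flatness (the ratio of the geometric to the arithmetic mean of the power spectrum) is a common proxy: flatness is near 1 for noise-like frames and near 0 for tonal ones, so `1 - flatness` behaves like a tonality value:

```python
import numpy as np

def tonality_value(frame, eps=1e-12):
    """Illustrative tonality measure: 1 - spectral flatness.

    Spectral flatness = geometric mean / arithmetic mean of the
    power spectrum; tonal content drives it toward 0, so this
    function returns values near 1 for tonal frames.
    """
    power = np.abs(np.fft.rfft(frame)) ** 2 + eps
    flatness = np.exp(np.mean(np.log(power))) / np.mean(power)
    return 1.0 - flatness
```

A pure sine scores close to 1 under this measure, while white noise scores much lower, which is the ordering the first-threshold comparison relies on.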
  • Before the audio signal is processed using the first type of suppression strategy, the method also includes: performing peak detection on the audio signal, the peak detection being used to obtain the peak information of the audio signal in the frequency domain.
  • the electronic device can acquire the peak value of the audio signal, calculate the difference gain according to the peak value information, and suppress the audio signal according to the difference gain.
  • the sound source type of the audio signal is the first type of sound source and the tonality value of the audio signal is greater than or equal to the first threshold, peak suppression is performed on the audio signal to change the speaker input signal and reduce playback noise.
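One plausible form of the peak-detection step above, sketched with NumPy; the number of peaks kept is an arbitrary choice, not one fixed by the application:

```python
import numpy as np

def detect_peaks(frame, num_peaks=3):
    """Find the largest spectral peaks of one audio frame.

    Returns (bin index, magnitude) pairs, strongest first. A bin
    counts as a local peak when it exceeds both neighbours.
    """
    mag = np.abs(np.fft.rfft(frame))
    local = (mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:])
    idx = np.flatnonzero(local) + 1
    order = idx[np.argsort(mag[idx])[::-1][:num_peaks]]
    return [(int(i), float(mag[i])) for i in order]
```

The returned peak information (at least the maximum peak, per the text above) is what the difference-gain calculation would consume.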
  • an embodiment of the present application provides an electronic device, which includes: one or more processors and a memory; the memory is coupled to the one or more processors, and the memory is used to store computer program codes,
  • the computer program code includes computer instructions, and the one or more processors invoke the computer instructions to cause the electronic device to perform: acquiring an audio signal; when the tonality value of the audio signal is greater than or equal to a first threshold and the sound source type of the audio signal is the first type of sound source, processing the audio signal using a first type of suppression strategy; otherwise, processing the audio signal using a second type of suppression strategy.
  • the one or more processors are further configured to invoke the computer instructions so that the electronic device executes: performing tonality calculation on the audio signal to obtain the tonality value of the audio signal.
  • the one or more processors are further configured to invoke the computer instructions to make the electronic device execute: performing peak detection on the audio signal, the peak detection being used to acquire the peak information of the audio signal in the frequency domain.
  • the one or more processors are further configured to call the computer instruction so that the electronic device executes: calculating the difference between the peak value of the audio signal and the second threshold;
  • the peak value includes at least the maximum peak value of the audio signal in the frequency domain;
  • the difference gain of the peak value is calculated based on the difference value;
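The application does not give the gain formula; as a hedged sketch, one simple mapping computes the difference between the peak level and the second threshold in dB, and pulls a peak that exceeds the threshold back down to it:

```python
def difference_gain(peak_db, second_threshold_db):
    """Illustrative difference gain (linear scale).

    The peak is attenuated by the amount, in dB, that it exceeds
    the second threshold; peaks below the threshold are untouched.
    This linear mapping is an assumption, not the patent's formula.
    """
    diff = peak_db - second_threshold_db
    if diff <= 0:
        return 1.0                 # peak under threshold: leave as-is
    return 10.0 ** (-diff / 20.0)  # bring the peak down to the threshold
```

Applied to the frequency-domain peak bins, this gain suppresses only the over-threshold energy rather than the whole signal.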
  • an embodiment of the present application provides an electronic device, including: a touch screen, a camera, one or more processors, and one or more memories; the one or more processors and the touch screen , the camera, the one or more memories are coupled, the one or more memories are used to store computer program codes, the computer program codes include computer instructions, and when the one or more processors execute the computer instructions , so that the electronic device executes the method described in the first aspect or any possible implementation manner of the first aspect.
  • an embodiment of the present application provides a chip system, which is applied to an electronic device; the chip system includes one or more processors, and the processor is used to call computer instructions so that the electronic device executes the method described in the first aspect or any possible implementation manner of the first aspect.
  • An embodiment of the present application provides a computer program product containing instructions; when the computer program product is run on an electronic device, the electronic device is caused to execute the method described in the first aspect or any possible implementation manner of the first aspect.
  • An embodiment of the present application provides a computer-readable storage medium, including instructions; when the instructions are run on the electronic device, the electronic device executes the method described in the first aspect or any possible implementation manner of the first aspect.
  • FIGS. 1A-1C are schematic diagrams of an application scenario provided by the embodiment of the present application.
  • FIG. 2 is a system architecture diagram of an electronic device processing audio signals provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of a hardware structure of an electronic device 100 provided by an embodiment of the present application.
  • FIG. 4 is a software structural block diagram of the electronic device 100 provided by the embodiment of the present application.
  • FIGS. 5A-5D are diagrams of tonality value calculation results provided by an embodiment of the present application.
  • FIG. 6 is a flow chart of processing an audio signal provided by an embodiment of the present application.
  • FIGS. 7A-7C are diagrams of the audio application startup interface provided by the embodiment of the present application.
  • FIG. 8 is a waveform diagram of a frequency domain signal provided by an embodiment of the present application.
  • a unit may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or distributed between two or more computers.
  • these units can execute from various computer readable media having various data structures stored thereon.
  • a unit may, for example, communicate through local and/or remote processes based on a signal having one or more data packets (e.g., data from one unit interacting with another unit in a local system, in a distributed system, and/or across a network, such as the Internet, with other systems by way of the signal).
  • FIG. 1A when the electronic device 100 detects an input operation (for example, click) on the music application icon 1011 , it will enter the main interface 102 of the music application as shown in FIG. 1B .
  • the electronic device 100 displays a music playing interface 103 as shown in FIG. 1C, and the music application plays music at this time. While the music application is playing music, the electronic device 100 processes the audio signal of the music in real time, so as to ensure that the music played by the music application does not produce noise, thereby bringing a good music experience to the user.
  • FIG. 2 is a system architecture diagram of an electronic device processing audio signals provided by an embodiment of the present application.
  • the system architecture includes an audio application, a mixing thread module, and an audio driver.
  • the audio application may be music player software or video software.
  • an audio application plays audio through a speaker, it processes the audio signal in real time.
  • the audio application sends the audio signal to the audio mixing thread module, and the audio mixing thread module detects whether the sound source type of the audio signal is the first type of sound source. If so, the audio signal is processed (e.g., the energy of the audio signal is suppressed). Then, the mixing thread module sends the processed audio signal to the audio driver, the audio driver sends it to the speaker, and the speaker outputs audio.
  • the electronic device can complete the processing of the audio signal in real time, so that the audio emitted through the speaker has no noise.
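The mixing-thread flow above can be sketched as a toy class; the class name, the string label, and the 0.5 attenuation stand-in are all illustrative, not details from the application:

```python
class MixerThread:
    """Toy sketch of the mixing-thread flow: receive a frame from the
    audio application, suppress it if its sound source is the first
    (noise-prone) type, then hand it to the audio driver callback."""

    def __init__(self, driver):
        self.driver = driver  # callable standing in for the audio driver

    def on_frame(self, frame, source_type):
        if source_type == "first":
            # Stand-in for the real peak-suppression processing.
            frame = [0.5 * s for s in frame]
        self.driver(frame)
```

In the real system the "driver" is the kernel audio driver feeding the speaker; here a plain callback keeps the sketch self-contained.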
  • FIG. 3 is a schematic diagram of a hardware structure of an electronic device 100 provided by an embodiment of the present application.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown in the figure, or combine certain components, or separate certain components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • the processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. The different processing units may be independent devices, or may be integrated in one or more processors.
  • the wireless communication function of the electronic device 100 can be realized by the antenna 1 , the antenna 2 , the mobile communication module 150 , the wireless communication module 160 , a modem processor, a baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover single or multiple communication frequency bands. Different antennas can also be multiplexed to improve the utilization of the antennas.
  • Antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 can provide wireless communication solutions including 2G/3G/4G/5G applied on the electronic device 100 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and send them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signals modulated by the modem processor, and convert them into electromagnetic waves and radiate them through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 150 may be set in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be set in the same device.
  • the wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area network (WLAN) (such as a Wi-Fi network), Bluetooth (BT), BLE broadcasting, global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110 , frequency-modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
  • the electronic device 100 realizes the display function through the GPU, the display screen 194 , and the application processor.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos and the like.
  • the display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, quantum dot light-emitting diodes (QLED), etc.
  • the electronic device 100 may include 1 or N display screens 194 , where N is a positive integer greater than 1.
  • the electronic device 100 can realize the shooting function through the ISP, the camera 193 , the video codec, the GPU, the display screen 194 and the application processor.
  • the ISP is used for processing the data fed back by the camera 193 .
  • the light is transmitted to the photosensitive element of the camera through the lens, and the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin color.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be located in the camera 193 .
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
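The per-frequency-point energy mentioned above is the kind of quantity obtained via a Fourier transform; a minimal NumPy sketch:

```python
import numpy as np

def bin_energy(frame, k):
    """Energy of frequency bin k of one frame, obtained from a
    (fast) Fourier transform of the time-domain samples."""
    spectrum = np.fft.rfft(frame)
    return float(np.abs(spectrum[k]) ** 2)
```

For a real sine landing exactly on bin k, the rfft magnitude at that bin is N/2, so the bin energy is (N/2)^2 while off-peak bins carry essentially none.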
  • the NPU is a neural-network (NN) computing processor.
  • NN neural-network
  • Applications such as intelligent cognition of the electronic device 100 can be realized through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
  • the electronic device 100 can implement audio functions through the audio module 170 , the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. Such as music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signal.
  • the audio module 170 may also be used to encode and decode audio signals.
  • the audio module 170 may be set in the processor 110 , or some functional modules of the audio module 170 may be set in the processor 110 .
  • The speaker 170A, also referred to as a "horn", is used to convert audio electrical signals into sound signals.
  • Electronic device 100 can listen to music through speaker 170A, or listen to hands-free calls.
  • The receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • the receiver 170B can be placed close to the human ear to receive the voice.
  • The microphone 170C, also called a "mike" or "mic", is used to convert sound signals into electrical signals.
  • the user can put his mouth close to the microphone 170C to make a sound, and input the sound signal to the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In some other embodiments, the electronic device 100 may be provided with two microphones 170C, which may also implement a noise reduction function in addition to collecting sound signals. In some other embodiments, the electronic device 100 can also be provided with three, four or more microphones 170C to realize sound signal collection, noise reduction, sound source identification, and directional recording functions.
  • the pressure sensor 180A is used to sense the pressure signal and convert the pressure signal into an electrical signal.
  • pressure sensor 180A may be disposed on display screen 194 .
  • the air pressure sensor 180C is used to measure air pressure.
  • the electronic device 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 may use the magnetic sensor 180D to detect the opening and closing of the flip leather case.
  • the acceleration sensor 180E can detect the acceleration of the electronic device 100 in various directions (generally three axes).
  • the magnitude and direction of gravity can be detected when the electronic device 100 is stationary. It can also be used to identify the posture of electronic devices, and can be used in applications such as horizontal and vertical screen switching, pedometers, etc.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, access to application locks, take pictures with fingerprints, answer incoming calls with fingerprints, and the like.
  • Touch sensor 180K also known as "touch panel”.
  • the touch sensor 180K can be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, also called a “touch screen”.
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to the touch operation can be provided through the display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100 , which is different from the position of the display screen 194 .
  • the bone conduction sensor 180M can acquire vibration signals. In some embodiments, the bone conduction sensor 180M can acquire the vibration signal of the vibrating bone mass of the human voice.
  • the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a micro-kernel architecture, a micro-service architecture, or a cloud architecture.
  • the software structure of the electronic device 100 is exemplarily described by taking an Android system with a layered architecture as an example.
  • FIG. 4 is a block diagram of the software structure of the electronic device 100 provided by the embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate through software interfaces.
  • the Android system is divided into four layers, which are, from top to bottom: the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.
  • the application layer can consist of a series of application packages.
  • the application package can include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, and audio applications.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions. As shown in Figure 4, the application framework layer can include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, a mixing thread module (Mixer Thread module) and the like.
  • the mixing thread module is used to receive the audio signal sent by the audio application and process the audio signal.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, capture the screen, etc.
  • Content providers are used to store and retrieve data and make it accessible to applications.
  • Said data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebook, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on.
  • the view system can be used to build applications.
  • a display interface can consist of one or more views.
  • a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide communication functions of the electronic device 100 . For example, the management of call status (including connected, hung up, etc.).
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and can automatically disappear after a short stay without user interaction.
  • the notification manager is used to notify the download completion, message reminder, etc.
  • the notification manager can also be a notification that appears on the top status bar of the system in the form of a chart or scroll bar text, such as a notification of an application running in the background, or a notification that appears on the screen in the form of a dialog window.
  • For example, prompting text information in the status bar, issuing a prompt sound, vibrating the electronic device, flashing the indicator light, and so on.
  • the Android Runtime includes core library and virtual machine. The Android runtime is responsible for the scheduling and management of the Android system.
  • the core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the Java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • a system library can include multiple function modules. For example: surface manager (surface manager), media library (Media Libraries), 3D graphics processing library (eg: OpenGL ES), 2D graphics engine (eg: SGL), etc.
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of various commonly used audio and video formats, as well as still image files, etc.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing, etc.
  • 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.
  • the speaker of a small electronic device is relatively small in size, and the diaphragm vibration amplitude it allows is correspondingly small.
  • when audio is played at a high volume, the diaphragm vibration amplitude of the speaker may exceed this maximum value, which makes the sound prone to breaking and produces a sizzling noise.
  • therefore, the sound source is usually processed so that the speaker produces less noise when the sound is played out.
  • the sound sources are divided into four categories, namely: the first type of sound source, the second type of sound source, the third type of sound source and the fourth type of sound source.
  • the characteristics of the first type of sound source are: the audio signal of this sound source is unevenly distributed on the frequency spectrum, and the energy is concentrated in the middle and low frequencies, the energy is relatively strong, and the duration is long, such as the sound of a piano. When the audio of this type of sound source is played back through the speaker, it is easy to generate noise.
  • the characteristics of the second type of sound source are: the audio signal of the sound source is unevenly distributed on the frequency spectrum, and the energy is mainly concentrated in the middle and low frequencies, but the energy is relatively weak, such as human voice.
  • the characteristic of the third type of sound source is that the audio signal of this sound source is unevenly distributed in the spectrum and has strong energy, but the energy concentration is transient, that is, the energy lasts for a short time, for example, the sound of drums.
  • the characteristic of the fourth type of sound source is that the audio signal of the sound source is evenly distributed on the frequency spectrum.
  • the first type of sound source has uneven energy distribution, large energy and long duration.
  • electronic equipment uses speakers to play back the audio of the first type of sound source, which is more likely to produce noise.
  • therefore, the electronic device suppresses the audio signal of the first type of sound source before playback, sends the suppressed audio signal to the speaker, and the speaker then outputs the audio, thereby suppressing noise during audio playback.
  • the method for the electronic device to process audio signals is: divide the input audio signal into frames, perform time-frequency transformation to obtain frequency domain signals, perform tonality calculation on each frame of the frequency domain signal to obtain its tonality value, and compare the tonality value with a preset first threshold to judge whether the distribution of the audio signal in the frequency domain is uniform. If the tonality value of a frame's frequency domain signal is greater than or equal to the first threshold, the frame's frequency domain signal is unevenly distributed over the frequency spectrum and needs to be suppressed, so the energy of the frame's frequency domain signal is suppressed according to the relevant strategy, that is, peak suppression.
  • the first threshold may be obtained based on historical data, may also be obtained based on empirical values, and may also be obtained based on experimental data tests, which is not limited in this embodiment of the present application.
  • Fig. 5A is a diagram of the tonality calculation results of yangqin (hammered dulcimer) audio. In Fig. 5A, if the first threshold is set to 0.7, the tonality of the yangqin audio generally exceeds the first threshold.
  • the electronic device will therefore judge the yangqin audio as an audio signal that needs to be suppressed.
  • however, although the yangqin audio signal is unevenly distributed in the frequency domain, its energy concentration is strongly transient, so when the electronic device plays the yangqin audio through the speaker, noise is not likely to be generated. If the yangqin audio signal is suppressed, the timbre of the played-back yangqin audio may be distorted.
  • in addition, for sound sources of the same type, the tonality calculation results of the audio signals may differ greatly.
  • for example, both the drum piece Drum Poem and ordinary drum sound belong to the same type of sound source (both are the third type of sound source), yet their tonality calculation results are quite different, owing to differences in their energy distribution in the frequency domain.
  • as a result, the tonality judgment of the audio signal may miss detections.
  • the first threshold value is 0.7
  • suppose there is a piece of piano audio (the first type of sound source) in the played audio, and its tonality calculation result is shown in Figure 5D.
  • the tonality value of the piano audio signal is less than 0.7 between frame 496 and frame 562, so within this interval the electronic device will not suppress the piano audio signal, while in the other frames it will. This causes the volume of the played piano audio to change between frame 496 and frame 562: the volume of the piano sound changes abruptly, giving the user an extremely poor listening experience. Therefore, if the tonality result is used as the only factor to decide whether to suppress the audio signal, the first threshold is difficult to select, and missed detections or false detections easily occur, so that sound sources that produce noise are left unsuppressed or sound sources that do not produce noise are suppressed.
  • an embodiment of the present application provides a method for processing an audio signal.
  • by identifying the sound source type of the audio, it is judged whether the sound source is the first type of sound source; if it is, the peak of the audio signal of the first type of sound source is suppressed, and the suppressed audio signal is sent to the speaker for output.
  • FIG. 6 is a flow chart for processing audio signals provided by an embodiment of the present application.
  • the specific process for processing audio signals is as follows:
  • Step S601 start the audio application.
  • when the electronic device 100 detects an input operation (for example, a click) on the audio application icon 7011, the electronic device 100 displays the startup interface 702 shown in FIG. 7B; while the startup interface 702 is displayed, the audio application starts to launch.
  • when the electronic device displays the main interface 703 of the audio application as shown in FIG. 7C, the startup of the audio application is completed.
  • the audio application shown in FIG. 7A-FIG. 7C is a music application, and the audio application may also be a video application, and may also be other applications capable of playing audio. This embodiment of the present application is only for illustration and not limitation.
  • Step S602 the audio application sends an audio signal to the mixing thread module.
  • Step S603 The audio mixing thread module divides the audio signal into frames to obtain M frames of audio signals.
  • the electronic device processes the audio signal in real time, and the speaker then outputs the processed audio signal in the form of audio.
  • the signal is divided into frames, for example, 10ms is a frame.
  • Step S604 the audio mixing thread module performs time-frequency conversion on the audio signal of the nth frame to obtain the frequency domain signal of the audio signal of the frame.
  • the audio mixing thread module can obtain the frequency domain signal of the audio signal by performing a Fourier Transform (FT) or a Fast Fourier Transform (FFT) on the audio signal.
  • the audio mixing thread module can also obtain the frequency domain signal by performing a Mel spectrum transformation on the audio signal.
  • the audio mixing thread module can also obtain the frequency domain signal by performing a Modified Discrete Cosine Transform (MDCT) on the audio signal.
  • the embodiment of the present application takes the time-frequency transformation of the audio signal by FFT as an example for description. Before performing the FFT, overlapping and windowing can be applied to each frame signal, in order to reduce spectrum leakage during the frequency domain transformation and reduce frequency domain processing distortion.
  • after the audio mixing thread module performs time-frequency conversion on the audio signal, it can obtain all frequency components of the frequency domain signal of the audio signal, which facilitates analyzing and calculating the different frequencies of the signal.
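As a rough illustration of steps S603 and S604, the following sketch frames a signal and computes a windowed FFT spectrum. The 48 kHz sample rate, the Hann window, and all names are illustrative assumptions, not details given in the embodiment.

```python
import numpy as np

SAMPLE_RATE = 48000
FRAME_LEN = SAMPLE_RATE // 100  # 10 ms frame, as in the example above

def frame_to_spectrum(frame: np.ndarray) -> np.ndarray:
    """Window one 10 ms frame and return its magnitude spectrum."""
    window = np.hanning(len(frame))     # windowing reduces spectrum leakage
    return np.abs(np.fft.rfft(frame * window))

frame = np.random.randn(FRAME_LEN)      # stand-in for one frame of PCM audio
spectrum = frame_to_spectrum(frame)
print(spectrum.shape)                   # one bin per frequency component
```

The magnitude spectrum (241 bins for a 480-sample frame) is what the subsequent tonality calculation and peak detection would operate on.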
  • Step S605 The sound mixing thread module performs tonality calculation on the frequency domain signal to obtain the tonality value of the frequency domain signal.
  • the sound mixing thread module sequentially calculates the tonality of the frequency domain signal of the nth frame, and obtains the corresponding tonality value.
  • the purpose of performing tonality calculation on the frequency domain signal is to determine whether the energy distribution of the frame's audio signal in the frequency domain is uniform. If the tonality value of the frequency domain signal is greater than or equal to the first threshold, the energy distribution of the frame's audio signal in the frequency domain is judged to be uneven; otherwise, it is judged to be uniform.
  • the first threshold may be obtained based on empirical values, may also be obtained based on historical data, and may also be obtained based on experimental data, which is not limited in this embodiment of the present application.
  • the mixing thread module calculates the tonality value as follows:
  • the mixing thread module calculates the flatness Flatness of the frequency domain signal according to formula (1), which is as follows:

    Flatness = (∏_{n=0}^{N-1} x(n))^{1/N} / ((1/N) · ∑_{n=0}^{N-1} x(n))    (1)
  • N is the length of the FFT transform of the audio signal
  • x(n) is the energy value of the nth frequency point of the frequency domain signal of the frame
  • Flatness is used to represent the energy distribution of the frequency domain signal in the frequency domain: the larger the Flatness, the more uniform the distribution; the smaller the Flatness, the more uneven the distribution.
  • the mixing thread module calculates the first parameter SFMdB according to formula (2), which is as follows:

    SFMdB = 10 · log10(Flatness)    (2)
  • the sound mixing thread module calculates the tonality value α of the frame's frequency domain signal according to formula (3), which is as follows:

    α = min(SFMdB / SFMdBMax, 1)    (3)
  • the value of SFMdBMax may be obtained from historical values, empirical values, or experimental data, which is not limited in this embodiment of the present application.
  • SFMdBMax can be set to -60dB.
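Formulas (1) to (3) can be sketched as follows, with SFMdBMax = -60 dB as suggested above; the small floor value guarding against log(0) is an added assumption.

```python
import numpy as np

def tonality(energy: np.ndarray, sfm_db_max: float = -60.0) -> float:
    """Tonality value of one frame's frequency-domain energy x(n)."""
    energy = np.maximum(energy, 1e-12)                  # avoid log(0)
    # Formula (1): Flatness = geometric mean / arithmetic mean
    flatness = np.exp(np.mean(np.log(energy))) / np.mean(energy)
    sfm_db = 10.0 * np.log10(flatness)                  # formula (2)
    return min(sfm_db / sfm_db_max, 1.0)                # formula (3)

flat = np.ones(256)                      # evenly distributed spectrum
peaky = np.full(256, 1e-6)
peaky[10] = 1.0                          # energy concentrated at one bin
print(tonality(flat), tonality(peaky))   # low value vs. high value
```

A flat, noise-like spectrum yields a tonality near 0, while a spectrum whose energy is concentrated in a narrow band yields a value closer to 1, which is then compared against the first threshold.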
  • Step S606 The sound mixing thread module obtains the label of the frequency domain signal based on the neural network.
  • the sound mixing thread module takes the frame frequency domain signal as an input of the neural network, and the neural network outputs a label of the frame frequency domain signal, and the label is used to indicate the audio source type of the frame frequency domain signal.
  • the labels include a first label, a second label, a third label and a fourth label: the first label indicates that the sound source type of the frequency domain signal is the first type of sound source, the second label indicates the second type of sound source, the third label indicates the third type of sound source, and the fourth label indicates the fourth type of sound source.
  • the first label is 0, the second label is 1, the third label is 2, and the fourth label is 3 as an example for description.
  • the neural network is a trained neural network.
  • the neural network can be trained offline. The training process is: select a large number of frequency domain signals with a frame length of 10 ms (frequency domain signals with other frame lengths can also be selected, which is not limited in the embodiments of the present application) of the first type of sound source (for example, piano sound), the second type of sound source (for example, human voice), the third type of sound source (for example, drum sound) and the fourth type of sound source, and use them as training samples.
  • when the frequency domain signal of the first type of sound source is used as the input of the neural network, the neural network outputs a label for that signal; the output label is compared with label 0 to obtain a deviation value Fn1, which represents the degree to which the label output by the neural network differs from label 0.
  • the internal parameters of the neural network are then adjusted based on Fn1, so that the neural network outputs label 0 for audio signals of the first type of sound source.
  • the neural network is likewise trained with the other training samples (the frequency domain signals of the second, third and fourth types of sound source), so that when the neural network receives an input frequency domain signal, it can output the corresponding label.
  • when a frame contains more than one sound source, the label of the sample signal can be determined according to the intensity of the sound sources. For example, in a frame of the frequency-domain sample signal, if the sound of the piano is clearly louder than the human voice, the sound source of that sample is determined as the first type of sound source, and the label is set to 0.
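Purely for illustration, a frame classifier of the kind described could have the following shape. The layer sizes and the random weights are stand-in assumptions (the described system uses weights obtained by the offline training above), and the embodiment does not specify a network architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
N_BINS, HIDDEN, N_LABELS = 241, 32, 4     # 4 labels: source types 0..3

# Stand-in weights; in the described system these come from offline training.
W1, b1 = rng.normal(size=(HIDDEN, N_BINS)), np.zeros(HIDDEN)
W2, b2 = rng.normal(size=(N_LABELS, HIDDEN)), np.zeros(N_LABELS)

def classify_frame(spectrum: np.ndarray) -> int:
    """Map one frame's magnitude spectrum to a source-type label 0..3."""
    h = np.maximum(W1 @ spectrum + b1, 0.0)   # ReLU hidden layer
    logits = W2 @ h + b2
    return int(np.argmax(logits))             # most likely label

label = classify_frame(rng.random(N_BINS))
print(label)
```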
  • Step S607 The sound mixing thread module judges whether the frequency domain signal is the first type of sound source based on the tonality value of the frequency domain signal and the label of the frequency domain signal.
  • if the judgment is yes, execute step S608; if the judgment is no, execute step S610.
  • for some audio signals, such as the sound of a pipa, the neural network may wrongly judge the signal as the first type of sound source and output label 0, when in fact the sound of a pipa belongs to the third type of sound source.
  • to avoid such misjudgments, the sound mixing thread module also judges whether the energy distribution of the frame's frequency domain signal in the frequency domain is uniform. Only when the label output by the neural network is 0 and the mixing thread module judges the energy distribution of the frame's frequency domain signal in the frequency domain to be uneven will the mixing thread module determine that the frame's frequency domain signal is of the first type of sound source.
  • that is, only when the tonality value of the frame is greater than or equal to the first threshold and the label is 0 does the mixing thread module judge that the sound source of the frame's frequency domain signal is the first type of sound source; otherwise, it is not the first type of sound source.
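The combined judgment of step S607 reduces to a simple conjunction. This sketch uses the example threshold of 0.7 from the text; the function name is illustrative.

```python
FIRST_THRESHOLD = 0.7  # example first-threshold value from the text

def is_first_type_source(tonality_value: float, label: int) -> bool:
    """First-type source only if the network outputs label 0 AND the
    tonality value indicates an uneven frequency-domain distribution."""
    return label == 0 and tonality_value >= FIRST_THRESHOLD

print(is_first_type_source(0.9, 0))  # uneven spectrum, label 0 -> True
print(is_first_type_source(0.9, 2))  # drum-like label          -> False
print(is_first_type_source(0.5, 0))  # even spectrum            -> False
```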
  • Step S608 The sound mixing thread module performs peak detection on the frequency domain signal.
  • the sound mixing thread module detects the peak value of the frequency domain signal of the frame, that is, obtains the amplitudes of the peak and valley of the frequency domain signal of the frame in the frequency domain.
  • Fig. 8 is a waveform diagram of the frame's frequency domain signal. In the waveform diagram there are X peaks and Y valleys. The purpose of peak detection is to obtain the amplitudes of these X peaks and Y valleys; sorted from large to small, the peaks are called the largest peak, the second largest peak, the third largest peak, and so on.
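Peak detection as described (finding the X peaks and ordering them by amplitude) can be sketched as below; treating any bin larger than both neighbours as a peak is an added simplifying assumption.

```python
import numpy as np

def find_peaks_sorted(spec: np.ndarray) -> list:
    """Return indices of local maxima, largest peak first."""
    peaks = [i for i in range(1, len(spec) - 1)
             if spec[i] > spec[i - 1] and spec[i] > spec[i + 1]]
    return sorted(peaks, key=lambda i: spec[i], reverse=True)

spec = np.array([0.1, 0.5, 0.2, 0.9, 0.3, 0.4, 0.1])
print(find_peaks_sorted(spec))  # [3, 1, 5]: largest, second largest, third
```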
  • Step S609 The sound mixing thread module processes the frequency domain signal using the first type of suppression strategy to obtain a processed frequency domain signal.
  • the sound mixing thread module performs single-peak suppression or multi-peak suppression on the peaks of the frame's frequency domain signal. If single-peak suppression is performed, the largest peak of the frame's frequency domain signal is suppressed; if multi-peak suppression is performed, at least the largest peak and the second largest peak of the frame's frequency domain signal are suppressed.
  • the specific method for the sound mixing thread module to suppress a peak is: find the peak in the frequency domain, calculate the difference between the energy of the peak and the second threshold, calculate a difference gain based on that difference, and multiply the original frequency point by the difference gain to reduce the energy of the corresponding frequency point. For example, if the current largest peak is -10 dB and the second threshold is set to -15 dB, the difference for the largest peak is -5 dB; converted to a linear value, the gain is about 0.562, so the original frequency point is multiplied by 0.562 to reduce its energy.
  • the second threshold is a preset maximum peak value, which can be obtained based on empirical values, historical data, or experimental data, and is not limited in this embodiment of the present application.
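The worked example above (-10 dB peak, -15 dB second threshold, linear gain of about 0.562) can be reproduced as follows; the amplitude-dB convention (20·log10) and the names are assumptions consistent with that example.

```python
import numpy as np

def suppress_peak(spec: np.ndarray, peak_bin: int, threshold_db: float) -> np.ndarray:
    """Attenuate one peak bin down to the second threshold."""
    peak_db = 20.0 * np.log10(spec[peak_bin])
    diff_db = threshold_db - peak_db          # e.g. -15 - (-10) = -5 dB
    gain = 10.0 ** (diff_db / 20.0)           # -5 dB -> about 0.562
    out = spec.copy()
    if gain < 1.0:                            # only attenuate, never boost
        out[peak_bin] = out[peak_bin] * gain
    return out

spec = np.array([0.05, 10 ** (-10 / 20.0), 0.02])  # largest peak at -10 dB
out = suppress_peak(spec, 1, -15.0)
print(20 * np.log10(out[1]))                       # peak now at -15 dB
```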
  • Step S610 The sound mixing thread module processes the frequency domain signal using a second type of suppression strategy to obtain a processed frequency domain signal.
  • the sound mixing thread module adopts the second type of suppression strategy for the frame's frequency domain signal. The second type of suppression strategy is: the sound mixing thread module may suppress the peak of the frame's frequency domain signal, or may not suppress the frame's frequency domain signal at all.
  • the difference between the difference gain of the frequency domain signal of the frame and the difference gain of the frequency domain signal of the previous frame should be within a reasonable range.
  • the audio source type of the n-1th frame frequency domain signal is the first type of audio source, which needs to be suppressed
  • the difference gain is 0.5
  • the nth frame frequency domain signal is human voice (the second type of sound source)
  • the range of the difference gain of the nth frame's audio signal is 0.7-0.8.
  • if the difference gain of the nth frame's frequency domain signal is higher than 0.8, the energy difference between the suppressed (n-1)th frame audio signal and the suppressed nth frame audio signal may be too large, causing an abrupt volume change when the speaker plays back the two frames of audio (for example, the sound suddenly becomes louder). If the difference gain of the nth frame's frequency domain signal is lower than 0.7, the energy of the frame's signal may be suppressed excessively, and when the speaker plays back the frame's audio, the volume of the human voice will be very low.
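The cross-frame gain limiting in this example amounts to clamping the current frame's difference gain into an allowed range. The 0.7-0.8 range for a previous gain of 0.5 is taken directly from the text; how the range is derived in general is not specified, so this is only a sketch.

```python
def clamp_gain(raw_gain: float, lo: float, hi: float) -> float:
    """Clamp the current frame's difference gain into [lo, hi]."""
    return min(max(raw_gain, lo), hi)

# Previous frame gain 0.5; allowed range for this frame is 0.7-0.8.
print(clamp_gain(0.95, 0.7, 0.8))  # clamped to 0.8: no sudden volume jump
print(clamp_gain(0.55, 0.7, 0.8))  # raised to 0.7: voice not over-suppressed
print(clamp_gain(0.75, 0.7, 0.8))  # already in range: unchanged
```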
  • Step S611 The sound mixing thread module performs frequency-time transformation on the processed frequency-domain signal to obtain a single-frame audio signal.
  • Step S612 the sound mixing thread module sends the single-frame audio signal to the audio driver.
  • Step S613 the sound mixing thread module executes step S604 again for the next frame of the audio signal.
  • Step S614 The audio driver sends the single-frame audio signal to the speaker.
  • Step S615 The speaker plays the audio corresponding to the single-frame audio signal.
  • in summary, the audio processing method provided by the embodiment of the present application combines a neural network with the traditional detection algorithm: the neural network identifies the sound source type of the audio signal, which solves the misjudgments and missed judgments of the traditional algorithm and the difficulty of tuning the tonality threshold; and by applying different suppression gains and suppression times to different audio signals, the method changes the speaker input signal to reduce playback noise and reduce suppression distortion of the different audio signals, while maintaining the maximum playback loudness of the original signal.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • when implemented using software, the above embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions according to the present application will be generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center by wired (e.g., coaxial cable, optical fiber, DSL) or wireless (e.g., infrared, radio, microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk), etc.
  • the processes can be completed by computer programs to instruct related hardware.
  • the programs can be stored in computer-readable storage media.
  • when the programs are executed, the processes of the foregoing method embodiments may be performed.
  • the aforementioned storage medium includes: ROM or random access memory RAM, magnetic disk or optical disk, and other various media that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Business, Economics & Management (AREA)
  • Business, Economics & Management (AREA)
  • Telephone Function (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

An audio signal processing method, an electronic device, a computer readable storage medium, and a computer program product. The method comprises: obtaining an audio signal; when the tonality value of the audio signal is greater than or equal to a first threshold and the audio source type of the audio signal is a first-type audio source, processing the audio signal by using a first-type suppression policy; and otherwise, processing the audio signal by using a second-type suppression policy. The method solves the problem that in the process of suppressing an audio signal that easily generates noise, the audio signals generate suppression distortions because other types of audio signals are wrongly suppressed.

Description

An audio signal processing method and related electronic device
This application claims the priority of the Chinese patent application with application number 202110815051X, entitled "An audio signal processing method and related electronic device", filed with the China Patent Office on July 19, 2021, the entire contents of which are incorporated in this application by reference.
Technical Field
The present application relates to the field of audio signal processing, and in particular to an audio signal processing method and related electronic device.
Background
When a certain type of signal is played through the built-in speaker of a small mobile electronic device (such as a mobile phone or tablet computer), that is, a signal whose frequency-point energy in the frequency domain is concentrated in a narrow bandwidth (for example, piano-like music) and whose frequency-domain energy distribution lasts a long time, a sizzling noise can subjectively be heard. This is mainly because the excessively concentrated, long-duration narrow-band energy causes the speaker to produce nonlinear distortion during electro-acoustic conversion. The sound source corresponding to this type of audio signal is called the first type of sound source.
To reduce the generation of this kind of noise, traditional processing of the first type of sound source causes the problem of falsely suppressing other types of sound sources, such as human voices, so that the subjective loudness of such sources fluctuates during transitions with the first type of sound source. Therefore, reducing the noise of the first type of sound source while avoiding false suppression of other sound sources is a problem of concern to technicians.
Summary of the Invention
The embodiment of the present application provides an audio signal processing method, which solves the problem that, in the process of suppressing audio signals prone to generating noise, other types of audio signals are wrongly suppressed, causing suppression distortion of the audio signals.
In a first aspect, an embodiment of the present application provides a method for processing an audio signal, including: acquiring an audio signal; when the tonality value of the audio signal is greater than or equal to a first threshold and the sound source type of the audio signal is the first type of sound source, processing the audio signal using a first type of suppression strategy; otherwise, processing the audio signal using a second type of suppression strategy.
In the above embodiment, based on the sound source type and the tonality value of the audio signal, it is judged whether the signal is one prone to generating noise (the first type of sound source), and different suppression strategies are adopted depending on whether the sound source type is the first type. While maintaining the maximum playback loudness of the original signal, the speaker input signal is changed to reduce playback noise and reduce suppression distortion of different audio signals.
With reference to the first aspect, in a possible implementation manner, the first type of suppression strategy is to suppress a single peak or multiple peaks of the audio signal in the frequency domain.
With reference to the first aspect, in a possible implementation manner, the second type of suppression strategy is to suppress a single peak or multiple peaks in the audio signal, or not to suppress the audio signal.
With reference to the first aspect, in a possible implementation manner, after acquiring the audio signal, the method includes: performing tonality calculation on the audio signal to obtain the tonality value of the audio signal. In this way, it is beneficial for the electronic device to adopt different suppression strategies for the audio signal based on its tonality value and sound source type.
With reference to the first aspect, in a possible implementation manner, performing tonality calculation on the audio signal to obtain the tonality value of the audio signal includes: calculating the flatness of the audio signal according to formula (1):

    Flatness = (∏_{n=0}^{N-1} x(n))^{1/N} / ((1/N) · ∑_{n=0}^{N-1} x(n))    (1)

where N is the length of the time-frequency transformation of the audio signal, x(n) is the energy value of the nth frequency point of the audio signal in the frequency domain, and Flatness is the flatness of the audio signal; calculating the first parameter of the audio signal according to the formula SFMdB = 10 · log10(Flatness), where SFMdB is the first parameter; and calculating the tonality value of the audio signal according to formula (3):

    α = min(SFMdB / SFMdBMax, 1)    (3)

where α is the tonality value of the audio signal and SFMdBMax is the maximum value of the first parameter. In this way, it is beneficial for the electronic device to adopt different suppression strategies for the audio signal based on its tonality value and sound source type.
With reference to the first aspect, in a possible implementation manner, before the audio signal is processed using the first type of suppression strategy, the method further includes: performing peak detection on the audio signal, where the peak detection is used to obtain peak information of the audio signal in the frequency domain. In this way, the electronic device can obtain the peak of the audio signal, calculate a difference gain according to the peak information, and suppress the audio signal according to the difference gain.
With reference to the first aspect, in a possible implementation manner, processing the audio signal using the first type of suppression strategy specifically includes: calculating the difference between the peak of the audio signal and a second threshold, where the peak includes at least the maximum peak of the audio signal in the frequency domain; calculating a difference gain of the peak based on the difference; and suppressing the peak according to the formula W' = W * f, where f is the difference gain, W is the peak before suppression, and W' is the peak after suppression. In this way, when the sound source type of the audio signal is the first type of sound source and the tonality value of the audio signal is greater than or equal to the first threshold, peak suppression is performed on the audio signal to change the speaker input signal and reduce playback noise.
In a second aspect, an embodiment of the present application provides an electronic device, including: one or more processors and a memory. The memory is coupled to the one or more processors and is used to store computer program code, which includes computer instructions. The one or more processors invoke the computer instructions to cause the electronic device to perform: acquiring an audio signal; when the tonality value of the audio signal is greater than or equal to a first threshold and the audio source type of the audio signal is the first type of audio source, processing the audio signal using the first type of suppression strategy; otherwise, processing the audio signal using the second type of suppression strategy.
With reference to the second aspect, in a possible implementation, the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: performing a tonality calculation on the audio signal to obtain the tonality value of the audio signal.
With reference to the second aspect, in a possible implementation, the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: calculating the flatness of the audio signal according to the formula

Flatness = (Π_{n=1..N} x(n))^(1/N) / ((1/N) · Σ_{n=1..N} x(n))

where N is the length of the time-frequency transform of the audio signal, x(n) is the energy value of the audio signal at frequency bin n in the frequency domain, and Flatness is the flatness of the audio signal; calculating the first parameter of the audio signal according to the formula SFMdB = 10·log10(Flatness), where SFMdB is the first parameter; and calculating the tonality value of the audio signal according to the formula

α = min(SFMdB / SFMdBMax, 1)

where α is the tonality value of the audio signal and SFMdBMax is the maximum value of the first parameter.
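The flatness-to-tonality computation above can be sketched as follows. The value SFMdBMax = -60 dB used below follows the common psychoacoustic-model convention and is an assumption here, not a value fixed by the text.

```python
import numpy as np

def tonality(x, sfm_db_max=-60.0):
    """Tonality value alpha of one frame of frequency-bin energies x(n).

    sfm_db_max -- assumed maximum of the first parameter SFMdB; the -60 dB
                  default is a common convention, not fixed by the text.
    """
    x = np.asarray(x, dtype=float)
    # Flatness: geometric mean of the bin energies over their arithmetic mean.
    geometric_mean = np.exp(np.mean(np.log(np.maximum(x, 1e-12))))
    arithmetic_mean = np.mean(x)
    flatness = geometric_mean / arithmetic_mean
    # First parameter: SFMdB = 10 * log10(Flatness); <= 0 since Flatness <= 1.
    sfm_db = 10.0 * np.log10(flatness)
    # Tonality value: alpha = min(SFMdB / SFMdBMax, 1).
    return min(sfm_db / sfm_db_max, 1.0)
```

A noise-like frame (flat spectrum) gives Flatness close to 1, SFMdB close to 0, and α close to 0; a tone-like frame (energy concentrated in a few bins) gives a strongly negative SFMdB and α close to 1, which is what the comparison against the first threshold relies on.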
With reference to the second aspect, in a possible implementation, the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: performing peak detection on the audio signal, where the peak detection is used to obtain peak information of the audio signal in the frequency domain.
With reference to the second aspect, in a possible implementation, the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: calculating the difference between the peak value of the audio signal and a second threshold, where the peak value includes at least the maximum peak value of the audio signal in the frequency domain; calculating a difference gain for the peak value based on the difference; and suppressing the peak value according to the formula W′ = W * f, where f is the difference gain, W is the peak value before suppression, and W′ is the peak value after suppression.
In a third aspect, an embodiment of the present application provides an electronic device, including: a touch screen, a camera, one or more processors, and one or more memories. The one or more processors are coupled to the touch screen, the camera, and the one or more memories. The one or more memories are used to store computer program code, which includes computer instructions. When the one or more processors execute the computer instructions, the electronic device is caused to perform the method described in the first aspect or any possible implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a chip system applied to an electronic device. The chip system includes one or more processors, and the processors are used to invoke computer instructions to cause the electronic device to perform the method described in the first aspect or any possible implementation of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product containing instructions. When the computer program product runs on an electronic device, the electronic device is caused to perform the method described in the first aspect or any possible implementation of the first aspect.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium including instructions. When the instructions run on an electronic device, the electronic device is caused to perform the method described in the first aspect or any possible implementation of the first aspect.
Description of Drawings
FIG. 1A to FIG. 1C are schematic diagrams of an application scenario provided by an embodiment of the present application;
FIG. 2 is a system architecture diagram of an electronic device processing an audio signal provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of the hardware structure of the electronic device 100 provided by an embodiment of the present application;
FIG. 4 is a block diagram of the software structure of the electronic device 100 provided by an embodiment of the present application;
FIG. 5A to FIG. 5D are diagrams of tonality value calculation results provided by an embodiment of the present application;
FIG. 6 is a flowchart of processing an audio signal provided by an embodiment of the present application;
FIG. 7A to FIG. 7C are diagrams of an audio application startup interface provided by an embodiment of the present application;
FIG. 8 is a waveform diagram of a frequency domain signal provided by an embodiment of the present application.
Detailed Description of Embodiments
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with that embodiment may be included in at least one embodiment of the present application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
The terms "first", "second", "third", and the like in the specification, claims, and accompanying drawings of the present application are used to distinguish different objects, not to describe a specific order. Furthermore, the terms "comprising" and "having", as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, product, or device that comprises a series of steps or units may optionally also include steps or units that are not listed, or other steps or units inherent to the process, method, product, or device.
The accompanying drawings show only the parts relevant to the present application rather than the entire content. Before discussing the exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes operations (or steps) as sequential processing, many of the operations may be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations may be rearranged. A process may be terminated when its operations are completed, but may also have additional steps not included in the drawing. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, and the like.
The terms "component", "module", "system", "unit", and the like used in this specification denote a computer-related entity: hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a unit may be, but is not limited to, a process running on a processor, a processor, an object, an executable file, a thread of execution, a program, and/or a program distributed between two or more computers. In addition, these units may execute from various computer-readable media having various data structures stored thereon. The units may communicate through local and/or remote processes, for example according to a signal having one or more data packets (e.g., data from a second unit interacting with another unit in a local system, in a distributed system, and/or across a network such as the Internet, which interacts with other systems by means of signals).
Below, an application scenario in which an electronic device processes an audio signal is introduced with reference to FIG. 1A to FIG. 1C.
In FIG. 1A, when the electronic device 100 detects an input operation (for example, a tap) on the music application icon 1011, it enters the main interface 102 of the music application shown in FIG. 1B. As shown in FIG. 1B, after the user searches for a singer or song name on the main interface 102, the electronic device 100 displays the music playing interface 103 shown in FIG. 1C, at which point the music application plays music. While the music application is playing music, the electronic device 100 processes the audio signal of the music in real time to ensure that the played music is free of noise, thereby providing the user with a good listening experience.
FIG. 1A to FIG. 1C above introduced an application scenario in which an electronic device processes an audio signal. Below, the system architecture with which the electronic device processes audio is introduced. Please refer to FIG. 2, which is a system architecture diagram of an electronic device processing an audio signal provided by an embodiment of the present application. As shown in FIG. 2, the system architecture includes an audio application, a mixing thread module, and an audio driver. Exemplarily, the audio application may be an application such as music player software or video software.
When an audio application plays audio through the speaker, the audio signal is processed in real time. First, the audio application sends the audio signal to the mixing thread module, and the mixing thread module detects whether the audio source type of the audio signal is the first type of audio source. If so, the audio signal is processed (for example, the energy of the audio signal is suppressed). Then, the mixing thread module sends the processed audio signal to the audio driver, the audio driver sends the processed audio signal to the speaker, and the speaker outputs the audio. In this way, the electronic device can process the audio signal in real time so that the audio played through the speaker is free of noise.
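The per-frame decision made in the mixing thread can be sketched as follows. Everything here is a placeholder sketch: the threshold value, the source-type label, and both strategy bodies are hypothetical stand-ins, since the embodiments define the actual strategies elsewhere.

```python
def first_type_suppression(frame):
    # Stand-in: attenuate the frame (the real first-type strategy performs
    # frequency-domain peak suppression, W' = W * f).
    return [0.5 * v for v in frame]

def second_type_suppression(frame):
    # Stand-in: pass the frame through (the real second-type strategy differs).
    return list(frame)

def process_frame(frame, alpha, source_type,
                  first_threshold=0.7, first_type_label="FIRST_TYPE"):
    """Dispatch one audio frame to a suppression strategy.

    alpha is the tonality value in [0, 1]; first_threshold and
    first_type_label are hypothetical placeholder values.
    """
    if alpha >= first_threshold and source_type == first_type_label:
        return first_type_suppression(frame)
    return second_type_suppression(frame)
```

Only frames that are both tonal enough and from the first type of audio source take the first branch; all other frames fall through to the second strategy, matching the otherwise-clause of the first aspect.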
The structure of the electronic device 100 is introduced below. Please refer to FIG. 3, which is a schematic diagram of the hardware structure of the electronic device 100 provided by an embodiment of the present application.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in this embodiment of the present invention does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown in the figure, combine certain components, split certain components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. The different processing units may be independent devices or may be integrated in one or more processors.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single communication frequency band or multiple communication frequency bands. Different antennas may also be multiplexed to improve antenna utilization. For example, the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, an antenna may be used in combination with a tuning switch.
The mobile communication module 150 may provide wireless communication solutions applied to the electronic device 100, including 2G/3G/4G/5G. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 may receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 may also amplify signals modulated by the modem processor and convert them into electromagnetic waves for radiation through the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be arranged in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 and at least some modules of the processor 110 may be arranged in the same device.
The wireless communication module 160 may provide wireless communication solutions applied to the electronic device 100, including wireless local area networks (WLAN) (such as a Wi-Fi network), Bluetooth (BT), BLE broadcasting, global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be sent from the processor 110, perform frequency modulation and amplification on it, and convert it into electromagnetic waves for radiation through the antenna 2.
The electronic device 100 implements the display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing and is connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, quantum dot light-emitting diodes (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
The electronic device 100 may implement the shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when taking a photo, the shutter is opened, light is transmitted through the lens to the photosensitive element of the camera, the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, converting it into an image visible to the naked eye. The ISP may also perform algorithm optimization on the noise, brightness, and skin tone of the image, and may also optimize parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP may be arranged in the camera 193.
The digital signal processor is used to process digital signals; in addition to digital image signals, it may also process other digital signals. For example, when the electronic device 100 performs frequency-bin selection, the digital signal processor is used to perform a Fourier transform on the frequency-bin energy, and so on.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example the transfer mode between neurons in the human brain, it processes input information quickly and can also learn continuously by itself. Applications such as intelligent cognition of the electronic device 100, for example image recognition, face recognition, speech recognition, and text understanding, can be implemented through the NPU.
The electronic device 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone jack 170D, the application processor, and the like.
The audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be arranged in the processor 110, or some functional modules of the audio module 170 may be arranged in the processor 110.
The speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals. The electronic device 100 can play music or take a hands-free call through the speaker 170A.
The receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 100 answers a call or a voice message, the voice can be heard by placing the receiver 170B close to the ear.
The microphone 170C, also called a "mic", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In still other embodiments, the electronic device 100 may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording, and the like.
The pressure sensor 180A is used to sense pressure signals and can convert the pressure signals into electrical signals. In some embodiments, the pressure sensor 180A may be arranged on the display screen 194.
The air pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device 100 calculates the altitude from the air pressure value measured by the air pressure sensor 180C to assist in positioning and navigation.
The magnetic sensor 180D includes a Hall sensor. The electronic device 100 may use the magnetic sensor 180D to detect the opening and closing of a flip cover.
The acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally along three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of the electronic device and is applied to landscape/portrait switching, pedometers, and other applications.
The fingerprint sensor 180H is used to collect fingerprints. The electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, access to application locks, fingerprint photography, fingerprint-based call answering, and the like.
The touch sensor 180K is also called a "touch panel". The touch sensor 180K may be arranged on the display screen 194; the touch sensor 180K and the display screen 194 form a touch screen, also called a "touchscreen". The touch sensor 180K is used to detect a touch operation on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of touch event. Visual output related to the touch operation can be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be arranged on the surface of the electronic device 100 at a position different from that of the display screen 194.
The bone conduction sensor 180M can acquire vibration signals. In some embodiments, the bone conduction sensor 180M can acquire the vibration signal of the vibrating bone mass of the human vocal part.
The software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. This embodiment of the present invention takes the Android system with a layered architecture as an example to exemplarily describe the software structure of the electronic device 100. FIG. 4 is a block diagram of the software structure of the electronic device 100 provided by an embodiment of the present application. The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.
The application layer may include a series of application packages. As shown in FIG. 4, the application packages may include applications such as Camera, Gallery, Calendar, Phone, Maps, Navigation, WLAN, Bluetooth, Music, Video, Messages, and audio applications.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. The application framework layer includes some predefined functions. As shown in FIG. 4, the application framework layer may include a window manager, content providers, a view system, a phone manager, a resource manager, a notification manager, a mixing thread module (Mixer Thread module), and the like.
The mixing thread module is used to receive the audio signal sent by the audio application and process the audio signal.
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。A window manager is used to manage window programs. The window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, capture the screen, etc.
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。Content providers are used to store and retrieve data and make it accessible to applications. Said data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebook, etc.
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。The view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. The view system can be used to build applications. A display interface can consist of one or more views. For example, a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
电话管理器用于提供电子设备100的通信功能。例如通话状态的管理(包括接通,挂断等)。The phone manager is used to provide communication functions of the electronic device 100 . For example, the management of call status (including connected, hung up, etc.).
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,电子设备振动,指示灯闪烁等。The notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and can automatically disappear after a short stay without user interaction. For example, the notification manager is used to notify the download completion, message reminder, etc. The notification manager can also be a notification that appears on the top status bar of the system in the form of a chart or scroll bar text, such as a notification of an application running in the background, or a notification that appears on the screen in the form of a dialog window. For example, prompting text information in the status bar, issuing a prompt sound, vibrating the electronic device, and flashing the indicator light, etc.
Android Runtime包括核心库和虚拟机。Android runtime负责安卓系统的调度和管理。Android Runtime includes core library and virtual machine. The Android runtime is responsible for the scheduling and management of the Android system.
核心库包含两部分：一部分是java语言需要调用的功能函数，另一部分是安卓的核心库。The core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。The application layer and the application framework layer run in virtual machines. The virtual machine executes the java files of the application program layer and the application program framework layer as binary files. The virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(Media Libraries),三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)等。A system library can include multiple function modules. For example: surface manager (surface manager), media library (Media Libraries), 3D graphics processing library (eg: OpenGL ES), 2D graphics engine (eg: SGL), etc.
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。The surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。The media library supports playback and recording of various commonly used audio and video formats, as well as still image files, etc. The media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
三维图形处理库用于实现三维图形绘图,图像渲染,合成,和图层处理等。The 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing, etc.
2D图形引擎是2D绘图的绘图引擎。2D graphics engine is a drawing engine for 2D drawing.
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。The kernel layer is the layer between hardware and software. The kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.
当电子设备通过内置扬声器外放音频时，由于设备的尺寸限制，扬声器的尺寸比较小，其允许的膜振幅度较小。当外放音频的响度过大时，会导致扬声器的膜振幅度超出最大值，从而使得在大音量下播放声音容易出现破音，产生类似“呲呲”的声音。为了解决上述问题，通常对音源进行处理，从而使得扬声器在外放声音时能够减小杂音。When an electronic device plays audio through a built-in speaker, the speaker is relatively small due to the size limitation of the device, so its allowable diaphragm vibration amplitude is small. When the loudness of the played audio is too high, the diaphragm vibration amplitude of the speaker exceeds its maximum, so sound played at high volume is prone to breaking, producing a crackling noise similar to "呲呲". To solve this problem, the sound source is usually processed so that the speaker produces less noise when playing sound.
一般，将音源分为四类，分别为：第一类音源、第二类音源、第三类音源以及第四类音源。第一类音源的特点是：该音源的音频信号在频谱上的分布不均匀，能量集中在中低频，能量较强，且持续时间较长，例如钢琴声。这类音源的音频通过扬声器进行回放时，容易产生杂音。第二类音源的特点是：该音源的音频信号在频谱上的分布不均匀，且能量主要集中在中低频，但能量较弱，例如，人声。第三类音源的特点是，该音源的音频信号在频谱上分布不均匀，且能量较强，但是，能量集中具备瞬态性，即能量持续的时间较短，例如，鼓声。第四类音源的特点是：该音源的音频信号在频谱上的分布均匀。对于上述四类音源，第一类音源由于能量分布不均、能量大且持续时间较长，相较于其它三类音源，电子设备使用扬声器回放第一类音源的音频，更加容易发出杂音。因此，为了解决上述问题，若音频信号中包括第一类音源的音频信号，电子设备会在扬声器回放音频之前，压制第一类音源的音频信号，再将压制后的音频信号发送给扬声器，再由扬声器输出音频，从而抑制回放音频时产生的杂音。Generally, sound sources are divided into four categories: the first type, the second type, the third type, and the fourth type. The first type of sound source is characterized in that its audio signal is unevenly distributed over the frequency spectrum, its energy is concentrated in the middle and low frequencies, the energy is strong, and it lasts a long time, for example, the sound of a piano. When the audio of this type of source is played back through the speaker, noise is easily generated. The second type of sound source is characterized in that its audio signal is unevenly distributed over the frequency spectrum and its energy is mainly concentrated in the middle and low frequencies, but the energy is weak, for example, a human voice. The third type of sound source is characterized in that its audio signal is unevenly distributed over the spectrum and its energy is strong, but the energy concentration is transient, that is, the energy lasts only a short time, for example, the sound of drums. The fourth type of sound source is characterized in that its audio signal is evenly distributed over the frequency spectrum. Among the above four types, the first type has uneven energy distribution, large energy, and long duration; compared with the other three types, an electronic device playing back audio of the first type through a speaker is more likely to produce noise. Therefore, to solve this problem, if the audio signal includes an audio signal of the first type of sound source, the electronic device suppresses that audio signal before the speaker plays back the audio, then sends the suppressed audio signal to the speaker, and the speaker outputs the audio, thereby suppressing the noise generated during playback.
电子设备处理音频信号的方法为：对输入的音频信号进行分帧，再进行时频变换，得到频域信号，对每帧频域信号进行调性计算，得到每帧频域信号的调性值，将调性值与设定的第一阈值进行比较，进而判断该音频信号在频域内的分布是否均匀。若频域信号的调性值大于或等于第一阈值，则说明该帧频域信号在频谱内的分布不均匀，表明该帧频域信号需要进行压制，并根据相关策略压制该帧频域信号的能量，即进行峰值压制。若频域信号的调性值小于第一阈值，则判断该帧频域信号在频谱内的分布均匀，该帧频域信号不需要压制。其中，第一阈值可以基于历史数据得到，也可以基于经验值得到，还可以基于实验数据测试得到，本申请实施例对此不做限制。The method for the electronic device to process the audio signal is: divide the input audio signal into frames, then perform a time-frequency transform to obtain frequency domain signals, perform a tonality calculation on each frame of the frequency domain signal to obtain its tonality value, and compare the tonality value with a set first threshold to judge whether the distribution of the audio signal in the frequency domain is uniform. If the tonality value of the frequency domain signal is greater than or equal to the first threshold, the frame's frequency domain signal is unevenly distributed over the spectrum, indicating that the frame needs to be suppressed, and the energy of the frame's frequency domain signal is suppressed according to a relevant strategy, that is, peak suppression is performed. If the tonality value is smaller than the first threshold, the frame's frequency domain signal is judged to be evenly distributed over the spectrum, and the frame does not need to be suppressed. The first threshold may be obtained based on historical data, empirical values, or experimental data tests, which is not limited in this embodiment of the present application.
在上述处理音频信号的方法中，电子设备仅是对音频信号通过调性判断的结果决定音频信号是否需要进行压制。但是，仅通过调性判断音频信号是否应该进行压制是不准确和不全面的，因为，音频外放是否会产生杂音不仅与音频信号的调性相关，还与音频信号频域能量的强弱以及音频信号在频域能量持续的时间长短有关。例如，图5A是扬琴音频的调性计算结果图，在图5A中，若设定第一阈值为0.7，扬琴音频的调性值总体上是超过第一阈值的，根据上述处理音频信号的方法，电子设备会将扬琴音频判断为需要压制的音频信号。但是，实际上，扬琴音频信号虽然在频域上分布不均，但是，由于其能量集中的瞬态性较强，电子设备通过扬声器回放扬琴音频时，不易产生杂音，若将扬琴音频信号进行压制，可能会造成回放的扬琴音频的音色失真。另外，对于同一类的音源，其音频信号的调性计算结果可能有巨大的差异，例如，图5B为鼓声的调性计算结果图，图5C是鼓诗音频的调性计算结果图，鼓诗和鼓声都属于同类音源（都属于第三类音源），鼓声和鼓诗的调性计算结果差异大，很难选取合适的第一阈值来同时表征鼓声音频和鼓诗音频在频域内的分布情况。另外，对音频信号的调性判断可能发生漏检的情况，例如，第一阈值为0.7，在外放音频中存在一段钢琴（第一类音源）音频，其调性计算结果如图5D所示，在图5D中，在496帧与562帧期间，钢琴音频信号的调性值小于0.7，那么在这段时域范围内，电子设备不会对该钢琴音频信号进行压制，但在其它帧，电子设备会对该钢琴音频信号进行压制，会导致回放的钢琴音频音量在496帧到562帧这段时间内，钢琴声的音量会发生突变，给用户极差的听觉体验。因此，若仅以调性结果作为是否对音频信号进行压制的唯一判断因素，存在选取第一阈值难、容易发生漏检或误检的问题，进而压制没有产生杂音的音源或者不压制产生杂音的音源。In the above method for processing an audio signal, the electronic device decides whether the audio signal needs to be suppressed based only on the tonality judgment. However, judging whether an audio signal should be suppressed by tonality alone is inaccurate and incomplete, because whether the played-back audio produces noise is related not only to the tonality of the audio signal but also to the strength of the audio signal's frequency-domain energy and to how long that energy lasts. For example, FIG. 5A shows the tonality calculation result of yangqin (dulcimer) audio. In FIG. 5A, if the first threshold is set to 0.7, the tonality value of the yangqin audio generally exceeds the first threshold, so according to the above method, the electronic device would judge the yangqin audio as an audio signal that needs to be suppressed. In fact, although the yangqin audio signal is unevenly distributed in the frequency domain, its energy concentration is strongly transient, so when the electronic device plays back yangqin audio through the speaker, noise is not easily generated; suppressing the yangqin audio signal may distort the timbre of the played-back yangqin audio. In addition, for the same type of sound source, the tonality calculation results of the audio signals may differ greatly. For example, FIG. 5B shows the tonality calculation result of drum sound, and FIG. 5C shows the tonality calculation result of drum-poem audio. Drum poem and drum sound belong to the same type of sound source (both are the third type), yet their tonality results differ greatly, making it difficult to select a suitable first threshold that simultaneously characterizes the frequency-domain distributions of both the drum audio and the drum-poem audio. Furthermore, the tonality judgment of the audio signal may miss detections. For example, with the first threshold at 0.7, suppose the played-back audio contains a piece of piano (first type of sound source) audio whose tonality calculation result is shown in FIG. 5D. In FIG. 5D, between frames 496 and 562, the tonality value of the piano audio signal is less than 0.7, so within this time range the electronic device would not suppress the piano audio signal, while in other frames it would; as a result, the volume of the played-back piano audio changes abruptly during frames 496 to 562, giving the user an extremely poor listening experience. Therefore, if the tonality result is the only factor used to decide whether to suppress the audio signal, selecting the first threshold is difficult and missed or false detections are likely, leading to suppressing sources that do not produce noise or failing to suppress sources that do.
为了解决上述问题,本申请实施例提供了一种音频信号的处理方法。通过识别音频的音源类型,判断该音源是否为第一类音源,若为第一类音源,则对第一类音源的音频信号进行峰值压制,并将压制后的音频信号发送给扬声器输出。In order to solve the above problem, an embodiment of the present application provides a method for processing an audio signal. By identifying the sound source type of the audio, it is judged whether the sound source is the first type of sound source, and if it is the first type of sound source, the peak value of the audio signal of the first type of sound source is suppressed, and the suppressed audio signal is sent to the speaker for output.
下面,结合图6,对电子设备处理音频信号的具体流程进行说明。请参见图6,图6是本申请实施例提供的一种处理音频信号的流程图,处理音频信号的具体流程为:Next, with reference to FIG. 6 , a specific process of processing an audio signal by an electronic device will be described. Please refer to FIG. 6. FIG. 6 is a flow chart for processing audio signals provided by an embodiment of the present application. The specific process for processing audio signals is as follows:
步骤S601:音频应用启动。Step S601: start the audio application.
示例性的，如图7A所示，当电子设备100检测到针对音频应用图标7011的输入操作（例如，单击）后，电子设备100显示如图7B所示的启动界面702，在显示启动界面702的过程中，音频应用开始启动。当电子设备显示如图7C所示的音频应用的主界面703时，音频应用启动完成。其中，图7A-图7C所示的音频应用为音乐应用，音频应用也可以为视频应用，还可以为其它能够播放音频的应用，本申请实施例仅作举例说明，不做限制。Exemplarily, as shown in FIG. 7A, when the electronic device 100 detects an input operation (for example, a tap) on the audio application icon 7011, the electronic device 100 displays the startup interface 702 shown in FIG. 7B; while the startup interface 702 is displayed, the audio application is starting. When the electronic device displays the main interface 703 of the audio application as shown in FIG. 7C, the audio application has finished starting. The audio application shown in FIG. 7A to FIG. 7C is a music application; the audio application may also be a video application or another application capable of playing audio, and this embodiment of the present application is merely illustrative and not limiting.
步骤S602:音频应用向混音线程模块发送音频信号。Step S602: the audio application sends an audio signal to the mixing thread module.
步骤S603:混音线程模块将所述音频信号进行分帧处理,得到M帧音频信号。Step S603: The audio mixing thread module divides the audio signal into frames to obtain M frames of audio signals.
具体地，电子设备是实时处理音频信号，扬声器再将处理好的音频信号以音频的形式输出。考虑信号的短时平稳性以及回放的实时性，即回放时候不希望引入太大延时，所以对信号进行分帧处理，例如，10ms为一帧。Specifically, the electronic device processes the audio signal in real time, and the speaker then outputs the processed audio signal in the form of audio. Considering the short-time stationarity of the signal and the real-time requirement of playback, that is, no excessive delay should be introduced during playback, the signal is divided into frames, for example, 10 ms per frame.
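The framing step above can be sketched as follows (a minimal Python illustration; the 48 kHz sample rate is an assumption for demonstration and is not specified in this application):

```python
import numpy as np

def split_into_frames(signal, sample_rate=48000, frame_ms=10):
    """Split a 1-D audio signal into consecutive frames of frame_ms milliseconds.
    Trailing samples that do not fill a whole frame are dropped."""
    frame_len = sample_rate * frame_ms // 1000           # e.g. 480 samples at 48 kHz
    n_frames = len(signal) // frame_len                  # M frames
    return signal[:n_frames * frame_len].reshape(n_frames, frame_len)

# one second of audio at 48 kHz -> 100 frames of 10 ms each
audio = np.zeros(48000)
frames = split_into_frames(audio)
print(frames.shape)  # (100, 480)
```

Each row of the returned array is then processed independently as one frame, which keeps the per-frame latency at the frame duration.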
步骤S604:混音线程模块将第n帧音频信号进行时频变换,得到该帧音频信号的频域信号。Step S604: the audio mixing thread module performs time-frequency conversion on the audio signal of the nth frame to obtain the frequency domain signal of the audio signal of the frame.
具体地，混音线程模块可以通过对音频信号进行傅里叶变换（Fourier Transform，FT）或快速傅里叶变换（Fast Fourier Transform，FFT），得到音频信号的频域信号，混音线程模块也可以通过对音频信号进行梅尔谱变换，得到频域信号，混音线程模块还可以通过对音频信号进行改进离散余弦变换（Modified Discrete Cosine Transform，MDCT），得到频域信号，本申请实施例以通过FFT对音频信号进行时频变换为例，进行说明。在进行FFT之前，对于每帧信号可以进行交叠、加窗，目的是为了减少频域变换时的频谱泄露，减少频域处理失真。混音线程模块在将音频信号进行时频变换后，就可以得到该音频信号的频域信号的所有组成的频率成分，便于对信号的不同频率进行分析计算。Specifically, the audio mixing thread module may obtain the frequency domain signal of the audio signal by performing a Fourier transform (FT) or a fast Fourier transform (FFT) on the audio signal; it may also obtain the frequency domain signal by performing a Mel-spectrum transform, or a modified discrete cosine transform (MDCT), on the audio signal. The embodiment of the present application takes the time-frequency transform of the audio signal by FFT as an example for description. Before performing the FFT, each frame of the signal may be overlapped and windowed, in order to reduce spectral leakage during the frequency domain transform and reduce frequency domain processing distortion. After the audio mixing thread module performs the time-frequency transform on the audio signal, it obtains all constituent frequency components of the frequency domain signal, which facilitates analyzing and calculating the different frequencies of the signal.
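The windowing-plus-FFT step can be sketched as below (a non-authoritative Python illustration; the Hann window and the 480-sample frame at an assumed 48 kHz rate are choices made for the example, not requirements of this application):

```python
import numpy as np

def frame_to_spectrum(frame):
    """Apply a Hann window to one audio frame and return its magnitude spectrum.
    Windowing reduces spectral leakage in the time-frequency transform."""
    windowed = frame * np.hanning(len(frame))
    return np.abs(np.fft.rfft(windowed))

# a 1 kHz tone, one 10 ms frame at 48 kHz: exactly 10 cycles per frame
frame = np.sin(2 * np.pi * 1000 * np.arange(480) / 48000)
spectrum = frame_to_spectrum(frame)
peak_bin = int(np.argmax(spectrum))
print(peak_bin * 48000 / 480)  # bin index converted to Hz: 1000.0
```

With a 480-point FFT the bin spacing is 100 Hz, so the tone lands exactly on bin 10, which is why the peak bin maps back to 1000 Hz.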
步骤S605:混音线程模块对所述频域信号进行调性计算,得到所述频域信号的调性值。Step S605: The sound mixing thread module performs tonality calculation on the frequency domain signal to obtain the tonality value of the frequency domain signal.
具体地，混音线程模块依次对第n帧频域信号进行调性计算，并得到相应的调性值。混音线程模块对该频域信号进行调性计算的目的是为了判断该帧音频信号在频域内的能量分布是否均匀。若该频域信号的调性值大于或等于第一阈值，则判断该帧音频信号在频域内的能量分布不均匀，若该频域信号的调性值小于第一阈值，则判断该帧音频信号在频域内的能量分布均匀。其中，第一阈值可以是基于经验值得到，也可以是基于历史数据得到，还可以是基于实验数据得到，本申请实施例对此不做限制。混音线程模块计算调性值的方法为：Specifically, the sound mixing thread module performs the tonality calculation on the frequency domain signal of the nth frame and obtains the corresponding tonality value. The purpose of this calculation is to judge whether the energy distribution of the frame's audio signal in the frequency domain is uniform. If the tonality value of the frequency domain signal is greater than or equal to the first threshold, the energy distribution of the frame's audio signal in the frequency domain is judged to be uneven; if the tonality value is smaller than the first threshold, the energy distribution is judged to be uniform. The first threshold may be obtained based on empirical values, historical data, or experimental data, which is not limited in this embodiment of the present application. The sound mixing thread module calculates the tonality value as follows:
混音线程模块根据公式(1)计算频域信号的平坦度Flatness,公式(1)如下所示:The mixing thread module calculates the flatness Flatness of the frequency domain signal according to the formula (1), and the formula (1) is as follows:
Flatness = [∏(n=0…N-1) x(n)]^(1/N) / [(1/N)·∑(n=0…N-1) x(n)]   (1)
其中，N为将音频信号进行FFT变换的长度，x(n)为该帧频域信号第n个频点的能量值，Flatness用于表示频域信号在频域内的能量分布情况，Flatness越大，分布越均匀，Flatness越小，分布越不均匀。然后，混音线程模块根据公式(2)，计算第一参数SFMdB，公式(2)如下所示：Here, N is the FFT length of the audio signal, x(n) is the energy value of the nth frequency bin of the frame's frequency domain signal, and Flatness represents the energy distribution of the frequency domain signal in the frequency domain: the larger the Flatness, the more uniform the distribution; the smaller the Flatness, the more uneven the distribution. Then, the mixing thread module calculates the first parameter SFMdB according to formula (2), which is as follows:
SFMdB = 10·log10(Flatness)   (2)
然后,混音线程模块根据公式(3)计算该帧频域信号的调性值α,公式(3)如下所示:Then, the sound mixing thread module calculates the tonal value α of the frequency domain signal of the frame according to the formula (3), and the formula (3) is as follows:
α = min(SFMdB / SFMdBMax, 1)   (3)
其中,SFMdBMax的取值可以由历史值得到,也可以由经验值得到,还可以由实验数据得到,本申请实施例对此不做限制。优选地,SFMdBMax可以设置为-60dB。Wherein, the value of SFMdBMax may be obtained from historical values, empirical values, or experimental data, which is not limited in this embodiment of the present application. Preferably, SFMdBMax can be set to -60dB.
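Assuming equation (3) takes the usual spectral-flatness form α = min(SFMdB/SFMdBMax, 1), the whole tonality computation of equations (1)-(3) can be sketched in Python as follows (the small epsilon guarding the logarithms and the 256-bin test spectra are assumptions made for the example; SFMdBMax uses the -60 dB value suggested above):

```python
import numpy as np

SFM_DB_MAX = -60.0  # preferred value suggested in the text

def tonality(bin_energy):
    """Tonality value per equations (1)-(3): spectral flatness (geometric mean
    over arithmetic mean of the bin energies), converted to dB, divided by
    SFMdBMax, and clipped to at most 1."""
    x = np.asarray(bin_energy, dtype=float) + 1e-12   # epsilon guards log of zero
    flatness = np.exp(np.mean(np.log(x))) / np.mean(x)   # equation (1)
    sfm_db = 10.0 * np.log10(flatness)                   # equation (2)
    return min(sfm_db / SFM_DB_MAX, 1.0)                 # equation (3)

flat = np.ones(256)          # uniform spectrum: noise-like
peaky = np.full(256, 1e-12)
peaky[10] = 1.0              # energy concentrated in one bin: tonal
print(tonality(flat))        # close to 0
print(tonality(peaky))       # 1.0
```

A uniform spectrum gives Flatness near 1 and hence SFMdB near 0, so α is near 0; a highly concentrated spectrum drives SFMdB well below SFMdBMax and the min-clip in equation (3) pins α at 1.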
步骤S606:混音线程模块基于神经网络获取所述频域信号的标签。Step S606: The sound mixing thread module obtains the label of the frequency domain signal based on the neural network.
具体地，混音线程模块将所述该帧频域信号作为神经网络的输入，神经网络输出该帧频域信号的标签，该标签用于指示该帧频域信号的音源类型。所述标签包括第一标签、第二标签、第三标签和第四标签，第一标签用于指示频域信号的音源类型为第一类音源，第二标签用于指示频域信号的音源类型为第二类音源，第三标签用于指示频域信号的音源类型为第三类音源，第四标签用于指示频域信号的音源类型为第四类音源。本申请实施例以第一标签为0，第二标签为1，第三标签为2，第四标签为3为例，进行说明。其中，该神经网络是已训练好的神经网络。Specifically, the sound mixing thread module takes the frame's frequency domain signal as the input of the neural network, and the neural network outputs a label for the frame's frequency domain signal; the label indicates the sound source type of the frame's frequency domain signal. The labels include a first label, a second label, a third label, and a fourth label: the first label indicates that the sound source type of the frequency domain signal is the first type of sound source, the second label indicates the second type, the third label indicates the third type, and the fourth label indicates the fourth type. The embodiment of the present application is described taking the first label as 0, the second label as 1, the third label as 2, and the fourth label as 3 as an example. The neural network here is an already trained neural network.
示例性的，神经网络可以进行离线训练，神经网络的训练过程为：选取大量帧长为10ms（也可以选取其它帧长的频域信号，本申请实施例不做限制）的第一类音源的频域信号（例如，钢琴声）、第二类音源的频域信号（例如，人声）、第三类音源的频域信号（例如，鼓声）以及第四类音源的频域信号作为训练样本。当将第一类音源的频域信号作为神经网络的输入后，神经网络会输出该频域信号的标签，将神经网络输出的标签与标签0进行对比，得到一个偏差值Fn1，Fn1用于表征神经网络输出的标签与标签0的差异程度。Exemplarily, the neural network may be trained offline. The training process is: select, as training samples, a large number of frequency domain signals with a frame length of 10 ms (frequency domain signals of other frame lengths may also be selected, which is not limited in this embodiment) of the first type of sound source (for example, piano sound), the second type of sound source (for example, human voice), the third type of sound source (for example, drum sound), and the fourth type of sound source. When a frequency domain signal of the first type of sound source is input to the neural network, the neural network outputs a label for that signal; comparing the output label with label 0 yields a deviation value Fn1, which represents the degree to which the label output by the neural network differs from label 0.
然后，基于所述Fn1调节神经网络内部的参数，从而使得神经网络输出的第一类音源的音频信号的标签为标签0。同理，通过其它训练样本（第二类音源的频域信号、第三类音源的频域信号以及第四类音源的频域信号）训练神经网络，使得神经网络接收输入的频域信号时，可以输出对应的标签。Then, the internal parameters of the neural network are adjusted based on Fn1, so that the label the neural network outputs for the audio signal of the first type of sound source is label 0. Similarly, the neural network is trained with the other training samples (the frequency domain signals of the second, third, and fourth types of sound source), so that when the neural network receives an input frequency domain signal, it can output the corresponding label.
需要说明的，在训练样本中，一帧频域信号可能存在多种类型的音源。例如，在一首音乐中，歌手在不停地唱歌，歌曲的伴奏为钢琴声，那么，在这首音乐中，存在钢琴（第一类音源）和人声（第二类音源）这两类音源。这时，如果在一帧频域样本信号中存在多类音源，可以根据音源的强度来确定该样本信号的标签。例如，在一帧频域样本信号中，若钢琴声明显大于人声，将该频域样本信号的音源确定为第一类音源，设置标签为0。It should be noted that, in the training samples, multiple types of sound sources may exist in one frame of the frequency domain signal. For example, in a piece of music a singer sings continuously while the accompaniment is a piano; in this piece of music there are therefore two types of sound source: the piano (the first type) and the human voice (the second type). In this case, if multiple types of sound sources exist in one frame of the frequency-domain sample signal, the label of the sample signal can be determined according to the intensity of the sound sources. For example, in one frame of the frequency-domain sample signal, if the piano sound is obviously louder than the human voice, the sound source of that frequency-domain sample signal is determined as the first type of sound source, and its label is set to 0.
步骤S607:混音线程模块基于所述频域信号的调性值和所述频域信号的标签判断所述频域信号是否为第一类音源。Step S607: The sound mixing thread module judges whether the frequency domain signal is the first type of sound source based on the tonality value of the frequency domain signal and the label of the frequency domain signal.
具体地,若判断为是,执行步骤S608,若判断为否,执行步骤S610。Specifically, if the judgment is yes, execute step S608; if the judgment is no, execute step S610.
由于神经网络的训练样本有限，且音源的种类很多，例如包括钢琴声、扬琴声、口琴声、琵琶声等，当输入神经网络未训练过音源的频域信号时，神经网络输出的标签的准确性不高。例如，当输入琵琶声时，神经网络可能判断其为第一类音源，输出标签0，实际上，琵琶声为第三类音源。为了解决上述问题，在神经网络判断该帧频域信号的音源类型为第一类音源（输出标签0）后，混音线程模块同时也会判断该帧频域信号在频域上的能量分布是否不均匀，只有在神经网络输出的标签为0，且混音线程模块判断该帧频域信号在频域上的能量分布不均匀的情况下，混音线程模块才会确定该帧频域信号的类型为第一类音源。因此，若该帧频域信号的调性值大于或等于第一阈值且神经网络输出标签为0时，混音线程模块判断该帧频域信号的音源为第一类音源，反之，不为第一类音源。Due to the limited training samples of the neural network and the many kinds of sound sources, for example, piano, yangqin, harmonica, and pipa sounds, when a frequency domain signal of a sound source the network has not been trained on is input, the accuracy of the label output by the neural network is not high. For example, when the sound of a pipa is input, the neural network may judge it to be the first type of sound source and output label 0, while in fact the pipa sound is the third type of sound source. To solve this problem, after the neural network judges that the sound source type of the frame's frequency domain signal is the first type (outputs label 0), the sound mixing thread module also judges whether the energy distribution of the frame's frequency domain signal in the frequency domain is uneven. Only when the label output by the neural network is 0 and the sound mixing thread module judges that the energy distribution of the frame's frequency domain signal is uneven does the module determine that the frame's frequency domain signal is of the first type of sound source. Therefore, if the tonality value of the frame's frequency domain signal is greater than or equal to the first threshold and the neural network outputs label 0, the sound mixing thread module judges the sound source of the frame's frequency domain signal to be the first type of sound source; otherwise, it is not the first type of sound source.
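The combined check of step S607 can be expressed as a short sketch (Python; the label values follow the 0/1/2/3 convention above, and the 0.7 threshold is the example value used in the figures, not a fixed requirement):

```python
FIRST_THRESHOLD = 0.7  # example first-threshold value from the text's figures

def is_first_type_source(label, tonality_value, threshold=FIRST_THRESHOLD):
    """The frame is treated as a first-type source only when the network
    labels it 0 AND the tonality confirms an uneven spectral distribution."""
    return label == 0 and tonality_value >= threshold

print(is_first_type_source(0, 0.85))  # both conditions met -> True
print(is_first_type_source(0, 0.55))  # network says piano, tonality disagrees -> False
print(is_first_type_source(2, 0.85))  # drum-like label -> False
```

Requiring both conditions is what guards against a misclassified pipa frame being suppressed: a label of 0 alone is not enough.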
步骤S608:混音线程模块对所述频域信号进行峰值检测。Step S608: The sound mixing thread module performs peak detection on the frequency domain signal.
具体地,混音线程模块对该帧频域信号峰值检测,即获取该帧频域信号在频域内的波峰和波谷的幅值。例如,图8是该帧频域信号的波形图,在该波形图中,包括X个波峰和Y个波谷,峰值检测的目的就是为了获取这X个波峰和Y个波谷的幅值,幅值从大到小依次称作最大峰值、次大峰值、第三峰值……。Specifically, the sound mixing thread module detects the peak value of the frequency domain signal of the frame, that is, obtains the amplitudes of the peak and valley of the frequency domain signal of the frame in the frequency domain. For example, Fig. 8 is a waveform diagram of the frequency domain signal of this frame. In the waveform diagram, there are X peaks and Y valleys. The purpose of peak detection is to obtain the amplitudes of these X peaks and Y valleys. From large to small, they are called the largest peak, the second largest peak, the third peak...
混音线程模块对频域信号进行峰值检测的一种方法为：根据信号的频域能量分布，通过对其求导数得到极值的方法。例如：假设时域第n帧信号为x(n)，FFT长度为N，其对应频域信号频点能量为X(k)，k=0,1,2…N-1。各频点累积能量为E[m] = ∑(k=0…m) X(k)，m=0,1,2…N-1，频点总能量为Y，那么在设定寻找峰值的频点范围m内，能量比值为R[m]=E[m]/Y，m=0,1,2…N-1，然后对能量比值进行求导得到R[m]*，寻找R[m]*中的最大值以及次大值即表示最大峰以及次大峰值所在频点位置。One method for the mixing thread module to detect peaks in the frequency domain signal is to obtain extreme values by taking the derivative of the signal's frequency-domain energy distribution. For example, assume the nth frame signal in the time domain is x(n), the FFT length is N, and the bin energies of the corresponding frequency domain signal are X(k), k=0,1,2…N-1. The cumulative energy up to each bin is E[m] = ∑(k=0…m) X(k), m=0,1,2…N-1, and the total energy over the bins is Y. Within the set bin range m for peak searching, the energy ratio is R[m]=E[m]/Y, m=0,1,2…N-1. The derivative of the energy ratio is then taken to obtain R[m]*, and the maximum and second maximum values in R[m]* indicate the bin positions of the largest and second-largest peaks.
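Under the stated definitions, this cumulative-energy peak search can be sketched as follows (a Python illustration; the toy bin energies are invented for demonstration):

```python
import numpy as np

def find_top_peaks(bin_energy, n_peaks=2):
    """Locate the largest and second-largest peaks via the cumulative-energy
    ratio R[m] = E[m]/Y; its discrete derivative recovers each bin's share of
    the total energy, and the biggest derivative values mark the peak bins."""
    X = np.asarray(bin_energy, dtype=float)
    E = np.cumsum(X)               # E[m]: energy accumulated up to bin m
    Y = E[-1]                      # total energy over the search range
    R = E / Y                      # energy ratio R[m]
    dR = np.diff(R, prepend=0.0)   # derivative R[m]*, i.e. X(m)/Y per bin
    return np.argsort(dR)[::-1][:n_peaks]

X = np.array([0.1, 0.2, 5.0, 0.3, 2.5, 0.1])
print(find_top_peaks(X))  # bins 2 and 4: largest and second-largest peaks
```

Since the derivative of the cumulative ratio is just each bin's energy share, the search reduces to ranking bins by that share.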
步骤S609:混音线程模块对所述频域信号采用第一类压制策略进行处理,得到处理后的频域信号。Step S609: The sound mixing thread module processes the frequency domain signal using the first type of suppression strategy to obtain a processed frequency domain signal.
具体地,所述第一类压制策略为:混音线程模块对该帧频域信号的峰值进行单峰值压制或多峰值压制。若对该帧频域信号进行单峰值压制,则对该帧频域信号的最大峰进行压制。若对该帧频域信号进行多峰值压制,则至少对该帧频域信号的最大峰值和次大峰值进行压制。Specifically, the first type of suppression strategy is: the sound mixing thread module performs single peak suppression or multi-peak suppression on the peak value of the frequency domain signal of the frame. If the single peak suppression is performed on the frequency domain signal of the frame, the maximum peak of the frequency domain signal of the frame is suppressed. If multi-peak suppression is performed on the frequency domain signal of the frame, at least the maximum peak value and the second maximum peak value of the frequency domain signal of the frame frame are suppressed.
混音线程模块压制峰值的具体方法为：根据频域寻找到峰值，计算该峰值的能量与第二阈值的差值，基于所述差值计算差值增益。原始频点乘以差值增益从而减少对应频点能量，例如，检测当前最大峰值是-10dB，第二阈值设定为-15dB，那么最大峰值差值为-5dB，转换到线性值其差值增益约为0.562，那么原始频点乘以0.562达到减少频点能量的目的。需要说明的是，第二阈值是预设的最大峰值，可以基于经验值得到，也可以基于历史数据得到，还可以基于实验数据得到，本申请实施例不做任何限制。The specific method by which the sound mixing thread module suppresses a peak is: find the peak in the frequency domain, calculate the difference between the energy of the peak and the second threshold, and calculate a difference gain based on that difference. The original frequency bin is multiplied by the difference gain to reduce the energy of the corresponding bin. For example, if the detected current maximum peak is -10 dB and the second threshold is set to -15 dB, the maximum peak difference is -5 dB; converted to a linear value, the difference gain is about 0.562, so multiplying the original bin by 0.562 reduces the bin energy. It should be noted that the second threshold is a preset maximum peak value, which may be obtained based on empirical values, historical data, or experimental data, and is not limited in this embodiment of the present application.
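The worked example above (a -10 dB peak against a -15 dB second threshold giving a linear gain of about 0.562) corresponds to the standard dB-to-amplitude conversion, sketched here in Python (the no-suppression branch when the peak is already below the threshold is an assumption for completeness):

```python
def suppression_gain(peak_db, threshold_db):
    """Difference gain for peak suppression: the dB gap between the detected
    peak and the second threshold, converted to a linear amplitude gain."""
    diff_db = threshold_db - peak_db   # negative when the peak exceeds the threshold
    if diff_db >= 0:
        return 1.0                     # peak already below threshold: no suppression
    return 10.0 ** (diff_db / 20.0)    # dB difference -> linear amplitude gain

# the worked example from the text: -10 dB peak, -15 dB threshold -> -5 dB gap
gain = suppression_gain(-10.0, -15.0)
print(round(gain, 3))  # 0.562
```

Multiplying the peak bin by this gain pulls its level down exactly to the second threshold.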
步骤S610:混音线程模块对所述频域信号采用第二类压制策略进行处理,得到处理后的频域信号。Step S610: The sound mixing thread module processes the frequency domain signal using a second type of suppression strategy to obtain a processed frequency domain signal.
Specifically, when the sound-source type of the frame of frequency-domain signal is not the first type of sound source, the mixing thread module applies the second type of suppression strategy to the frame, namely: the mixing thread module may suppress the peaks of the frame of frequency-domain signal, or may leave the frame unsuppressed.
Before a frame of frequency-domain signal is suppressed, peak detection must be performed on it, and the peak suppression must take into account that audio signals in adjacent frames are strongly correlated. Therefore, the difference between this frame's difference gain and the previous frame's difference gain should stay within a reasonable range. For example, suppose the allowed range of that gain difference is 0.2 to 0.3, the sound-source type of the (n-1)-th frame of frequency-domain signal is the first type of sound source and it is suppressed with a difference gain of 0.5, and the n-th frame of frequency-domain signal is human voice (the second type of sound source). If the n-th frame is to be suppressed, its difference gain must then lie in the range 0.7 to 0.8. If the difference gain of the n-th frame were higher than 0.8, the energy gap between the suppressed (n-1)-th frame and the suppressed n-th frame could become too large, producing an abrupt volume change (for example, the sound suddenly becoming louder) when the speaker plays back these two frames. If the difference gain of the n-th frame were lower than 0.7, the frame's energy could be over-suppressed, making the human voice very quiet when the speaker plays back that frame.
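Read literally, the worked example above constrains this frame's difference gain to a fixed offset window around the previous frame's gain. A minimal sketch of that clamping follows; the function name is hypothetical, and the 0.2-0.3 window is taken from the example rather than mandated by the method:

```python
def clamp_frame_gain(prev_gain, desired_gain, min_diff=0.2, max_diff=0.3):
    """Keep the change in difference gain between adjacent frames within
    [min_diff, max_diff]: capping the change avoids an abrupt volume jump
    at the frame boundary, and flooring it avoids over-suppression."""
    if desired_gain >= prev_gain:
        low, high = prev_gain + min_diff, prev_gain + max_diff
    else:
        low, high = prev_gain - max_diff, prev_gain - min_diff
    return min(max(desired_gain, low), high)

# previous frame (first-type source) had gain 0.5; the current human-voice
# frame must end up in [0.7, 0.8], as in the example above
assert clamp_frame_gain(0.5, 0.9) == 0.8
assert clamp_frame_gain(0.5, 0.75) == 0.75
```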
Step S611: the mixing thread module performs a frequency-to-time transform on the processed frequency-domain signal to obtain a single frame of audio signal.
Step S612: the mixing thread module sends the single frame of audio signal to the audio driver.
Step S613: the mixing thread module updates n according to the formula n = n + 1.
Specifically, when n is not equal to 0, the mixing thread module executes step S604.
Step S614: the audio driver sends the single frame of audio signal to the speaker.
Step S615: the speaker plays the audio corresponding to the single frame of audio signal.
The audio processing method provided by the embodiments of this application combines a neural network with a conventional detection algorithm: the neural network identifies the sound-source type of the audio signal, avoiding the misjudgments and missed detections of conventional algorithms as well as the difficulty of tuning an upper threshold for the tonality value. By applying different suppression gains and application times to different audio signals, the method changes the speaker input signal so as to reduce playback noise and reduce suppression distortion across different audio signals, while preserving the maximum playback loudness of the original signal.
The above embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center over a wired connection (such as coaxial cable, optical fiber, or digital subscriber line) or a wireless connection (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive), among others.
A person of ordinary skill in the art will understand that all or part of the procedures of the above method embodiments can be carried out by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and when executed it may include the procedures of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as ROM, random-access memory (RAM), magnetic disks, or optical discs.
In summary, the above description is merely an embodiment of the technical solution of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made in accordance with the disclosure of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

  1. A method for processing an audio signal, characterized by comprising:
    obtaining an audio signal;
    when the tonality value of the audio signal is greater than or equal to a first threshold and the sound-source type of the audio signal is a first type of sound source, processing the audio signal with a first type of suppression strategy;
    otherwise, processing the audio signal with a second type of suppression strategy.
  2. The method according to claim 1, characterized in that the first type of suppression strategy is to suppress a single peak or multiple peaks of the audio signal in the frequency domain.
  3. The method according to any one of claims 1-2, characterized in that the second type of suppression strategy is to suppress a single peak or multiple peaks in the audio signal; or
    to perform no suppression processing on the audio signal.
  4. The method according to any one of claims 1-3, characterized in that, after the obtaining an audio signal, the method comprises:
    performing a tonality calculation on the audio signal to obtain the tonality value of the audio signal.
  5. The method according to claim 4, characterized in that the performing a tonality calculation on the audio signal to obtain the tonality value of the audio signal comprises:
    according to the formula

    Flatness = ( ∏_{n=0}^{N-1} x(n) )^{1/N} / ( (1/N) ∑_{n=0}^{N-1} x(n) ),

    calculating the flatness of the audio signal, where N is the length of the time-frequency transform of the audio signal, x(n) is the energy value of the n-th frequency bin of the audio signal in the frequency domain, and Flatness is the flatness of the audio signal;
    calculating a first parameter of the audio signal according to the formula SFMdB = 10·log₁₀(Flatness), where SFMdB is the first parameter;
    according to the formula

    α = min( SFMdB / SFMdBMax, 1 ),

    calculating the tonality value of the audio signal, where α is the tonality value of the audio signal and SFMdBMax is the maximum value of the first parameter.
  6. The method according to any one of claims 1-5, characterized in that, before the processing the audio signal with the first type of suppression strategy, the method further comprises:
    performing peak detection on the audio signal, the peak detection being used to obtain peak information of the audio signal in the frequency domain.
  7. The method according to claim 6, characterized in that the processing the audio signal with the first type of suppression strategy specifically comprises:
    calculating the difference between a peak of the audio signal and a second threshold, the peak comprising at least the maximum peak of the audio signal in the frequency domain;
    calculating a difference gain for the peak based on the difference;
    suppressing the peak according to the formula W′ = W·f, where f is the difference gain, W is the peak before suppression, and W′ is the peak after suppression.
  8. An electronic device, characterized by comprising a memory, a processor, and a touchscreen, wherein:
    the touchscreen is configured to display content;
    the memory is configured to store a computer program, the computer program comprising program instructions; and
    the processor is configured to invoke the program instructions to cause the electronic device to perform the method according to any one of claims 1-7.
  9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
  10. A computer program product comprising instructions, characterized in that, when the computer program product runs on an electronic device, it causes the electronic device to perform the method according to any one of claims 1-7.
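For illustration, the tonality computation recited in claims 4-5 can be sketched as follows. This is a non-authoritative sketch: the value of −60 dB for SFMdBMax is an assumption borrowed from the classic MPEG psychoacoustic model, since the claims do not fix a number, and the function name is hypothetical:

```python
import numpy as np

def tonality(x, sfm_db_max=-60.0):
    """Compute the tonality value alpha of one frame from the energies x(n)
    of its N frequency bins: spectral flatness (geometric mean over
    arithmetic mean), then SFMdB = 10*log10(Flatness), then
    alpha = min(SFMdB / SFMdBMax, 1)."""
    x = np.asarray(x, dtype=float)
    flatness = np.exp(np.mean(np.log(x))) / np.mean(x)  # flatness in (0, 1]
    sfm_db = 10.0 * np.log10(flatness)                  # first parameter, in dB
    return min(sfm_db / sfm_db_max, 1.0)                # tonality value alpha

# a perfectly flat (noise-like) spectrum has flatness 1, SFMdB 0, tonality 0;
# a strongly peaked (tonal) spectrum drives the tonality toward 1
assert abs(tonality([1.0] * 8)) < 1e-9
```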
PCT/CN2022/092367 2021-07-19 2022-05-12 Audio signal processing method and related electronic device WO2023000778A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110815051.X 2021-07-19
CN202110815051.XA CN115641870A (en) 2021-07-19 2021-07-19 Audio signal processing method and related electronic equipment

Publications (2)

Publication Number Publication Date
WO2023000778A1 WO2023000778A1 (en) 2023-01-26
WO2023000778A9 true WO2023000778A9 (en) 2023-06-15

Family

ID=84939464

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/092367 WO2023000778A1 (en) 2021-07-19 2022-05-12 Audio signal processing method and related electronic device

Country Status (2)

Country Link
CN (1) CN115641870A (en)
WO (1) WO2023000778A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116233696B (en) * 2023-05-05 2023-09-15 荣耀终端有限公司 Airflow noise suppression method, audio module, sound generating device and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9565508B1 (en) * 2012-09-07 2017-02-07 MUSIC Group IP Ltd. Loudness level and range processing
US9672843B2 (en) * 2014-05-29 2017-06-06 Apple Inc. Apparatus and method for improving an audio signal in the spectral domain
KR20170030384A (en) * 2015-09-09 2017-03-17 삼성전자주식회사 Apparatus and Method for controlling sound, Apparatus and Method for learning genre recognition model
CN108322868B (en) * 2018-01-19 2020-07-07 瑞声科技(南京)有限公司 Method for improving sound quality of piano played by loudspeaker
KR20230144650A (en) * 2018-09-07 2023-10-16 그레이스노트, 인코포레이티드 Methods and Apparatus for Dynamic Volume Adjustment via Audio Classification
CN109616135B (en) * 2018-11-14 2021-08-03 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method, device and storage medium
CN111343540B (en) * 2020-03-05 2021-07-20 维沃移动通信有限公司 Piano audio processing method and electronic equipment
CN112767967A (en) * 2020-12-30 2021-05-07 深延科技(北京)有限公司 Voice classification method and device and automatic voice classification method

Also Published As

Publication number Publication date
WO2023000778A1 (en) 2023-01-26
CN115641870A (en) 2023-01-24

Similar Documents

Publication Publication Date Title
JP7222112B2 (en) Singing recording methods, voice correction methods, and electronic devices
US11880628B2 (en) Screen mirroring display method and electronic device
CN109994127B (en) Audio detection method and device, electronic equipment and storage medium
CN109547848B (en) Loudness adjustment method and device, electronic equipment and storage medium
CN109243479B (en) Audio signal processing method and device, electronic equipment and storage medium
CN109003621B (en) Audio processing method and device and storage medium
CN111986691A (en) Audio processing method and device, computer equipment and storage medium
WO2023000778A9 (en) Audio signal processing method and related electronic device
US20240031766A1 (en) Sound processing method and apparatus thereof
CN109961802B (en) Sound quality comparison method, device, electronic equipment and storage medium
WO2022143258A1 (en) Voice interaction processing method and related apparatus
WO2020062014A1 (en) Method for inputting information into input box and electronic device
WO2024093515A1 (en) Voice interaction method and related electronic device
CN116055982B (en) Audio output method, device and storage medium
WO2023061330A1 (en) Audio synthesis method and apparatus, and device and computer-readable storage medium
WO2022007757A1 (en) Cross-device voiceprint registration method, electronic device and storage medium
CN114974213A (en) Audio processing method, electronic device and storage medium
CN113707162A (en) Voice signal processing method, device, equipment and storage medium
CN113840034B (en) Sound signal processing method and terminal device
CN115359156B (en) Audio playing method, device, equipment and storage medium
RU2777617C1 (en) Song recording method, sound correction method and electronic device
CN116546126B (en) Noise suppression method and electronic equipment
WO2024051638A1 (en) Sound-field calibration method, and electronic device and system
WO2024046416A1 (en) Volume adjustment method, electronic device and system
WO2023142784A1 (en) Volume control method, electronic device and readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22844945

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE