CN112908350B - Audio processing method, communication device, chip and module equipment thereof - Google Patents

Audio processing method, communication device, chip and module equipment thereof

Info

Publication number
CN112908350B
CN112908350B
Authority
CN
China
Prior art keywords
audio signal
signal
weight
audio
energy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110134225.6A
Other languages
Chinese (zh)
Other versions
CN112908350A (en)
Inventor
赵喆
陈俊杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spreadtrum Communications Shanghai Co Ltd
Original Assignee
Spreadtrum Communications Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spreadtrum Communications Shanghai Co Ltd filed Critical Spreadtrum Communications Shanghai Co Ltd
Priority to CN202110134225.6A priority Critical patent/CN112908350B/en
Publication of CN112908350A publication Critical patent/CN112908350A/en
Application granted granted Critical
Publication of CN112908350B publication Critical patent/CN112908350B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0224 Processing in the time domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/26 Pre-filtering or post-filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0264 Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/64 Automatic arrangements for answering calls; Automatic arrangements for recording messages for absent subscribers; Arrangements for recording conversations
    • H04M1/65 Recording arrangements for recording a message from the calling party
    • H04M1/656 Recording arrangements for recording a message from the calling party for recording conversations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R9/00 Transducers of moving-coil, moving-strip, or moving-wire type
    • H04R9/02 Details
    • H04R9/025 Magnetic circuit

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Telephone Function (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The application discloses an audio processing method, a communication device, a chip, and a module device. The method includes the following steps: collecting a first audio signal through a microphone and a second audio signal through a loudspeaker, where the first audio signal and the second audio signal each comprise N frames of audio signals; obtaining a first weight corresponding to the first audio signal and a second weight corresponding to the second audio signal; and mixing a third audio signal with a fourth audio signal to obtain a target audio signal, where the third audio signal is determined by the first weight and the first audio signal, and the fourth audio signal is determined by the second weight and the second audio signal. Implementing the method provided by the application helps achieve high-fidelity recording.

Description

Audio processing method, communication device, chip and module equipment thereof
Technical Field
The present application relates to the field of multimedia technologies, and in particular, to an audio processing method, a communication device, a chip, and a module device thereof.
Background
Recording is widely used in voice calls, voice capture, and other scenarios. Different scenarios, however, such as a concert venue or a whispered conversation, place different requirements on the recording equipment. Equipment tuned to record a concert captures low-volume sounds only faintly and with poor clarity, while equipment tuned to record whispers clips loud sounds and distorts them.
Disclosure of Invention
The application discloses an audio processing method, a communication device, a chip, and a module device, which help achieve high-fidelity recording.
In a first aspect, the present application provides an audio processing method applied to a terminal device, the method including: collecting a first audio signal through a microphone and a second audio signal through a loudspeaker, the first audio signal and the second audio signal each comprising N frames of audio signals; obtaining a first weight corresponding to the first audio signal and a second weight corresponding to the second audio signal; and mixing a third audio signal with a fourth audio signal to obtain a target audio signal; wherein the third audio signal is determined by the first weight and the first audio signal, and the fourth audio signal is determined by the second weight and the second audio signal.
In one implementation, the second weight corresponding to the second audio signal is determined according to the first weight corresponding to the first audio signal.
In one implementation, a first weight corresponding to the first audio signal is determined according to an energy of each frame of audio signal in the first audio signal.
In one implementation, the energy of each frame of audio signal in the first audio signal is obtained; each frame is either a speech signal or a noise signal, and a speech signal is either a normal speech signal or a clipped speech signal; a first value is determined according to the energy of each frame of audio signal in the first audio signal, the first value being the ratio of a first duration to a second duration, where the first duration is the total duration of the clipped speech signals in the first audio signal and the second duration is the total duration of the speech signals in the first audio signal; and the first weight corresponding to the first audio signal is determined according to the first value.
In one implementation, a second value is determined according to the first value, and the first weight is determined according to the second value and a third weight; the third weight is the weight corresponding to the audio signal that the microphone collected immediately before the first audio signal.
In one implementation, the audio signal includes M sub-bands in the frequency domain; a speech signal is an audio signal in which the number of first sub-bands is greater than M/2, a first sub-band being a sub-band whose energy value is greater than a first preset value; alternatively, a speech signal is an audio signal in which the number of first sub-bands equals M/2 and the time-domain energy sum is greater than a second preset value.
In one implementation, the audio signal includes a high-frequency sub-band and a low-frequency sub-band in the frequency domain; a clipped speech signal is a speech signal whose time-domain energy sum is greater than a third preset value and whose ratio of high-frequency sub-band energy to low-frequency sub-band energy is greater than a fourth preset value.
In a second aspect, the present application provides a communication device for implementing the units of the method in the first aspect and any possible implementation manner thereof.
In a third aspect, the present application provides a communication device comprising a processor configured to perform the method of the first aspect and any possible implementation manner thereof.
In a fourth aspect, the present application provides a communication device comprising a processor and a memory for storing computer-executable instructions; the processor is configured to invoke the program code from the memory to perform the method of the first aspect and any possible implementation thereof.
In a fifth aspect, the present application provides a chip, configured to acquire a first audio signal through a microphone and acquire a second audio signal through a speaker; the first audio signal and the second audio signal each comprise N frames of audio signals; the chip is further configured to obtain a first weight corresponding to the first audio signal, and obtain a second weight corresponding to the second audio signal; the chip is also used for carrying out audio mixing processing on the third audio signal and the fourth audio signal to obtain a target audio signal; wherein the third audio signal is determined by the first weight and the first audio signal, and the fourth audio signal is determined by the second weight and the second audio signal.
In a sixth aspect, the present application provides a module device, which includes a communication module, a power module, a storage module, and a chip module, wherein: the power module is configured to supply power to the module device; the storage module is configured to store data and instructions; the communication module is configured for internal communication within the module device or for communication between the module device and external devices; and the chip module is configured to: collect a first audio signal through a microphone and a second audio signal through a loudspeaker, the first audio signal and the second audio signal each comprising N frames of audio signals; obtain a first weight corresponding to the first audio signal and a second weight corresponding to the second audio signal; and mix a third audio signal with a fourth audio signal to obtain a target audio signal; wherein the third audio signal is determined by the first weight and the first audio signal, and the fourth audio signal is determined by the second weight and the second audio signal.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural view of a speaker;
fig. 2 is a schematic diagram of acquiring an audio signal through a microphone and a speaker according to an embodiment of the present application;
fig. 3 is a flowchart of an audio processing method according to an embodiment of the present application;
fig. 4 is a schematic diagram of an audio processing method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a communication device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of another communication device according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a module apparatus according to an embodiment of the present application.
Detailed Description
For ease of understanding, the background concepts to which this application relates will first be described.
1. Working mode of microphone
A microphone is an acoustic-to-electric transducer: it converts an acoustic signal into an electrical signal. When used for recording, a microphone captures low-volume sound well. For loud sound, however, its reproduction fidelity is insufficient: it clips easily, causing distortion.
2. Working mode of loudspeaker
Fig. 1 is a schematic structural diagram of a speaker. A loudspeaker is an electric-to-acoustic transducer; conventionally it converts an electrical signal into an acoustic signal, i.e., it operates from left to right in fig. 1. Conversely, when the loudspeaker diaphragm is excited by sound waves, the coil on the left of fig. 1 cuts magnetic induction lines as it moves in the magnetic field, generating an induced current. In other words, the loudspeaker can also operate from right to left in fig. 1, converting an acoustic signal into an electrical signal. This induced current is much weaker than the current a conventional microphone generates, but it has the advantage that the signal is not distorted, because the displacement of the speaker diaphragm is limited.
In view of this, the audio processing method provided in the embodiments of the present application is applied to a terminal device and helps achieve high-fidelity recording. It should be noted that the terminal device may be configured with a speaker and a microphone. The terminal device may be an access terminal, a UE (User Equipment), a subscriber unit, a subscriber station, a mobile station, a remote terminal, a mobile terminal, a user terminal, a wireless communication device, a user agent, or user equipment. The access terminal may be a terminal device in the Internet of Things, a vehicle-mounted device, a wearable device, a virtual-reality device, a cellular phone, a cordless phone, a SIP (Session Initiation Protocol) phone, a WLL (Wireless Local Loop) station, a PDA (Personal Digital Assistant), a handheld device with a wireless communication function, a computing device or other processing device connected to a wireless modem, a terminal device in a future 5G (fifth-generation mobile communication technology) network, a terminal device in a future evolved Public Land Mobile Network (PLMN), an NB-IoT (Narrowband Internet of Things) terminal device, etc. The mobile terminal may be a smartphone, a tablet computer, a PC (Personal Computer), a smart TV, a smart watch, or the like.
Referring to fig. 2, fig. 2 is a schematic diagram of collecting an audio signal through a microphone and a speaker according to an embodiment of the present application. As shown in fig. 2, the sound signal is input from the speaker 201 and the microphone 202 on the right side, passes through the codec 203, and is stored in the pre-processing buffer 204. The codec converts between analog and digital signals: its uplink path receives the two signals input by the microphone and the loudspeaker, each of which is converted from analog to digital by one of two analog-to-digital converters inside the codec and finally stored in the pre-processing buffer for later use. In a recording scenario, the speaker's receiving path is powered down and carries no signal, so it does not interfere with the recording.
When recording audio through the terminal device, the loudspeaker and the microphone can be used simultaneously to capture the sound of a scene. This makes it possible to obtain, at the same time, the high-quality low-volume sound recorded by the microphone and the high-quality high-volume sound recorded by the loudspeaker.
Referring to fig. 3, fig. 3 is a flowchart illustrating an audio processing method according to an embodiment of the present disclosure. The audio processing method may be implemented by the terminal device configured with the speaker and the microphone, or may be implemented by a chip in the terminal device. As shown in fig. 3, the audio processing method includes, but is not limited to, the following steps S301 to S303.
Step S301: the microphone configured on the terminal device collects a first audio signal, and the loudspeaker configured on the terminal device collects a second audio signal; the first audio signal and the second audio signal each include N frames of audio signals.
It should be noted that the microphone and the speaker of the terminal device collect the same sound signal at the same time in the same scene, but because their internal structures differ, the signals they collect differ. The collected sound therefore enters the terminal device over two paths: the signal collected at the microphone end is the first audio signal, and the signal collected at the loudspeaker end is the second audio signal.
It should be further noted that the first audio signal and the second audio signal each include N frames of audio signals, where N is a positive integer. Typically, each frame of the raw signal on the microphone path is 20 ms long, and each frame of the raw signal on the speaker path is likewise 20 ms long; the pre-processing buffer can store a 2000 ms signal, i.e., 100 frames of data. Optionally, the buffer length is an adjustable parameter; for example, it may be extended to 4000 ms, in which case the buffer stores 200 frames of data. The embodiments of the present application take a buffer of 100 frames (i.e., N = 100) as an example; this is illustrative and does not limit other embodiments of the present application.
Since a normal recording is longer than the 100 frames of data above (i.e., longer than 2 s), setting the buffer length to 100 frames means the collected audio signal must be split into multiple 100-frame segments for processing. The audio signal may vary across periods; for example, the overall sound may be louder during period T1 and quieter during period T2. Processing in 100-frame segments therefore helps the processed segments join smoothly and avoids abrupt changes or discontinuities in the processed sound.
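The buffering arithmetic above (20 ms frames, 100 frames per 2000 ms buffer) can be sketched as follows. This is an illustrative Python sketch, not the patent's implementation; the names, the 48 kHz sampling rate (borrowed from the sub-band example later in the description), and the list-based signal representation are all assumptions.

```python
# Illustrative sketch of the buffering scheme: 20 ms frames, processed in
# segments of N = 100 frames (2000 ms). All names and defaults are assumed.
SAMPLE_RATE = 48_000          # Hz, matching the 48 kHz sub-band example below
FRAME_MS = 20                 # length of one frame in milliseconds
FRAMES_PER_SEGMENT = 100      # N; adjustable (e.g. 200 for a 4000 ms buffer)

def split_into_segments(samples):
    """Split a mono sample sequence into segments of N frames of 20 ms each."""
    frame_len = SAMPLE_RATE * FRAME_MS // 1000           # 960 samples per frame
    seg_len = frame_len * FRAMES_PER_SEGMENT             # 96000 samples per segment
    return [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]
```

A 2.02 s recording, for example, would yield two full 100-frame segments plus a short remainder that joins the next buffer fill.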
Step S302, the terminal device obtains a first weight corresponding to the first audio signal, and obtains a second weight corresponding to the second audio signal.
In one implementation, obtaining the first weight corresponding to the first audio signal includes: determining the first weight corresponding to the first audio signal according to the energy of each frame of audio signal in the first audio signal.
Specifically, determining the first weight corresponding to the first audio signal according to the energy of each frame of audio signal in the first audio signal may include, but is not limited to, the following three steps.
First, the terminal device obtains the energy of each frame of audio signal in the first audio signal.
Specifically, the terminal device first applies Wiener filtering to the audio signal collected at the microphone end in the pre-processing buffer, obtaining gain values for each frequency-domain sub-band and each time-domain signal frame; it then performs Voice Activity Detection (VAD) energy estimation on each sub-band in the frequency domain and each signal frame in the time domain.
The number of sub-bands and the bandwidth of each sub-band can be adjusted according to actual conditions or user requirements. For example, with a recording sampled at 48 kHz, the signal can be divided into the following frequency-domain sub-bands: 0-500 Hz, 500-1250 Hz, 1250-2250 Hz, 2250-3500 Hz, 3500 Hz-5 kHz, 5-10 kHz, 10-20 kHz, and 20-48 kHz. This division is illustrative and does not limit the application.
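The sub-band boundaries listed above can be expressed as a simple lookup. This is an illustrative sketch with assumed names; the edge list copies the example division from the description and, like it, is adjustable.

```python
# Sub-band edges in Hz for the 48 kHz example division above (8 sub-bands).
SUBBAND_EDGES = [0, 500, 1250, 2250, 3500, 5000, 10_000, 20_000, 48_000]

def subband_index(freq_hz):
    """Return the index of the sub-band containing freq_hz, or None if outside."""
    for i in range(len(SUBBAND_EDGES) - 1):
        if SUBBAND_EDGES[i] <= freq_hz < SUBBAND_EDGES[i + 1]:
            return i
    return None
```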
Specifically, in the embodiments of the present application, an audio signal may be classified as a speech signal or a noise signal according to the energy data obtained by the above calculation, and a speech signal may further be classified as a normal speech signal or a clipped speech signal. Optionally, other categories may be distinguished according to the energy values of different parameters of the audio signal, which is not limited in this application.
In one implementation, the audio signal includes M sub-bands in the frequency domain; a speech signal is an audio signal in which the number of first sub-bands is greater than M/2, a first sub-band being a sub-band whose energy value is greater than a first preset value; alternatively, a speech signal is an audio signal in which the number of first sub-bands equals M/2 and the time-domain energy sum is greater than a second preset value.
Each signal frame is judged using the per-frame energy values obtained from the VAD energy estimation. Specifically, assume each frame of the audio signal has M sub-bands in the frequency domain, M being a positive integer. A speech signal may be an audio signal in which the number of first sub-bands is greater than M/2, and a noise signal may be an audio signal in which the number of first sub-bands is less than M/2, where a first sub-band is a sub-band whose energy value is greater than a first preset value. The first preset value is an adjustable parameter and can be tuned to the user's needs. In other words, when judging a signal frame: if more than half of the frame's frequency-domain sub-bands have energy above the first preset value, the frame is judged to be a speech signal; if fewer than half do, the frame is judged to be a noise signal.
Specifically, a speech signal may also be an audio signal in which the number of first sub-bands equals M/2 and the time-domain energy sum is greater than a second preset value. The second preset value is likewise an adjustable parameter that can be tuned to the user's needs. In other words, if the number of the frame's sub-bands above the first preset value equals the number below it, the frame's time-domain energy sum is evaluated: if it is greater than the second preset value, the frame is judged to be a speech signal; if it is smaller, the frame is judged to be a noise signal.
Here the time-domain energy sum is the frame's total energy in the time domain, i.e., the energy of the frame's 20 ms of audio. As noted above, each frame of audio signal is 20 ms long, but when computing the frame's energy sum, the time-domain window may be 20 ms or another value, e.g., 10 ms or 5 ms; these values are illustrative and do not limit the application. When speech and noise are distinguished by time-domain energy, subdividing with a smaller window allows a finer distinction.
In one implementation, the audio signal includes a high-frequency sub-band and a low-frequency sub-band in the frequency domain; a clipped speech signal is a speech signal whose time-domain energy sum is greater than a third preset value and whose ratio of high-frequency sub-band energy to low-frequency sub-band energy is greater than a fourth preset value. The third and fourth preset values are also adjustable parameters that can be tuned to the user's needs.
Specifically, a clipped speech signal may be a speech signal whose time-domain energy sum is greater than the third preset value and whose ratio of high-frequency sub-band energy to low-frequency sub-band energy is greater than the fourth preset value; a normal speech signal may be a speech signal whose time-domain energy sum is smaller than the third preset value and/or whose high-to-low sub-band energy ratio is smaller than the fourth preset value.
It should be noted that the ranges of the high-frequency and low-frequency sub-bands are preset in the terminal device. For example, again taking a 48 kHz sampling rate, 5-48 kHz may be taken as the high-frequency sub-band and 0-1250 Hz as the low-frequency sub-band. These values are illustrative and do not limit the embodiments of the present application.
It should be noted that when the time-domain energy sum is used to distinguish clipped speech from normal speech, the time domain can be subdivided with a smaller window, just as when distinguishing speech from noise. This finer subdivision allows clipped and normal speech signals to be distinguished more precisely.
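The clipped-speech test above combines the two conditions (total energy and high/low band ratio). A minimal sketch follows; names are assumptions, and the zero-denominator guard is an added safety check not stated in the patent.

```python
def is_clipped(time_energy, high_band_energy, low_band_energy,
               third_preset, fourth_preset):
    """Flag a speech frame as clipped per the rule above: high total energy
    AND a high ratio of high-frequency to low-frequency sub-band energy."""
    if low_band_energy == 0:
        # Guard added for illustration: all energy in the high band counts
        # as an extreme ratio, so only the energy condition remains.
        return time_energy > third_preset
    ratio = high_band_energy / low_band_energy
    return time_energy > third_preset and ratio > fourth_preset
```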
Second, the terminal device determines a first value according to the energy of each frame of audio signal in the first audio signal.
The first value is the ratio of a first duration to a second duration, where the first duration is the total duration of the clipped speech signals in the first audio signal and the second duration is the total duration of the speech signals in the first audio signal.
Specifically, the first step identifies which frames are speech signals and which are clipped speech signals. Adding the durations of all speech frames gives the second duration; adding the durations of all clipped-speech frames gives the first duration. The ratio of the first duration to the second duration is the first value. Denoting the first value by Q, Q is the fraction of the total speech duration that is clipped.
Third, the terminal device determines the first weight corresponding to the first audio signal according to the first value. Specifically, the first weight can be calculated as W1 = 1 - Q.
In one implementation, the terminal device may determine the first weight corresponding to the first audio signal according to the first value and a weight coefficient. The weight coefficient is also an adjustable parameter that can be tuned to the user's needs: it strengthens or weakens, as the situation requires, the weight derived from the first value, so that the resulting first weight better matches the user's needs.
Specifically, denoting the weight coefficient by a, the first weight may be calculated as W1 = a(1 - Q), where a is a positive number. When a is less than 1, the coefficient weakens the contribution of the first audio signal; when a is greater than 1, it strengthens it. The coefficient a may also be 1, in which case the formula reduces to W1 = 1 - Q, i.e., the case above of determining the first weight from the first value alone.
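Steps two and three together, Q = (clipped duration) / (speech duration) and W1 = a(1 - Q), can be sketched as below. Names, the per-frame label representation, and the 20 ms default are illustrative assumptions; clipped frames count toward the speech duration because clipped speech is a kind of speech signal.

```python
def first_weight(frame_labels, frame_ms=20, a=1.0):
    """Compute W1 = a * (1 - Q) from per-frame labels.

    frame_labels: one of 'clipped', 'speech', 'noise' per frame
                  ('clipped' frames also count as speech)
    frame_ms:     frame length in milliseconds
    a:            the weight coefficient
    """
    speech_ms = sum(frame_ms for lbl in frame_labels if lbl in ("speech", "clipped"))
    clipped_ms = sum(frame_ms for lbl in frame_labels if lbl == "clipped")
    q = clipped_ms / speech_ms if speech_ms else 0.0   # Q: clipped share of speech
    return a * (1.0 - q)
```

For instance, three clean speech frames and one clipped frame give Q = 0.25 and, with a = 1, W1 = 0.75: the more clipping the microphone signal contains, the less weight it gets in the mix.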
In one implementation, the determining a first weight corresponding to the first audio signal according to the first value includes: determining a second value according to the first value; determining a first weight according to the second value and the third weight; the third weight is a weight corresponding to a previous audio signal of the first audio signal collected by the microphone.
It should be noted that, as can be seen from the foregoing, the collected audio signal is divided into segments of 100 frames each, and the preprocessing buffer processes one 100-frame audio signal at a time. Since the first audio signal may be any 100-frame segment of the microphone-side audio signal, another 100-frame audio signal may exist before the first audio signal, and similarly, another 100-frame audio signal may exist after the first audio signal.
It should be noted that the first audio signal is continuous with the audio signal before the first audio signal, and similarly, the first audio signal is continuous with the audio signal after the first audio signal. In other words, the first frame of the first audio signal and the last frame of the audio signal before the first audio signal are consecutive audio signals. Similarly, the last frame of the first audio signal and the first frame of the audio signal following the first audio signal are also consecutive audio signals.
As can be seen from the foregoing, the preprocessing buffer obtains a weight value each time it processes a 100-frame audio signal. Therefore, the 100-frame audio signal preceding the first audio signal also has a weight value, namely the third weight.
It should be noted that each 100-frame audio signal may have different characteristics; for example, the overall volume of the first audio signal may be large while the overall volume of the audio signal preceding it is small. Therefore, there may be a large difference between the first weight corresponding to the first audio signal and the third weight corresponding to the preceding audio signal.
Therefore, in order to avoid an abrupt change between the two consecutive groups of audio signals after they are processed by the audio processing method, the two weight values of the consecutive groups can be balanced. As can be seen from the foregoing, the weight corresponding to the audio signal preceding the first audio signal is the third weight, and the weight that the first audio signal would receive from the first value alone is the second value. Balancing the third weight against the second value yields the balanced first weight. Optionally, an averaging method may be adopted: the third weight and the second value are added and then averaged to obtain the balanced first weight.
It should be noted that, if the first audio signal is the first 100 frames of audio signals, there are no other audio signals before the first audio signal; at this time, it may be assumed that the default weight of the microphone side is 1 and the default weight of the speaker side is 0, and the default weight and the first weight may be balanced. Optionally, the default weight may be adjusted according to a user requirement, which is not limited in the present application.
In one implementation, the terminal device may determine the first weight according to the second value, the third weight and a smoothing coefficient. The smoothing coefficient is also an adjustable parameter that can be tuned according to the needs of the user. Assuming the smoothing coefficient is b, the first weight is calculated using the formula W1 = b × second value + (1 - b) × third weight, where the value range of b is greater than 0 and less than or equal to 1. When b = 1/2, this reduces to the averaging case described above; when b = 1, no smoothing is applied and the first weight equals the second value.
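The smoothing step can be sketched as follows (a hypothetical helper; the patent only fixes the formula W1 = b × second value + (1 - b) × third weight and the range 0 < b ≤ 1):

```python
def smooth_weight(second_value, third_weight, b=0.5):
    """Balance the current segment's weight (second value) against the
    previous segment's weight (third weight) using smoothing coefficient b.

    b = 0.5 averages the two weights; b = 1 disables smoothing and
    returns the second value unchanged.
    """
    if not 0.0 < b <= 1.0:
        raise ValueError("smoothing coefficient b must be in (0, 1]")
    return b * second_value + (1.0 - b) * third_weight
```

For example, with a second value of 0.8 and a third weight of 0.4, b = 0.5 gives a balanced first weight of 0.6, while b = 1 simply returns 0.8.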
Through the balancing of the second value and the third weight by the smoothing coefficient, consecutive processed audio signals transition smoothly, without abrupt changes or jumps in the sound.
In an implementation manner, the obtaining of the second weight corresponding to the second audio signal includes: and determining a second weight corresponding to the second audio signal according to the first weight corresponding to the first audio signal.
It should be noted that, in the embodiment of the present application, the first weight corresponding to the first audio signal at the microphone end is obtained by calculating the energy of the audio signal at the microphone end, and the second weight corresponding to the second audio signal at the speaker end is calculated from the first weight. Specifically, the second weight is determined according to the first weight, and can be calculated using the formula W2 = 1 - W1.
It should be further noted that, in some scenarios, the weight of the audio signal at the speaker end may be obtained first by calculating the energy of the audio signal at the speaker end, and then the weight of the audio signal at the microphone end is obtained through the weight of the speaker end, which is not limited in this application.
Step S303, the terminal equipment performs sound mixing processing on the third audio signal and the fourth audio signal to obtain a target audio signal; wherein the third audio signal is determined by the first weight and the first audio signal, and the fourth audio signal is determined by the second weight and the second audio signal.
After the third audio signal and the fourth audio signal are subjected to sound mixing processing, the resulting audio signal is the target audio signal. In other words, the target audio signal fuses the processed microphone-side audio signal with the processed speaker-side audio signal. By mixing the two processed paths, the strengths of both signals are fully exploited, achieving high fidelity in the target audio signal.
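A sketch of the weighting-and-mixing step in step S303 (the NumPy usage, sample representation, and function shape are assumptions; the patent specifies only that the third and fourth signals are the weighted paths and that the mix yields the target signal):

```python
import numpy as np

def mix_paths(mic_signal, spk_signal, w1):
    """Weight the microphone and speaker paths and mix them.

    third  = W1 * first audio signal  (microphone side)
    fourth = W2 * second audio signal (speaker side), with W2 = 1 - W1
    target = third + fourth
    """
    w2 = 1.0 - w1
    third = w1 * np.asarray(mic_signal, dtype=float)
    fourth = w2 * np.asarray(spk_signal, dtype=float)
    return third + fourth
```

For instance, with W1 = 0.75 the microphone path dominates the mix, which matches the intent of lowering W1 only when sound breaking is detected on the microphone side.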
In one implementation, the third audio signal and the fourth audio signal may each undergo related subsequent processing before the audio mixing. The subsequent processing at the microphone end may include, but is not limited to, filtering by a low cut filter (LCF, also referred to as a "high-pass filter"), applying Digital Gain (DG), and audio compression by an Automatic Level Controller (ALC); the subsequent processing at the speaker end may include, but is not limited to, Digital Gain (DG) processing and equalization by an Equalizer (EQ).
Referring to fig. 4, fig. 4 is a schematic diagram of an audio processing method according to an embodiment of the present disclosure. As can be seen from fig. 4, after the first audio signal and the second audio signal are subjected to weight calculation, a first weight and a second weight are respectively determined; a third audio signal can be determined according to the first audio signal and the first weight, and a fourth audio signal can be determined according to the second audio signal and the second weight; and after the third audio signal and the fourth audio signal are respectively subjected to the subsequent processing, mixing the audio to obtain a target audio signal. The audio processing method provided by the embodiment of the application is beneficial to realizing high-fidelity recording.
It should be noted that the subsequent processing categories provided in the embodiments of the present application are only used for example, and when different audio signals are processed, other different types of subsequent processing manners may exist, which is not limited in the present application.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a communication device according to an embodiment of the present disclosure. The device may be a terminal device, or a device in the terminal device, or a device capable of being used in cooperation with the terminal device. The communication apparatus shown in fig. 5 may include a processing unit 501 and an acquisition unit 502. The processing unit 501 is configured to perform data processing. An obtaining unit 502 is configured to obtain data. Wherein:
an obtaining unit 502, configured to collect a first audio signal through a microphone and collect a second audio signal through a speaker; the first audio signal and the second audio signal each comprise N frames of audio signals;
an obtaining unit 502, configured to obtain a first weight corresponding to the first audio signal, and obtain a second weight corresponding to the second audio signal;
the processing unit 501 is configured to perform audio mixing processing on the third audio signal and the fourth audio signal to obtain a target audio signal; wherein the third audio signal is determined by the first weight and the first audio signal, and the fourth audio signal is determined by the second weight and the second audio signal.
In one implementation, the processing unit 501 is further configured to determine a second weight corresponding to the second audio signal according to a first weight corresponding to the first audio signal.
In one implementation, the processing unit 501 is further configured to determine a first weight corresponding to the first audio signal according to an energy of each frame of audio signal in the first audio signal.
In one implementation, the obtaining unit 502 is further configured to obtain energy of each frame of audio signal in the first audio signal; the audio signal is a voice signal or a noise signal, and the voice signal is a normal voice signal or a sound breaking voice signal; the processing unit 501 is further configured to determine a first value according to the energy of each frame of audio signal in the first audio signal; the first value is a ratio of a first duration to a second duration, the first duration being a sum of durations of the sound breaking voice signals in the first audio signal, the second duration being a sum of durations of the voice signals in the first audio signal; the processing unit 501 is further configured to determine a first weight corresponding to the first audio signal according to the first value.
In one implementation, the processing unit 501 is further configured to determine a second value according to the first value; the processing unit 501 is further configured to determine a first weight according to the second value and the third weight; the third weight is a weight corresponding to a previous audio signal of the first audio signal collected by the microphone.
In one implementation, the audio signal includes M subbands in the frequency domain; the voice signal is an audio signal with the number of first sub-bands larger than M/2, and the first sub-bands are sub-bands with energy values larger than a first preset value; or the voice signal is an audio signal of which the number of the first sub-bands is equal to M/2 and the sum of the time domain energy is greater than a second preset value.
In one implementation, an audio signal includes, in the frequency domain, a high frequency sub-band and a low frequency sub-band; the sound breaking voice signal is a voice signal of which the sum of time domain energy is greater than a third preset value and the ratio of the energy of the high-frequency sub-band to the energy of the low-frequency sub-band is greater than a fourth preset value.
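Combining the two classification rules above, a per-frame classifier could be sketched like this (the threshold values, the high/low split index, and the function shape are assumptions; the patent fixes only the comparisons against the first through fourth preset values):

```python
import numpy as np

def classify_frame(subband_energy, td_energy, hi_lo_split, t1, t2, t3, t4):
    """Classify one frame as "noise", "normal" speech, or "clipped"
    (sound breaking) speech. t1..t4 stand for the first..fourth preset
    values; hi_lo_split is the index separating low- and high-frequency
    sub-bands."""
    e = np.asarray(subband_energy, dtype=float)
    m = e.size
    strong = int(np.sum(e > t1))  # count of "first sub-bands"
    # Speech: more than M/2 strong sub-bands, or exactly M/2 strong
    # sub-bands with total time-domain energy above the second preset.
    if not (strong > m / 2 or (strong == m / 2 and td_energy > t2)):
        return "noise"
    hi = float(e[hi_lo_split:].sum())  # high-frequency sub-band energy
    lo = float(e[:hi_lo_split].sum())  # low-frequency sub-band energy
    # Sound breaking: high total energy and a high-to-low energy ratio
    # above the fourth preset value.
    if td_energy > t3 and lo > 0 and hi / lo > t4:
        return "clipped"
    return "normal"
```

A frame whose energy is concentrated in the high-frequency sub-bands while its total energy is large would be labeled "clipped", matching the intuition that clipping distortion adds high-frequency content.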
According to the embodiment of the present application, the units in the communication apparatus shown in fig. 5 may be respectively or entirely combined into one or several other units to form the unit, or some unit(s) therein may be further split into multiple units with smaller functions to form the unit(s), which may achieve the same operation without affecting the achievement of the technical effect of the embodiment of the present application. The units are divided based on logic functions, and in practical application, the functions of one unit can be realized by a plurality of units, or the functions of a plurality of units can be realized by one unit. In other embodiments of the present application, the communication device may also include other units, and in practical applications, the functions may also be implemented by being assisted by other units, and may be implemented by cooperation of a plurality of units.
The communication device may be, for example: a chip, or a chip module. Each module included in each apparatus and product described in the above embodiments may be a software module, a hardware module, or a part of the software module and a part of the hardware module. For example, for each device or product applied to or integrated in a chip, each module included in the device or product may be implemented by hardware such as a circuit, or at least a part of the modules may be implemented by a software program running on a processor integrated in the chip, and the rest (if any) part of the modules may be implemented by hardware such as a circuit; for each device and product applied to or integrated in the chip module, each module included in the device and product may be implemented in a hardware manner such as a circuit, and different modules may be located in the same component (for example, a chip, a circuit module, etc.) or different components of the chip module, or at least part of the modules may be implemented in a software program, the software program runs on a processor integrated inside the chip module, and the rest (if any) part of the modules may be implemented in a hardware manner such as a circuit; for each device and product applied to or integrated in the terminal, each module included in the terminal may be implemented by using hardware such as a circuit, different modules may be located in the same component (e.g., a chip, a circuit module, etc.) or different components in the terminal, or at least part of the modules may be implemented by using a software program running on a processor integrated in the terminal, and the rest (if any) part of the modules may be implemented by using hardware such as a circuit.
The embodiments of the present application and the embodiments of the foregoing method are based on the same concept, and the technical effects brought by the embodiments are also the same, and for the specific principle, reference is made to the description of the foregoing embodiments, which is not repeated herein.
Referring to fig. 6, fig. 6 is a communication device 60 according to an embodiment of the present disclosure. As shown in fig. 6, the communication device may include a processor 601. Optionally, the communication device may also include a memory 602. The processor 601 and the memory 602 may be connected by a bus 603 or other means; the connections between other components are merely illustrative and not limiting. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in fig. 6, but this is not intended to represent only one bus or type of bus.
The coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, units or modules, and may be in an electrical, mechanical or other form, which is used for information interaction between the devices, units or modules. The specific connection medium between the processor 601 and the memory 602 is not limited in the embodiments of the present application.
The memory 602 may include both read-only memory and random access memory, and provides instructions and data to the processor 601. A portion of the memory 602 may also include non-volatile random access memory.
The Processor 601 may be a Central Processing Unit (CPU), and the Processor 601 may also be other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor, and optionally, the processor 601 may be any conventional processor or the like. Wherein:
a memory 602 for storing program instructions.
A processor 601 for invoking program instructions stored in memory 602 for:
collecting a first audio signal through a microphone and collecting a second audio signal through a loudspeaker; the first audio signal and the second audio signal each comprise N frames of audio signals;
acquiring a first weight corresponding to the first audio signal and acquiring a second weight corresponding to the second audio signal;
performing sound mixing processing on the third audio signal and the fourth audio signal to obtain a target audio signal; wherein the third audio signal is determined by the first weight and the first audio signal, and the fourth audio signal is determined by the second weight and the second audio signal.
In one implementation, the processor 601 is configured to determine a second weight corresponding to the second audio signal according to a first weight corresponding to the first audio signal.
In one implementation, the processor 601 is configured to determine a first weight corresponding to the first audio signal according to an energy of each frame of audio signal in the first audio signal.
In one implementation manner, the processor 601 is configured to obtain energy of each frame of audio signal in the first audio signal; the audio signal is a voice signal or a noise signal, and the voice signal is a normal voice signal or a sound breaking voice signal; the processor 601 is configured to determine a first value according to energy of each frame of audio signal in the first audio signal; the first value is a ratio of a first duration to a second duration, the first duration being a sum of durations of the sound breaking voice signals in the first audio signal, the second duration being a sum of durations of the voice signals in the first audio signal; the processor 601 is configured to determine a first weight corresponding to the first audio signal according to the first value.
In one implementation, the processor 601 is configured to determine a second value according to the first value; the processor 601 is configured to determine a first weight according to the second value and the third weight; the third weight is a weight corresponding to a previous audio signal of the first audio signal collected by the microphone.
In one implementation, the audio signal includes M subbands in the frequency domain; the voice signal is an audio signal with the number of first sub-bands larger than M/2, and the first sub-bands are sub-bands with energy values larger than a first preset value; or the voice signal is an audio signal of which the number of the first sub-bands is equal to M/2 and the sum of the time domain energy is greater than a second preset value.
In one implementation, an audio signal includes, in the frequency domain, a high frequency sub-band and a low frequency sub-band; the sound breaking voice signal is a voice signal of which the sum of time domain energy is greater than a third preset value and the ratio of the energy of the high-frequency sub-band to the energy of the low-frequency sub-band is greater than a fourth preset value.
In the embodiment of the present application, the audio processing method may be implemented by running a computer program (including program code) capable of executing the steps of the methods shown in fig. 2 and fig. 3 on a general-purpose computing device, such as a computer including a Central Processing Unit (CPU), a random access memory (RAM), a read-only memory (ROM), and other storage elements. The computer program may be recorded on a computer-readable recording medium, and loaded into and executed by the above-described computing device via that medium.
Based on the same inventive concept, the principle and the advantageous effect of the communication apparatus to solve the problem provided in the embodiment of the present application are similar to the principle and the advantageous effect of the communication apparatus to solve the problem in the embodiment of the method of the present application, and for brevity, the principle and the advantageous effect of the implementation of the method may be referred to, and are not described herein again.
The embodiment of the present application further provides a chip, where the chip may perform relevant steps of the terminal device in the foregoing method embodiment. The chip is used for: collecting a first audio signal through a microphone and collecting a second audio signal through a loudspeaker; the first audio signal and the second audio signal each comprise N frames of audio signals; acquiring a first weight corresponding to the first audio signal and acquiring a second weight corresponding to the second audio signal; performing sound mixing processing on the third audio signal and the fourth audio signal to obtain a target audio signal; wherein the third audio signal is determined by the first weight and the first audio signal, and the fourth audio signal is determined by the second weight and the second audio signal.
In one implementation, the second weight corresponding to the second audio signal is determined according to the first weight corresponding to the first audio signal.
In one implementation, a first weight corresponding to the first audio signal is determined according to an energy of each frame of audio signal in the first audio signal.
In one implementation, the energy of each frame of audio signal in the first audio signal is obtained; the audio signal is a voice signal or a noise signal, and the voice signal is a normal voice signal or a sound breaking voice signal; determining a first value according to the energy of each frame of audio signal in the first audio signal; the first value is a ratio of a first time length to a second time length, the first time length is a sum of time lengths of the sound breaking voice signals in the first audio signal, and the second time length is a sum of time lengths of the voice signals in the first audio signal; and determining a first weight corresponding to the first audio signal according to the first numerical value.
In one implementation, a second value is determined according to the first value; determining a first weight according to the second value and the third weight; the third weight is a weight corresponding to a previous audio signal of the first audio signal collected by the microphone.
In one implementation, the audio signal includes M subbands in the frequency domain; the voice signal is an audio signal with the number of first sub-bands larger than M/2, and the first sub-bands are sub-bands with energy values larger than a first preset value; or the voice signal is an audio signal of which the number of the first sub-bands is equal to M/2 and the sum of the time domain energy is greater than a second preset value.
In one implementation, an audio signal includes, in the frequency domain, a high frequency sub-band and a low frequency sub-band; the sound breaking voice signal is a voice signal of which the sum of time domain energy is greater than a third preset value and the ratio of the energy of the high-frequency sub-band to the energy of the low-frequency sub-band is greater than a fourth preset value.
In one possible implementation, the chip includes at least one processor, at least one first memory, and at least one second memory; the at least one first memory and the at least one processor are interconnected through a line, and instructions are stored in the first memory; the at least one second memory and the at least one processor are interconnected through a line, and the second memory stores the data required to be stored in the method embodiment.
For each device or product applied to or integrated in the chip, each module included in the device or product may be implemented by hardware such as a circuit, or at least a part of the modules may be implemented by a software program running on a processor integrated in the chip, and the rest (if any) part of the modules may be implemented by hardware such as a circuit.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a module apparatus according to an embodiment of the present disclosure. The module device 70 can perform the steps related to the terminal device in the foregoing method embodiments, and the module device 70 includes: a communication module 701, a power module 702, a memory module 703 and a chip module 704.
The power module 702 is used for providing power for the module device; the storage module 703 is used for storing data and instructions; the communication module 701 is used for performing internal communication of module equipment, or is used for performing communication between the module equipment and external equipment; the chip module 704 is used for:
collecting a first audio signal through a microphone, and collecting a second audio signal through a loudspeaker; the first audio signal and the second audio signal each comprise N frames of audio signals; acquiring a first weight corresponding to the first audio signal and acquiring a second weight corresponding to the second audio signal; performing sound mixing processing on the third audio signal and the fourth audio signal to obtain a target audio signal; wherein the third audio signal is determined by the first weight and the first audio signal, and the fourth audio signal is determined by the second weight and the second audio signal.
In one implementation, the second weight corresponding to the second audio signal is determined according to the first weight corresponding to the first audio signal.
In one implementation manner, the first weight corresponding to the first audio signal is determined according to the energy of each frame of audio signal in the first audio signal.
In one implementation, the energy of each frame of audio signal in the first audio signal is obtained; the audio signal is a voice signal or a noise signal, and the voice signal is a normal voice signal or a sound breaking voice signal; a first value is determined according to the energy of each frame of audio signal in the first audio signal; the first value is a ratio of a first duration to a second duration, the first duration being a sum of durations of the sound breaking voice signals in the first audio signal, the second duration being a sum of durations of the voice signals in the first audio signal; and a first weight corresponding to the first audio signal is determined according to the first value.
In one implementation, a second value is determined according to the first value; determining a first weight according to the second value and the third weight; the third weight is a weight corresponding to a previous audio signal of the first audio signal collected by the microphone.
In one implementation, the audio signal includes M subbands in the frequency domain; the voice signal is an audio signal with the number of first sub-bands larger than M/2, and the first sub-bands are sub-bands with energy values larger than a first preset value; or the voice signal is an audio signal of which the number of the first sub-bands is equal to M/2 and the sum of the time domain energy is greater than a second preset value.
In one implementation, an audio signal includes, in the frequency domain, a high frequency sub-band and a low frequency sub-band; the sound breaking voice signal is a voice signal of which the sum of time domain energy is greater than a third preset value and the ratio of the energy of the high-frequency sub-band to the energy of the low-frequency sub-band is greater than a fourth preset value.
For each device and product applied to or integrated in the chip module, each module included in the device and product may be implemented in a hardware manner such as a circuit, and different modules may be located in the same component (for example, a chip, a circuit module, etc.) or different components of the chip module, or at least part of the modules may be implemented in a software program, the software program runs on a processor integrated inside the chip module, and the rest (if any) part of the modules may be implemented in a hardware manner such as a circuit.
The embodiment of the present application further provides a computer-readable storage medium, in which one or more instructions are stored, and the one or more instructions are adapted to be loaded by a processor and to execute the audio processing method of the above method embodiment.
Embodiments of the present application further provide a computer program product containing instructions, which when run on a computer, cause the computer to execute the audio processing method of the above method embodiments.
It should be noted that, for simplicity of description, the above-mentioned embodiments of the method are described as a series of acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.
The modules in the device can be merged, divided and deleted according to actual needs.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by hardware related to instructions of a program, and the program may be stored in a computer readable storage medium, and the readable storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above disclosure is only one preferred embodiment of the present invention, which is only a part of the present invention, and certainly not intended to limit the scope of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.

Claims (9)

1. A method of audio processing, the method comprising:
collecting a first audio signal through a microphone and collecting a second audio signal through a loudspeaker; the first audio signal and the second audio signal each comprise N frames of audio signals;
acquiring the energy of each frame of audio signal in the first audio signal; the audio signal is a voice signal or a noise signal, and the voice signal is a normal voice signal or a sound breaking voice signal;
determining a first value according to the energy of each frame of audio signal in the first audio signal; the first value is a ratio of a first duration to a second duration, the first duration is a sum of durations of the sound breaking voice signals in the first audio signal, and the second duration is a sum of durations of the voice signals in the first audio signal;
determining a first weight corresponding to the first audio signal according to the first numerical value;
acquiring a second weight corresponding to the second audio signal;
performing sound mixing processing on the third audio signal and the fourth audio signal to obtain a target audio signal; wherein the third audio signal is determined by the first weight and the first audio signal and the fourth audio signal is determined by the second weight and the second audio signal.
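The flow of claim 1 can be sketched in a few lines. This is an illustrative reading only: the per-frame labels ("noise", "speech", "broken"), the mapping from the first value to the microphone weight, and the complementary second weight are assumptions for the sketch, not mappings specified by the claims.

```python
import numpy as np

def first_value(labels):
    """Ratio of broken-sound speech duration to total speech duration (the 'first value')."""
    speech = [l for l in labels if l in ("speech", "broken")]
    if not speech:
        return 0.0
    return sum(1 for l in speech if l == "broken") / len(speech)

def mix(mic_frames, spk_frames, labels):
    r = first_value(labels)                      # first value from the microphone signal
    w1 = 1.0 - r                                 # assumed mapping: more breaking -> lower mic weight
    w2 = 1.0 - w1                                # second weight derived from the first (cf. claim 2)
    third = w1 * np.asarray(mic_frames, dtype=float)   # third audio signal
    fourth = w2 * np.asarray(spk_frames, dtype=float)  # fourth audio signal
    return third + fourth                        # target audio signal

labels = ["noise", "speech", "broken", "speech"]
mic = np.ones(4)            # toy microphone frames
spk = np.full(4, 0.5)       # toy loudspeaker frames
out = mix(mic, spk, labels)  # each sample = (2/3)*1 + (1/3)*0.5 = 5/6
```

With one broken frame out of three speech frames, the first value is 1/3, so the microphone contribution is down-weighted and the loudspeaker signal fills the gap.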
2. The method of claim 1, wherein obtaining the second weight corresponding to the second audio signal comprises:
determining the second weight corresponding to the second audio signal according to the first weight corresponding to the first audio signal.
3. The method of claim 1, wherein determining the first weight corresponding to the first audio signal according to the first value comprises:
determining a second value according to the first value; and
determining the first weight according to the second value and a third weight; wherein the third weight is the weight corresponding to the audio signal collected by the microphone immediately before the first audio signal.
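Claim 3 amounts to smoothing the weight across successive signals so it does not jump frame to frame. A minimal sketch, assuming an exponential blend between the previous weight and the instantaneous "second value"; the smoothing factor `alpha` is an illustrative assumption:

```python
def smooth_weight(second_value, prev_weight, alpha=0.9):
    """Blend the instantaneous target weight (second value) with the
    previous signal's weight (third weight) to get the first weight."""
    return alpha * prev_weight + (1.0 - alpha) * second_value

# Weight glides from 1.0 toward a target of 0.2 over three updates:
w = 1.0
for target in (0.2, 0.2, 0.2):
    w = smooth_weight(target, w)   # 0.92, 0.848, 0.7832
```

The previous-signal weight acts as inertia, so a single misclassified frame cannot swing the mix abruptly.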
4. The method according to claim 1, wherein the audio signal comprises M sub-bands in the frequency domain;
the speech signal is an audio signal in which the number of first sub-bands is greater than M/2, a first sub-band being a sub-band whose energy value is greater than a first preset value; or
the speech signal is an audio signal in which the number of first sub-bands is equal to M/2 and the sum of the time-domain energy is greater than a second preset value.
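The speech/noise decision of claim 4 is a count over sub-band energies. A minimal sketch, where the two thresholds (`thr1`, `thr2`) are arbitrary illustration values rather than the patent's preset values:

```python
def is_speech(subband_energy, time_energy, thr1=1.0, thr2=10.0):
    """Claim-4-style decision: speech if more than M/2 sub-bands are strong,
    or exactly M/2 are strong and total time-domain energy is high."""
    m = len(subband_energy)
    strong = sum(1 for e in subband_energy if e > thr1)   # the "first sub-bands"
    if strong > m / 2:
        return True
    return strong == m / 2 and time_energy > thr2

assert is_speech([2, 2, 2, 0.1], 5.0)       # 3 of 4 sub-bands strong
assert is_speech([2, 2, 0.1, 0.1], 11.0)    # exactly M/2 strong, high total energy
assert not is_speech([2, 0.1, 0.1, 0.1], 50.0)  # too few strong sub-bands
```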
5. The method according to claim 1 or 4, wherein the audio signal comprises a high-frequency sub-band and a low-frequency sub-band in the frequency domain; the broken-sound speech signal is a speech signal whose sum of time-domain energy is greater than a third preset value and whose ratio of the energy of the high-frequency sub-band to the energy of the low-frequency sub-band is greater than a fourth preset value.
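Claim 5's broken-sound test is a two-condition check: the frame must be loud overall, and disproportionately energetic in the high band (clipping distortion adds high-frequency content). A minimal sketch; the threshold values `thr3` and `thr4` are illustrative assumptions:

```python
def is_broken(time_energy, high_energy, low_energy, thr3=20.0, thr4=0.8):
    """Claim-5-style test: loud frame with a high ratio of
    high-band to low-band energy is flagged as broken-sound."""
    if low_energy <= 0:
        return False  # avoid division by zero on a silent low band
    return time_energy > thr3 and (high_energy / low_energy) > thr4

assert is_broken(25.0, 9.0, 10.0)        # loud, high band nearly matches low band
assert not is_broken(25.0, 2.0, 10.0)    # loud but spectrally normal speech
assert not is_broken(5.0, 9.0, 10.0)     # quiet frame cannot be clipping
```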
6. A communication apparatus comprising means for performing the method of any of claims 1-5.
7. A communication apparatus, comprising a processor;
the processor being configured to perform the method of any one of claims 1-5.
8. The communication apparatus of claim 7, further comprising a memory, wherein:
the memory is configured to store a computer program; and
the processor is specifically configured to call the computer program from the memory to execute the method according to any one of claims 1 to 5.
9. A module device, wherein the module device comprises a communication module, a power supply module, a storage module and a chip module, wherein:
the power supply module is used for supplying electric energy to the module device;
the storage module is used for storing data and instructions;
the communication module is used for internal communication of the module device, or for communication between the module device and an external device;
the chip module is used for:
collecting a first audio signal through a microphone and collecting a second audio signal through a loudspeaker; wherein the first audio signal and the second audio signal each comprise N frames of audio signals;
acquiring the energy of each frame of audio signal in the first audio signal; wherein each frame of audio signal is a speech signal or a noise signal, and a speech signal is a normal speech signal or a broken-sound speech signal;
determining a first value according to the energy of each frame of audio signal in the first audio signal; wherein the first value is a ratio of a first duration to a second duration, the first duration is the total duration of the broken-sound speech signals in the first audio signal, and the second duration is the total duration of the speech signals in the first audio signal;
determining a first weight corresponding to the first audio signal according to the first value;
acquiring a second weight corresponding to the second audio signal; and
performing mixing processing on a third audio signal and a fourth audio signal to obtain a target audio signal; wherein the third audio signal is determined by the first weight and the first audio signal, and the fourth audio signal is determined by the second weight and the second audio signal.
CN202110134225.6A 2021-01-29 2021-01-29 Audio processing method, communication device, chip and module equipment thereof Active CN112908350B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110134225.6A CN112908350B (en) 2021-01-29 2021-01-29 Audio processing method, communication device, chip and module equipment thereof

Publications (2)

Publication Number Publication Date
CN112908350A CN112908350A (en) 2021-06-04
CN112908350B true CN112908350B (en) 2022-08-26

Family

ID=76122166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110134225.6A Active CN112908350B (en) 2021-01-29 2021-01-29 Audio processing method, communication device, chip and module equipment thereof

Country Status (1)

Country Link
CN (1) CN112908350B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014177084A1 (en) * 2013-08-30 2014-11-06 ZTE Corporation Voice activation detection method and device
CN106101927A (en) * 2016-06-29 2016-11-09 Vivo Mobile Communication Co., Ltd. Acoustic signal processing method, chip and electronic equipment
CN106255000A (en) * 2016-07-29 2016-12-21 Vivo Mobile Communication Co., Ltd. Audio signal sampling method and mobile terminal
CN107071127A (en) * 2017-04-28 2017-08-18 Vivo Mobile Communication Co., Ltd. Recording method and mobile terminal
CN107333093A (en) * 2017-05-24 2017-11-07 Suzhou Keda Technology Co., Ltd. Sound processing method, device, terminal and computer-readable storage medium
CN112165558A (en) * 2020-09-21 2021-01-01 Pulian International Co., Ltd. Method and device for detecting double-talk state, storage medium and terminal equipment

Also Published As

Publication number Publication date
CN112908350A (en) 2021-06-04

Similar Documents

Publication Publication Date Title
KR100800725B1 (en) Automatic volume controlling method for mobile telephony audio player and therefor apparatus
US11605394B2 (en) Speech signal cascade processing method, terminal, and computer-readable storage medium
US8787591B2 (en) Method and system for interference suppression using blind source separation
JP4836720B2 (en) Noise suppressor
CN112767963B (en) Voice enhancement method, device and system and computer readable storage medium
US20120263317A1 (en) Systems, methods, apparatus, and computer readable media for equalization
CN105657110B (en) Echo cancellation method and device for voice communication
JP6073456B2 (en) Speech enhancement device
EP3815082B1 (en) Adaptive comfort noise parameter determination
NL2007764A (en) Intelligibility control using ambient noise detection.
WO2014000476A1 (en) Voice noise reduction method and device for mobile terminal
CN105531764A (en) Method for compensating hearing loss in telephone system and mobile telephone device
CN108133712B (en) Method and device for processing audio data
CN112565981B (en) Howling suppression method, howling suppression device, hearing aid, and storage medium
JPH0946233A (en) Sound encoding method/device and sound decoding method/ device
JP2008309955A (en) Noise suppresser
TWI594232B (en) Method and apparatus for processing of audio signals
CN112908350B (en) Audio processing method, communication device, chip and module equipment thereof
CN111477246B (en) Voice processing method and device and intelligent terminal
CN107750038B (en) Volume adjusting method, device, equipment and storage medium
JP6197367B2 (en) Communication device and masking sound generation program
CN116193321A (en) Sound signal processing method, device, equipment and storage medium
CN114023352A (en) Voice enhancement method and device based on energy spectrum depth modulation
CN207070111U (en) Noise reduction terminal
JP2010092057A (en) Receive call speech processing device and receive call speech reproduction device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant