CN111988705B - Audio processing method, device, terminal and storage medium - Google Patents

Audio processing method, device, terminal and storage medium

Info

Publication number
CN111988705B
Authority
CN
China
Prior art keywords
audio
signal
microphone
voice
signals
Prior art date
Legal status
Active
Application number
CN201910426098.XA
Other languages
Chinese (zh)
Other versions
CN111988705A (en)
Inventor
杨依珍
马宁
陈宇
Current Assignee
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201910426098.XA priority Critical patent/CN111988705B/en
Publication of CN111988705A publication Critical patent/CN111988705A/en
Application granted
Publication of CN111988705B publication Critical patent/CN111988705B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2410/00 Microphones
    • H04R2410/01 Noise reduction using microphones having different directional characteristics

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The disclosure relates to an audio processing method, an audio processing device, a terminal and a storage medium, and belongs to the field of mobile terminals. The terminal includes at least two microphones arranged on the terminal body and an AI audio processor. The at least two microphones are arranged in an array at different orientations of the terminal body and are used for collecting audio signals from different directions. The AI audio processor is connected with each microphone and is used for performing noise reduction processing and stereo processing on the audio signals collected by each microphone. When the terminal provided by the embodiments of the disclosure records audio, the independently arranged AI audio processor performs noise reduction on the audio signals, which optimizes the noise reduction effect; at the same time, the microphone array picks up sound and the AI audio processor performs stereo processing, which enhances the stereo perception of the sound and achieves the effect of stereo recording, so the recording quality is improved compared with the related art.

Description

Audio processing method, device, terminal and storage medium
Technical Field
The present disclosure relates to the field of mobile terminals, and in particular, to an audio processing method, apparatus, terminal, and storage medium.
Background
The recording function is a commonly used function of mobile terminals. Using the recording function, a user can record a call or record a meeting.
During recording, the mobile terminal collects an analog audio signal through a microphone, an Analog-to-Digital Converter (ADC) converts the analog audio signal into a digital audio signal, and a Central Processing Unit (CPU) processes the digital audio signal to generate an audio file.
However, because complex processing of the digital audio signal consumes a large amount of CPU computing resources, in the related art the terminal only performs simple processing on the digital audio signal with the CPU, which results in a poor recording effect.
Disclosure of Invention
The embodiment of the disclosure provides an audio processing method, an audio processing device, a terminal and a storage medium, which can solve the problem of poor recording effect in the related art. The technical scheme is as follows:
according to an aspect of the embodiments of the present disclosure, there is provided a terminal, including: at least two microphones disposed on the terminal body, and an Artificial Intelligence (AI) audio processor;
the at least two microphones are arranged in an array at different orientations of the terminal body, and the at least two microphones are used for collecting audio signals from different directions;
the AI audio processor is connected with each microphone and is used for carrying out noise reduction processing and stereo processing on audio signals collected by each microphone.
Optionally, the array arrangement of the at least two microphones on the terminal body includes a linear array arrangement and an area array arrangement;
in the linear array arrangement, the at least two microphones are located on the same straight line; in the area array arrangement, the positions of the at least two microphones on the terminal body are centrosymmetric.
Optionally, the AI audio processor is configured to separate a noise signal and a human voice signal in each audio signal by an AI noise reduction algorithm;
and the AI audio processor is also used for determining the direction information corresponding to the human voice signals according to the signal intensity of each path of human voice signals and the directions of the microphones on the terminal body.
Optionally, the AI audio processor is further connected to a memory, and the memory stores a sample human voice signal;
the AI audio processor is used for extracting sample voiceprint features of the sample human voice signal and performing voice enhancement processing on a target human voice signal in the human voice signals according to the sample voiceprint features, wherein the voiceprint features of the target human voice signal match the sample voiceprint features.
Optionally, the AI audio processor is further connected to an Application Processor (AP), and the AP is configured to send an audio processing instruction to the AI audio processor and/or receive processed audio data transmitted by the AI audio processor.
Optionally, the microphone is an analog microphone, an audio signal output by the analog microphone is an analog audio signal, and the analog microphone is connected to the AI audio processor through an ADC;
or,
the microphone is a digital microphone, an audio signal output by the digital microphone is a digital audio signal, and the digital microphone is directly connected with the AI audio processor.
Optionally, the microphone is a Micro-Electro-Mechanical System (MEMS) microphone.
According to another aspect of the embodiments of the present disclosure, there is provided an audio processing method, which is used for the terminal in the above aspect, and the method includes:
acquiring audio signals collected by each microphone on a terminal body;
performing noise reduction processing on each path of audio signal through an AI noise reduction algorithm to obtain a human voice signal in the audio signal, wherein the AI noise reduction algorithm is used for separating the noise signal and the human voice signal in the audio signal;
determining the direction information corresponding to the voice signals according to the signal intensity of each path of voice signals;
and generating audio data according to the human voice signal and the azimuth information.
Optionally, the determining, according to the signal strength of each path of the voice signal, the azimuth information corresponding to the voice signal includes:
determining the highest signal strength among the channels of human voice signals;
determining a microphone orientation of a target microphone according to the highest signal strength, wherein the target microphone is a microphone for acquiring the human voice signal with the highest signal strength;
and determining the microphone position as position information corresponding to the human voice signal.
Optionally, after performing noise reduction processing on each path of the audio signal through an AI noise reduction algorithm to obtain a human voice signal in the audio signal, the method further includes:
acquiring a sample human voice signal;
extracting sample voiceprint characteristics of the sample human voice signals;
and performing voice enhancement processing on a target human voice signal in the human voice signals according to the sample voiceprint features, wherein the voiceprint features of the target human voice signal match the sample voiceprint features, and the voice enhancement processing includes at least one of volume enhancement and voice beautification processing.
Optionally, after determining the direction information corresponding to the voice signal according to the signal strength of each path of the voice signal, the method further includes:
performing predetermined processing on a noise signal in the audio signal to obtain a processed noise signal;
and generating the audio data according to the human voice signal, the azimuth information and the processed noise signal.
According to another aspect of the embodiments of the present disclosure, there is provided an audio processing apparatus for a terminal according to the above aspect, the apparatus comprising:
the first acquisition module is configured to acquire audio signals collected by each microphone on the terminal body;
the noise reduction module is configured to perform noise reduction processing on each path of audio signal through an AI noise reduction algorithm to obtain a human voice signal in the audio signal, wherein the AI noise reduction algorithm is used for separating the noise signal and the human voice signal in the audio signal;
the determining module is configured to determine the azimuth information corresponding to the voice signals according to the signal strength of each path of voice signals;
a first generation module configured to generate audio data from the human voice signal and the orientation information.
Optionally, the determining module includes:
a first determining submodule configured to determine the highest signal strength of each of the voice signals;
a second determination sub-module configured to determine a microphone orientation of a target microphone from the highest signal strength, the target microphone being a microphone from which the human voice signal of the highest signal strength is acquired;
a third determining submodule configured to determine the microphone orientation as orientation information corresponding to the human voice signal.
Optionally, the apparatus further comprises:
a second acquisition module configured to acquire a sample vocal signal;
an extraction module configured to extract sample voiceprint features of the sample human voice signal;
a voice enhancement module configured to perform voice enhancement processing on a target voice signal in the voice signals according to the sample voiceprint features, wherein the voiceprint features of the target voice signal are matched with the sample voiceprint features, and the voice enhancement processing includes at least one of volume enhancement and voice beautification processing.
Optionally, the apparatus further includes:
the processing module is configured to perform predetermined processing on a noise signal in the audio signal to obtain a processed noise signal;
a second generation module configured to generate the audio data from the vocal signals, the orientation information, and the processed noise signals.
According to another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing at least one instruction for execution by a processor to implement the audio processing method according to the above aspect.
The technical scheme provided by the embodiment of the disclosure at least comprises the following beneficial effects:
at least two microphones are arranged in an array at different orientations of the terminal body, and the microphones at different orientations collect audio signals from different directions, so that the AI audio processor connected with each microphone performs noise reduction processing and stereo processing on the audio signals; when the terminal provided by the embodiments of the disclosure records audio, the independently arranged AI audio processor performs noise reduction on the audio signals, which optimizes the noise reduction effect; at the same time, the microphone array picks up sound and the AI audio processor performs stereo processing, which enhances the stereo perception of the sound and achieves the effect of stereo recording, so the recording quality is improved compared with the related art.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 illustrates a schematic structural diagram of a terminal provided in an exemplary embodiment of the present disclosure;
fig. 2 shows a schematic structural diagram of a terminal provided in another exemplary embodiment of the present disclosure;
FIG. 3 illustrates a method flow diagram of an audio processing method provided by an exemplary embodiment of the present disclosure;
FIG. 4 illustrates a method flow diagram of an audio processing method provided by another illustrative embodiment of the present disclosure;
FIG. 5 is a schematic diagram of an implementation of a sample human voice entry process provided by an illustrative embodiment;
FIG. 6 is an interface diagram of a recording interface provided by an exemplary embodiment;
fig. 7 shows a block diagram of an audio processing apparatus according to an exemplary embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below do not represent all implementations consistent with the present disclosure. Rather, they are merely examples consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Referring to fig. 1, a schematic structural diagram of a terminal provided in an exemplary embodiment of the present disclosure is shown. The terminal may be a smart phone, a tablet computer, a personal portable computer, and the like, which is not limited in the embodiments of the present disclosure. The terminal includes: at least two microphones 11 provided on the terminal body, and an AI audio processor 12.
The at least two microphones 11 are arranged in an array at different orientations of the terminal body and are used for collecting audio signals from different directions.
In order to collect audio signals from different directions, in the embodiment of the present disclosure, the number of microphones on the terminal body is at least two. For example, the number of microphones on the terminal body may be two, four, six, and the like, and the number of microphones is not limited in the embodiments of the present application.
In order to reduce the body space occupied by the microphone, the microphone is optionally a MEMS microphone.
Optionally, a microphone opening is formed in the body frame of the terminal body, and the microphone is arranged at the microphone opening of the body frame, so that a good sound receiving effect is achieved.
In a possible implementation manner, the arrangement manner of the at least two microphones on the terminal body includes a linear array arrangement and an area array arrangement.
When the linear array is adopted, at least two microphones are positioned on the same straight line, for example, two microphones are arranged on the lower frame of the terminal body. Optionally, in order to improve the sound receiving effect, the distance between adjacent microphones on the same straight line is greater than the distance threshold.
When the area array is adopted, the positions of the at least two microphones on the terminal body are centrosymmetric (and located on the same plane). For example, as shown in fig. 2, two microphones 11 are arranged on the upper frame of the terminal body and two microphones 11 are arranged on the lower frame; the line connecting the upper-left microphone and the lower-right microphone and the line connecting the upper-right microphone and the lower-left microphone intersect at the center of the terminal, so the four microphones 11 are centrosymmetric.
In other possible embodiments, when the terminal includes six microphones, in addition to the two microphones on the upper frame and the two microphones on the lower frame of the body, one microphone may be disposed on each of the left frame and the right frame of the body.
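As an illustration of the two layouts, the following sketch models a two-microphone linear array and a four-microphone area array as coordinates relative to the body center; the coordinate values and the distance threshold are hypothetical examples introduced only for illustration, not values taken from this disclosure.

# Hypothetical coordinates (in millimetres, relative to the body center) used
# only to illustrate the linear array and area array layouts described above.
import numpy as np

# Linear array: two microphones on the lower frame, on the same straight line,
# spaced farther apart than a distance threshold to improve sound reception.
linear_array = np.array([[-25.0, -70.0],   # lower-left microphone
                         [ 25.0, -70.0]])  # lower-right microphone
DISTANCE_THRESHOLD_MM = 20.0               # assumed threshold, not specified here
assert np.linalg.norm(linear_array[0] - linear_array[1]) > DISTANCE_THRESHOLD_MM

# Area array: two microphones on the upper frame and two on the lower frame,
# centrosymmetric about the body center (0, 0).
area_array = np.array([[-30.0,  70.0],   # upper-left
                       [ 30.0,  70.0],   # upper-right
                       [-30.0, -70.0],   # lower-left
                       [ 30.0, -70.0]])  # lower-right

def is_centrosymmetric(positions):
    # Central symmetry: for every microphone position p, -p is also a position.
    points = {tuple(p) for p in np.round(positions, 3)}
    return all(tuple(-p) in points for p in np.round(positions, 3))

assert is_centrosymmetric(area_array)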
In order to process the audio signals, each microphone 11 in the terminal is connected to an AI audio processor 12 (for example, connected to the AI audio processor 12 through a bus), and the AI audio processor is configured to perform noise reduction and stereo processing on the audio signals collected by each microphone. Where noise reduction processing is used to reduce or even eliminate (ambient) noise signals in the audio signal and stereo processing is used to enhance the stereo effect when reproducing the audio signal.
In a possible implementation manner, during recording or communication, the terminal opens the microphone 11 to collect audio signals, transmits the collected audio signals to the AI audio processor 12, and the AI audio processor 12 processes the audio signals.
Alternatively, the AI audio processor 12 is an AP-independent Neural-Network Processing Unit (NPU), and the AI audio processor 12 has a deep learning function.
In a possible implementation manner, the AI audio processor includes an AI noise reduction algorithm, and the AI noise reduction algorithm is used for performing noise reduction processing on each channel of audio signal; and/or the AI audio processor includes a stereo enhancement algorithm for enhancing the audio signal from a specified azimuth.
The terminal includes, in addition to the microphone and the AI audio processor, a display module, a power module, a sensor module, a speaker, a communication module, and the like, and the embodiment of the present application does not limit specific components included in the terminal.
In summary, in this embodiment, at least two microphones are arranged in an array at different orientations of the terminal body, and the microphones at different orientations collect audio signals from different directions, so that the AI audio processor connected to each microphone performs noise reduction processing and stereo processing on the audio signals. When the terminal provided by the embodiments of the disclosure records audio, the independently arranged AI audio processor performs noise reduction on the audio signals, which optimizes the noise reduction effect; at the same time, the microphone array picks up sound and the AI audio processor performs stereo processing, which enhances the stereo perception of the sound and achieves the effect of stereo recording, so the recording quality is improved compared with the related art.
Referring to fig. 2, a schematic structural diagram of a terminal provided in another exemplary embodiment of the present disclosure is shown. The terminal may be a smart phone, a tablet computer, a personal portable computer, and the like, which is not limited in the embodiments of the present disclosure. The terminal includes: at least two microphones 11 provided on the terminal body, an AI audio processor 12, an AP13, and a memory 14.
The AI audio processor 12 is connected to the AP13, and the AP13 is configured to send an audio processing instruction to the AI audio processor 12 and/or receive processed audio data transmitted by the AI audio processor 12.
Optionally, the audio processing instruction is used to indicate an audio signal processing mode.
In a possible embodiment, in a predetermined recording mode (such as a professional recording mode), the AP13 sends a first audio processing instruction to the AI audio processor 12, instructing the AI audio processor 12 to perform noise reduction and stereo processing on the audio signals collected by the microphones 11; accordingly, the AI audio processor 12 sends the processed audio data to the AP13, and the AP13 stores the audio data. In a non-predetermined recording mode (such as a normal recording mode), in order to reduce the power consumption of the terminal, the AP13 sends a second audio processing instruction to the AI audio processor 12, instructing the AI audio processor 12 to perform only conventional processing (such as dual-microphone noise reduction) on the audio signal.
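A minimal sketch of this mode-dependent dispatch is shown below; the instruction codes and the send_to_ai_processor() interface are hypothetical names introduced only for illustration, not part of this disclosure.

# Hypothetical sketch of the AP choosing which audio processing instruction to
# send, depending on the recording mode.
PROFESSIONAL_RECORDING = "professional"   # predetermined recording mode
NORMAL_RECORDING = "normal"               # non-predetermined recording mode

FIRST_AUDIO_INSTRUCTION = 0x01   # noise reduction + stereo processing
SECOND_AUDIO_INSTRUCTION = 0x02  # conventional processing (e.g. dual-mic noise reduction)

def send_to_ai_processor(instruction):
    # Stand-in for the AP-to-AI-audio-processor transport.
    print(f"AP -> AI audio processor: instruction {instruction:#04x}")

def on_recording_started(mode):
    if mode == PROFESSIONAL_RECORDING:
        send_to_ai_processor(FIRST_AUDIO_INSTRUCTION)
    else:
        # Reduce power consumption: only conventional processing is requested.
        send_to_ai_processor(SECOND_AUDIO_INSTRUCTION)

on_recording_started(PROFESSIONAL_RECORDING)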
For the process of processing the audio signal by the AI audio processor 12, optionally, the AI audio processor 12 is configured to separate the noise signal and the human voice signal in each audio signal by an AI noise reduction algorithm. Optionally, the AI audio processor 12 filters the separated noise signal, and only retains the voice signal, so that the voice in the recording is purer; alternatively, the AI audio processor 12 performs predetermined processing (such as reducing the signal strength of the noise signal) on the separated noise signal and adds the processed noise signal to the human voice signal, so that the human voice in the recording is more natural.
Optionally, the AI audio processor 12 is further configured to determine, according to the signal strength of each path of voice signal and the position of each microphone on the terminal body, position information corresponding to the voice signal.
In the pickup process, the signal strength of the human voice signals collected by the microphones at different positions is different, wherein the closer the sound source is to the microphone, the stronger the signal strength of the human voice signals collected by the microphone is, and conversely, the farther the sound source is from the microphone, the weaker the signal strength of the human voice signals collected by the microphone is. In one possible embodiment, therefore, the AI audio processor 12 determines the microphone that acquires the strongest vocal sound signal according to the signal strength of each of the vocal sound signals, and determines the azimuth information of the vocal sound signal according to the installation azimuth of the microphone.
In an illustrative example, as shown in fig. 2, when the signal intensity of the human voice signal collected by the upper left microphone 11 is greater than the signal intensity of the human voice signal collected by the upper right microphone 11, the AI audio processor 12 determines that the azimuth information of the human voice signal is the left.
Optionally, the audio data sent by the AI audio processor 12 to the AP13 further includes the azimuth information, and accordingly, the AP13 stores the audio data including the azimuth information in the memory 14. And the subsequent terminal controls the loudspeaker of the corresponding sound channel to analyze and play the audio according to the audio data and the corresponding azimuth information, thereby realizing the stereo effect.
In order to further improve the recording quality and enhance a designated voice, in one possible embodiment, the AI audio processor 12 is connected to the memory 14, and the memory 14 stores a sample human voice signal, which may be a human voice signal collected by the terminal through a microphone when a voice assistant is used, a voice call is made, or a voice message is sent. For the sample human voice signal stored in the memory 14, the AI audio processor 12 performs voiceprint feature extraction on the sample human voice signal to obtain a sample voiceprint feature.
When the AI audio processor 12 processes the audio signal, the sample voiceprint feature is matched against the voiceprint features of the human voice signals in the audio signal, and the matched human voice signal is determined as the target human voice signal. For the matched target human voice signal, the AI audio processor 12 performs voice enhancement processing on it to highlight the target human voice signal in the recording. The voice enhancement processing includes at least one of volume enhancement and voice beautification processing.
In a possible application scenario, the AI audio processor extracts the voiceprint characteristics of the designated user, identifies the vocal signal of the designated user when recording is subsequently performed, and performs vocal enhancement, thereby highlighting the voice of the designated user in the recording.
As for the connection mode between the microphone 11 and the AI audio processor 12, optionally, when the microphone 11 is an analog microphone, the analog microphone is connected to the AI audio processor 12 through an ADC because the audio signal output by the analog microphone is an analog audio signal; when the microphone 11 is a digital microphone, since the audio signal output from the digital microphone is a digital audio signal, the digital microphone is directly connected to the AI audio processor 12.
Optionally, the ADC between the analog microphone and the AI audio processor 12 is a dual-channel ADC, where one channel of ADC is used to perform low gain on the audio signal, the other channel of ADC is used to perform high gain on the audio signal, and the audio signals output by the two channels of ADCs are integrated, so as to improve the dynamic range of recording.
It should be noted that the ADC may be disposed between the analog microphone and the AI audio processor 12, or may be integrated in the AI audio processor 12, that is, the AI audio processor 12 has an analog-to-digital conversion function, which is not limited in this embodiment.
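The following sketch shows one way the two ADC paths could be integrated to widen the dynamic range; the gain values, the full-scale level, and the clipping-based crossover rule are assumptions, since the text only states that one path applies low gain, the other applies high gain, and their outputs are integrated.

# Minimal sketch of integrating the low-gain and high-gain ADC paths.
import numpy as np

FULL_SCALE = 1.0
HIGH_GAIN = 8.0     # high-gain path: good resolution for quiet passages
LOW_GAIN = 1.0      # low-gain path: headroom for loud passages
CLIP_MARGIN = 0.95  # treat high-gain samples near full scale as clipped

def integrate_dual_adc(low_gain_samples, high_gain_samples):
    # Refer both paths back to the same acoustic scale.
    low_ref = low_gain_samples / LOW_GAIN
    high_ref = high_gain_samples / HIGH_GAIN
    # Prefer the high-gain path (better resolution) except where it clipped.
    clipped = np.abs(high_gain_samples) >= CLIP_MARGIN * FULL_SCALE
    return np.where(clipped, low_ref, high_ref)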
In this embodiment, the AI audio processor determines the direction information corresponding to the vocal signals according to the signal strength of each path of the vocal signals and the direction of each microphone on the terminal body, so that the terminal can play audio through the speaker in the corresponding direction according to the direction information when subsequently playing the recording, thereby achieving the effect of playing stereo.
In addition, in this embodiment, the AI audio processor extracts the sample voiceprint features from the sample human voice signal stored in the memory and performs voice enhancement on the designated human voice based on the sample voiceprint features, thereby improving the recording quality of the designated voice.
Referring to fig. 3, a flowchart of a method of an audio processing method according to an exemplary embodiment of the present disclosure is shown, where the present embodiment takes the method as an example for a terminal provided in the foregoing embodiment, and the method may include the following steps:
in step 301, audio signals collected by each microphone on the terminal body are acquired.
In a possible implementation manner, when a voice call instruction and/or a recording instruction is received, the terminal starts microphones arranged in the array, acquires audio signals of various directions through the microphones, sends the acquired audio signals to the AI audio processor, and processes the audio signals by the AI audio processor. When the microphone is a digital microphone, the audio signal is a digital audio signal, and when the microphone is an analog microphone, the audio signal is a digital audio signal obtained by converting the analog audio signal.
In step 302, noise reduction processing is performed on each audio signal by an AI noise reduction algorithm to obtain a vocal signal in the audio signal, and the AI noise reduction algorithm is used to separate the noise signal and the vocal signal in the audio signal.
In a possible implementation manner, the terminal performs noise reduction processing on each channel of audio signal through an AI noise reduction algorithm in the AI audio processor to separate a human voice signal and a noise signal therein.
Optionally, the AI noise reduction algorithm is obtained by training a neural network and separates the human voice from the noise according to the acoustic characteristics (such as frequency) of the noise and the human voice.
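One common way to realize this kind of voice/noise separation is a time-frequency mask applied over a short-time Fourier transform. The sketch below uses a crude energy-based placeholder where a trained neural network (running on the NPU) would normally predict the mask, and the overlap-add reconstruction is simplified; it illustrates the separation step only, not the patented algorithm itself.

import numpy as np

def stft(x, frame=512, hop=256):
    win = np.hanning(frame)
    frames = [win * x[i:i + frame] for i in range(0, len(x) - frame, hop)]
    return np.array([np.fft.rfft(f) for f in frames])

def istft(spec, frame=512, hop=256, length=None):
    # Simplified overlap-add; a production implementation would use a
    # perfect-reconstruction window pair and normalization.
    win = np.hanning(frame)
    out = np.zeros(hop * len(spec) + frame)
    for i, s in enumerate(spec):
        out[i * hop:i * hop + frame] += win * np.fft.irfft(s, n=frame)
    return out[:length] if length is not None else out

def mask_model(magnitude):
    # Placeholder for the trained model: a crude energy-based mask that keeps
    # the dominant bins; a real AI noise reduction model would predict a
    # per-bin voice probability from learned voice/noise characteristics.
    return (magnitude > np.median(magnitude)).astype(float)

def ai_noise_reduction(audio):
    spec = stft(audio)
    voice_mask = mask_model(np.abs(spec))
    voice = istft(spec * voice_mask, length=len(audio))          # human voice signal
    noise = istft(spec * (1.0 - voice_mask), length=len(audio))  # noise signal
    return voice, noise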
In step 303, the direction information corresponding to the voice signal is determined according to the signal strength of each path of voice signal.
For the separated human voice signals, the terminal determines the azimuth information corresponding to the human voice signal according to the signal strength of each channel of human voice signal; the azimuth information indicates the azimuth, relative to the terminal, of the sound source corresponding to the human voice signal.
Optionally, when each path of voice signal includes multiple voice signals (corresponding to multiple sound sources), the terminal distinguishes different voice signals according to the timbre of the voice signal, and determines the corresponding direction information of the various voice signals respectively.
In step 304, audio data is generated from the human voice signal and the orientation information.
Furthermore, the terminal generates audio data according to the separated human voice signal and the orientation information thereof, and when audio reproduction is subsequently performed according to the audio data, the terminal plays audio through a corresponding loudspeaker according to the orientation information, so that stereo playing is realized.
Optionally, in a recording scene, the terminal stores the audio data in the memory for subsequent playing; and under the voice call scene, the terminal sends the audio data to the opposite terminal equipment so that the opposite terminal equipment can analyze and play the audio data.
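As a sketch of how the azimuth information stored with the audio data could drive playback, the snippet below pans the human voice between the left and right channels; the azimuth labels and the constant-power panning law are illustrative choices, not requirements of this disclosure.

import numpy as np

PAN_BY_AZIMUTH = {"left": 0.0, "center": 0.5, "right": 1.0}

def render_stereo(voice, azimuth):
    # Constant-power pan: azimuth "left" sends the voice to the left channel,
    # "right" to the right channel, "center" to both equally.
    pan = PAN_BY_AZIMUTH.get(azimuth, 0.5)
    theta = pan * np.pi / 2
    left = np.cos(theta) * voice
    right = np.sin(theta) * voice
    return np.stack([left, right], axis=0)   # (2, num_samples) stereo buffer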
In summary, in this embodiment, the audio signals collected by each microphone on the terminal body are acquired, noise reduction processing is performed on each channel of audio signal through an AI noise reduction algorithm to obtain the human voice signal in the audio signal, the azimuth information corresponding to the human voice signal is determined according to the signal strength of each channel of human voice signal, and audio data is generated from the human voice signal and the azimuth information. When recording with the method provided by this embodiment, the independently arranged AI audio processor performs noise reduction on the audio signals, which optimizes the noise reduction effect; at the same time, the microphone array picks up sound and the AI audio processor performs stereo processing, which enhances the stereo perception of the sound and achieves the effect of stereo recording, so the recording quality is improved compared with the related art.
Referring to fig. 4, it shows a flowchart of a method of an audio processing method according to another exemplary embodiment of the present disclosure, where this embodiment takes the method as an example for use in a terminal provided in the foregoing embodiment, the method may include the following steps:
in step 401, audio signals collected by each microphone on the terminal body are acquired.
In step 402, noise reduction processing is performed on each audio signal through an AI noise reduction algorithm to obtain a vocal signal in the audio signal, and the AI noise reduction algorithm is used to separate a noise signal in the audio signal from the vocal signal.
The implementation of the above steps can refer to steps 301 to 302, and this embodiment is not described herein again.
In an illustrative example, in the terminal shown in fig. 1 or 2, the terminal performs noise reduction processing on four audio signals (respectively collected by four microphones) through an AI noise reduction algorithm, so as to obtain four voice signals, which are a first voice signal (collected by an upper left microphone), a second voice signal (collected by an upper right microphone), a third voice signal (collected by a lower left microphone), and a fourth voice signal (collected by a lower right microphone).
In step 403, the highest signal strength among the channels of human voice signals is determined.
In a possible implementation manner, for each path of voice signals obtained by separation, the terminal obtains the signal strength of each path of voice signals, so as to determine the highest signal strength.
With reference to the example in step 402, the terminal obtains that the signal intensity of the first path of voice signal is 45dB, the signal intensity of the second path of voice signal is 40dB, the signal intensity of the third path of voice signal is 41dB, and the signal intensity of the fourth path of voice signal is 37dB, so as to determine that the highest signal intensity is 45dB.
In step 404, the microphone orientation of the target microphone is determined according to the highest signal strength, and the target microphone is the microphone collecting the voice signal with the highest signal strength.
Because the signal intensity of the human voice signal picked up by the microphone closest to the sound source is strongest, the terminal can determine the microphone which picks up the human voice signal with the highest signal intensity as the target microphone closest to the sound source, and further determine the azimuth information of the human voice signal according to the azimuth of the microphone of the target microphone on the terminal body.
With reference to the example in the above step, the terminal determines that the target microphone is the upper left microphone according to the strongest signal strength, that is, the microphone is oriented at the upper left.
Step 405, determining the microphone position as the position information corresponding to the human voice signal.
Optionally, the terminal determines the azimuth information of the human voice signal according to the corresponding relationship between the azimuth of the microphone and the azimuth information.
With reference to the example in the above step, the terminal determines that the direction corresponding to the human voice signal is the left side according to the upper left of the microphone direction of the target microphone.
In other possible embodiments, the terminal may simulate, according to the signal strength of each path of the vocal signal and the microphone position of each microphone, a sound source position corresponding to the vocal signal through an AI algorithm, so as to determine the position information of the vocal signal, which is not limited in this embodiment.
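Steps 403 to 405 can be summarized in a short sketch: measure the strength of each separated human voice channel, take the strongest one, and map that microphone's position to azimuth information. The RMS-based dB measure and the position-to-azimuth mapping below are illustrative assumptions.

import numpy as np

MIC_POSITIONS = ["upper-left", "upper-right", "lower-left", "lower-right"]
AZIMUTH_BY_POSITION = {"upper-left": "left", "lower-left": "left",
                       "upper-right": "right", "lower-right": "right"}

def signal_strength_db(x):
    # RMS level in dB relative to an arbitrary reference; any monotonic
    # strength measure would serve the comparison in step 403.
    rms = np.sqrt(np.mean(np.square(x)) + 1e-12)
    return 20.0 * np.log10(rms / 1e-5)

def locate_voice(voice_channels):
    # voice_channels[i] is the separated human voice signal from microphone i.
    strengths = [signal_strength_db(v) for v in voice_channels]
    target = int(np.argmax(strengths))                   # step 404: target microphone
    return AZIMUTH_BY_POSITION[MIC_POSITIONS[target]]    # step 405: azimuth information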
Step 406, a sample human voice signal is obtained.
In one possible embodiment, the sample human voice signal is a human voice signal collected automatically by the terminal through a microphone when a voice assistant is used, a voice call is made, or a voice message is sent. When the sample human voice signal is obtained in this manner, the terminal subsequently performs voice enhancement on the voice of the terminal owner.
In another possible implementation, the sample human voice signal is a human voice signal entered after a triggering operation on a voice entry control in the sample human voice entry interface is received. When the sample human voice signal is obtained in this manner, the terminal subsequently performs voice enhancement on the voice of a designated user.
Illustratively, as shown in fig. 5, a voice entry control 52 is displayed in the sample human voice entry interface 51; when a click operation on the voice entry control 52 is received, the terminal collects, through a microphone, a sample human voice signal while the user reads the specified text aloud. After the voice entry is completed, the sample human voice entry interface 51 displays the user setting control 53 and obtains the entered user identifier "zhaoku" through the user setting control 53.
Step 407, extracting the sample voiceprint characteristics of the sample human voice signal.
In a possible implementation manner, for the obtained sample voice signal, the terminal performs voiceprint feature extraction on the sample voice signal through the AI audio processor, so as to obtain a sample voiceprint feature, so as to extract the voice of the specified user based on the sample voiceprint feature in the following.
The sample voiceprint feature may include Mel-Frequency Cepstrum Coefficient (MFCC) or Perceptual Linear Prediction (PLP), which is not limited in this embodiment of the present invention.
Optionally, the terminal stores the extracted sample voiceprint features in association with the user identifier, so as to be directly used in the subsequent voice enhancement.
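A minimal sketch of the voiceprint extraction in step 407, using mean MFCCs as the voiceprint vector; computing MFCCs with librosa is an implementation choice made here only for illustration (PLP features would be an alternative), not something this disclosure prescribes.

import numpy as np
import librosa

def extract_voiceprint(signal, sample_rate):
    # Mean MFCC vector as a simple voiceprint; n_mfcc=20 is an arbitrary choice.
    mfcc = librosa.feature.mfcc(y=np.asarray(signal, dtype=float),
                                sr=sample_rate, n_mfcc=20)
    return mfcc.mean(axis=1)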
Step 408, performing voice enhancement processing on a target human voice signal in the human voice signals according to the sample voiceprint features, wherein the voiceprint features of the target human voice signal match the sample voiceprint features, and the voice enhancement processing includes at least one of volume enhancement and voice beautification processing.
In a possible implementation manner, before recording, when an instruction for starting voice enhancement is received, the terminal displays the user identifier of at least one candidate voice enhancement user on the recording interface and receives a selection signal of a user identifier. When recording is performed, the terminal obtains the sample voiceprint features corresponding to the selected user identifier, identifies the target human voice signal in the human voice signals according to the sample voiceprint features, and then performs voice enhancement processing on the target human voice signal.
Illustratively, as shown in fig. 6, the recording interface 61 displays three user identifiers 62, each corresponding to a candidate voice enhancement user, among them "Zhang Zong" and "Me", and the terminal receives selection signals for the user identifiers 62 corresponding to "Zhang Zong" and "Me". When a click operation on the recording control 63 is received, the terminal obtains the sample voiceprint features corresponding to "Zhang Zong" and "Me", so that the human voice signals of "Zhang Zong" and "Me" are subjected to voice enhancement processing during recording.
The method for performing the voice enhancement processing on the target voice signal includes at least one of a volume enhancement and a voice beautification processing, for example, increasing the volume of the target voice signal by 10dB. Of course, in addition to the above voice enhancement processing manner, the terminal may also perform voice enhancement processing in other manners, which is not limited in this embodiment.
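A sketch of the matching and enhancement in step 408: compare a voice signal's voiceprint with the stored sample voiceprint and, if they match, raise the volume of that signal. The cosine-similarity test and the 0.8 threshold are assumptions; the 10 dB gain follows the example above.

import numpy as np

def is_target_voice(voiceprint, sample_voiceprint, threshold=0.8):
    # Cosine similarity between the candidate voiceprint and the sample voiceprint.
    cos = np.dot(voiceprint, sample_voiceprint) / (
        np.linalg.norm(voiceprint) * np.linalg.norm(sample_voiceprint) + 1e-12)
    return cos > threshold

def enhance_voice(voice, gain_db=10.0):
    # Volume enhancement: raise the matched voice by gain_db, guarding against clipping.
    gain = 10.0 ** (gain_db / 20.0)
    return np.clip(voice * gain, -1.0, 1.0)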
It should be noted that there is no strict time sequence between steps 403 to 405 and steps 406 to 408; this embodiment only describes them in this order as an example, and the disclosure is not limited thereto.
In step 409, a predetermined process is performed on the noise signal in the audio signal to obtain a processed noise signal.
Optionally, when generating the audio data, the terminal adds an appropriate amount of noise, so that the recording sounds more natural and discontinuous or incoherent speech caused by an overly pure voice is avoided.
In a possible implementation manner, for the noise signal extracted by the AI noise reduction algorithm in the above step 402, the terminal performs a predetermined process on the noise signal to obtain a processed noise signal, so as to generate audio data according to the human voice signal, the direction information, and the processed noise signal.
Optionally, the predetermined processing of the noise signal may be reducing the signal strength of the noise signal, for example uniformly reducing the signal strength of the noise signal to 5 dB.
In step 410, audio data is generated from the human voice signal, the orientation information, and the processed noise signal.
Furthermore, the terminal adds the processed noise signal to the generated audio data, which avoids over-purifying the human voice while keeping it clear, makes the generated audio more natural, and improves the audio quality.
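Steps 409 and 410 can be sketched as attenuating the separated noise and mixing it back with the voice so the recording does not sound unnaturally pure; the attenuation amount used below is an illustrative assumption.

import numpy as np

def soften_noise(noise, attenuation_db=12.0):
    # Step 409: predetermined processing, modelled here as a fixed attenuation.
    return noise * 10.0 ** (-attenuation_db / 20.0)

def mix_voice_and_noise(voice, noise):
    # Step 410: add the processed noise back so the recording stays natural.
    return np.clip(voice + soften_noise(noise), -1.0, 1.0)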
In this embodiment, the terminal determines the orientation information corresponding to the voice signal according to the signal strength of each path of voice signal and the orientation of each microphone on the terminal body, so that when the recording is played subsequently, the terminal can play audio through the loudspeaker in the corresponding orientation according to the orientation information, thereby achieving the effect of playing stereo.
In addition, in this embodiment, the terminal extracts the sample voiceprint features from the sample human voice signal and performs voice enhancement on the designated human voice based on the sample voiceprint features, thereby improving the recording quality of the designated voice.
In addition, the terminal performs predetermined processing on the noise signal in the audio signal and adds the processed noise signal when generating the audio data, which avoids discontinuous or incoherent speech caused by an overly pure voice and makes the recording more natural.
Referring to fig. 7, a block diagram of an audio processing apparatus according to an exemplary embodiment of the disclosure is shown, where the audio processing apparatus may be implemented as all or a part of the terminal, and the audio processing apparatus may include:
a first obtaining module 710 configured to obtain audio signals collected by each microphone on the terminal body;
a noise reduction module 720, configured to perform noise reduction processing on each path of the audio signal through an AI noise reduction algorithm to obtain a vocal signal in the audio signal, where the AI noise reduction algorithm is used to separate a noise signal from the vocal signal in the audio signal;
the determining module 730 is configured to determine, according to the signal strength of each path of the voice signal, azimuth information corresponding to the voice signal;
a first generating module 740 configured to generate audio data from the human voice signal and the orientation information.
Optionally, the determining module 730 includes:
a first determining submodule configured to determine the highest signal strength of each of the voice signals;
a second determination submodule configured to determine a microphone orientation of a target microphone from the highest signal strength, the target microphone being a microphone that acquired the human voice signal of the highest signal strength;
a third determining submodule configured to determine the microphone orientation as orientation information corresponding to the human voice signal.
Optionally, the apparatus further comprises:
a second acquisition module configured to acquire a sample human voice signal;
an extraction module configured to extract sample voiceprint features of the sample human voice signal;
a voice enhancement module configured to perform voice enhancement processing on a target voice signal in the voice signals according to the sample voiceprint features, wherein the voiceprint features of the target voice signal are matched with the sample voiceprint features, and the voice enhancement processing includes at least one of volume enhancement and voice beautification processing.
Optionally, the apparatus further includes:
the processing module is configured to perform predetermined processing on a noise signal in the audio signal to obtain a processed noise signal;
a second generation module configured to generate the audio data from the vocal signals, the orientation information, and the processed noise signals.
In summary, in this embodiment, the audio signals collected by each microphone on the terminal body are acquired, noise reduction processing is performed on each channel of audio signal through an AI noise reduction algorithm to obtain the human voice signal in the audio signal, the azimuth information corresponding to the human voice signal is determined according to the signal strength of each channel of human voice signal, and audio data is generated from the human voice signal and the azimuth information. When recording with the apparatus provided by this embodiment, the independently arranged AI audio processor performs noise reduction on the audio signals, which optimizes the noise reduction effect; at the same time, the microphone array picks up sound and the AI audio processor performs stereo processing, which enhances the stereo perception of the sound and achieves the effect of stereo recording, so the recording quality is improved compared with the related art.
The disclosed embodiments also provide a computer-readable storage medium storing at least one instruction for execution by a processor to implement the audio processing method as described in the above aspect.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (8)

1. A terminal, characterized in that the terminal comprises: the terminal comprises at least two microphones, an Artificial Intelligence (AI) audio processor, an Application Processor (AP) and a memory, wherein the microphones are arranged on a terminal body;
the terminal is used for displaying the user identification of at least one candidate voice enhancement user on the recording interface and receiving a selection signal of the user identification when receiving a voice enhancement starting instruction;
the at least two microphone arrays are arranged in different directions of the terminal body, the at least two microphones are used for acquiring audio signals in different directions, and the array arrangement modes of the at least two microphones on the terminal body comprise linear array arrangement and area array arrangement, wherein when the linear array arrangement is adopted, the at least two microphones are positioned on the same straight line, and when the area array arrangement is adopted, the directions of the at least two microphones on the terminal body are in central symmetry; the distance between adjacent microphones on the same straight line is greater than a distance threshold;
the AP is used for sending a first audio processing instruction to the AI audio processor in a preset recording mode;
the memory stores sample human voice signals;
the AI audio processor is connected with each microphone, and is used for performing noise reduction processing and stereo processing on the audio signals acquired by each microphone after receiving the first audio processing instruction, wherein the noise reduction processing is used for separating noise signals and voice signals in the audio signals, the stereo processing is used for extracting sample voiceprint characteristics of the sample voice signals, determining the highest signal strength of each path of target voice signals according to the sample voiceprint characteristics corresponding to the selected user identifier, and determining the microphone orientation of a target microphone according to the highest signal strength, the target microphone is a microphone acquiring the target voice signals with the highest signal strength, the microphone orientation is determined as orientation information corresponding to the target voice signals, and audio data is generated according to the target voice signals and the orientation information, and the voiceprint characteristics of the target voice signals are matched with the sample voiceprint characteristics corresponding to the selected user identifier;
the AP is further used for sending a second audio processing instruction to the AI audio processor under the non-predetermined recording mode, and instructing the AI audio processor to carry out conventional processing on the audio signal;
wherein the microphone is an analog microphone or a digital microphone, the audio signal output by the analog microphone is an analog audio signal, the analog microphone is connected with the AI audio processor through a dual-channel analog-to-digital converter (ADC), one channel of the dual-channel ADC is used for applying low gain to the audio signal and the other channel is used for applying high gain to the audio signal, and the audio data is obtained by integrating the audio signals output by the dual-channel ADC.
2. The terminal of claim 1,
the AI audio processor is used for separating the noise signals and the human voice signals in each path of audio signals through an AI noise reduction algorithm;
and the AI audio processor is also used for determining the direction information corresponding to the voice signals according to the signal intensity of each path of voice signals and the direction of each microphone on the terminal body.
3. A terminal according to claim 1, wherein the microphone is a microelectromechanical systems MEMS microphone.
4. An audio processing method, for use in a terminal according to any of claims 1 to 3, the method comprising:
when receiving a command for starting voice enhancement, displaying a user identifier of at least one candidate voice enhancement user on a recording interface, and receiving a selection signal of the user identifier;
acquiring audio signals acquired by at least two microphones which are arranged on a terminal body in an array manner in different directions, wherein the array arrangement manner of the at least two microphones on the terminal body comprises linear array arrangement and area array arrangement, when the linear array arrangement is adopted, the at least two microphones are positioned in the same straight line, and when the area array arrangement is adopted, the directions of the at least two microphones on the terminal body are centrosymmetric; the distance between adjacent microphones on the same straight line is greater than a distance threshold;
performing noise reduction processing on each path of audio signal through an AI noise reduction algorithm in a preset recording mode to obtain a human voice signal in the audio signal, wherein the AI noise reduction algorithm is used for separating a noise signal in the audio signal from the human voice signal; extracting sample voiceprint characteristics of the sample voice signals in the memory, and determining the highest signal intensity of each path of target voice signals according to the sample voiceprint characteristics corresponding to the selected user identification; determining a microphone orientation of a target microphone according to the highest signal strength, wherein the target microphone is a microphone for collecting a target human voice signal with the highest signal strength; determining the microphone azimuth as azimuth information corresponding to the target voice signal, wherein the voiceprint feature of the target voice signal is matched with the sample voiceprint feature corresponding to the selected user identification; generating audio data according to the human voice signal and the azimuth information, wherein the microphone is an analog microphone or a digital microphone, the audio signal output by the analog microphone is an analog audio signal, the analog microphone is connected with an AI audio processor through a two-way ADC, one way of the two-way ADC is used for carrying out low gain on the audio signal, the other way of the two-way ADC is used for carrying out high gain on the audio signal, and the audio data is obtained by integrating the audio signal output by the two-way ADC;
and in the non-predetermined recording mode, performing conventional processing on the audio signal.
5. The method according to claim 4, wherein after performing noise reduction processing on each of the audio signals by an AI noise reduction algorithm in a predetermined recording mode to obtain a human voice signal in the audio signals, the method further comprises:
acquiring the sample human voice signal;
extracting the sample voiceprint features of the sample human voice signal;
and performing voice enhancement processing on a target human voice signal in the human voice signals according to the sample voiceprint features, wherein the voiceprint features of the target human voice signal match the sample voiceprint features, and the voice enhancement processing comprises at least one of volume enhancement and voice beautification processing.
6. The method of claim 4, wherein after determining the microphone bearing as bearing information corresponding to the target vocal signal, the method further comprises:
performing predetermined processing on a noise signal in the audio signal to obtain a processed noise signal;
and generating the audio data according to the human voice signal, the azimuth information and the processed noise signal.
7. An audio processing apparatus, characterized in that the apparatus is used in a terminal according to any of claims 1 to 3, the apparatus comprising:
the functional module is configured to display the user identification of at least one candidate human voice enhancement user on the recording interface when receiving a command for starting human voice enhancement, and receive a selection signal of the user identification;
the first acquisition module is configured to acquire audio signals acquired by at least two microphones which are arranged in different directions on a terminal body in an array mode, wherein the arrangement modes of the at least two microphones on the terminal body comprise linear array arrangement and area array arrangement, when the linear array arrangement is adopted, the at least two microphones are positioned on the same straight line, and when the area array arrangement is adopted, the directions of the at least two microphones on the terminal body are in central symmetry; the distance between adjacent microphones on the same straight line is greater than a distance threshold;
a noise reduction module, configured to perform noise reduction processing on each audio signal through an AI noise reduction algorithm in a predetermined recording mode to obtain a human voice signal in the audio signal, wherein the AI noise reduction algorithm is used for separating the noise signal in the audio signal from the human voice signal;
a determining module, configured to, in the predetermined recording mode, extract the sample voiceprint features of the sample human voice signal in the memory and determine the highest signal strength among the target human voice signals of the respective channels according to the sample voiceprint features corresponding to the selected user identification; determine a microphone orientation of a target microphone according to the highest signal strength, wherein the target microphone is the microphone that collects the target human voice signal having the highest signal strength; and determine the microphone orientation as orientation information corresponding to the target human voice signal, wherein the voiceprint features of the target human voice signal match the sample voiceprint features corresponding to the selected user identification;
a first generating module, configured to generate the audio data according to the human voice signal and the orientation information in the predetermined recording mode, wherein the microphone is an analog microphone or a digital microphone, the audio signal output by the analog microphone is an analog audio signal, the analog microphone is connected to the AI audio processor through a dual-channel ADC, one channel of the dual-channel ADC is used for applying a low gain to the audio signal and the other channel is used for applying a high gain to the audio signal, and the audio data is obtained by integrating the audio signals output by the two channels of the ADC (a sketch of this gain integration follows this module list);
a functional module configured to perform conventional processing on the audio signal in a non-predetermined recording mode.
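The dual-channel ADC mentioned in claim 4 and in the first generating module feeds one low-gain and one high-gain copy of the same analog signal to the processor. Below is a minimal sketch of one plausible way to integrate the two paths: keep the high-gain samples (better signal-to-noise ratio) wherever they did not clip, and fall back to the rescaled low-gain samples where they did. The gain values and clip threshold are assumptions, not figures from the patent.

```python
# Illustrative sketch: merge low-gain and high-gain ADC outputs of the same
# microphone into one signal referred to the low-gain level.
import numpy as np

def merge_dual_gain(low_gain: np.ndarray, high_gain: np.ndarray,
                    low_gain_db: float = 0.0, high_gain_db: float = 20.0,
                    clip_level: float = 0.99) -> np.ndarray:
    scale = 10.0 ** ((low_gain_db - high_gain_db) / 20.0)  # bring high-gain path down
    merged = high_gain * scale
    clipped = np.abs(high_gain) >= clip_level              # where the sensitive path saturated
    merged[clipped] = low_gain[clipped]                    # use the low-gain samples there
    return merged
```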
8. A computer-readable storage medium having stored thereon at least one instruction for execution by a processor to implement the audio processing method of any of claims 4 to 6.
CN201910426098.XA 2019-05-21 2019-05-21 Audio processing method, device, terminal and storage medium Active CN111988705B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910426098.XA CN111988705B (en) 2019-05-21 2019-05-21 Audio processing method, device, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910426098.XA CN111988705B (en) 2019-05-21 2019-05-21 Audio processing method, device, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN111988705A CN111988705A (en) 2020-11-24
CN111988705B true CN111988705B (en) 2023-01-03

Family

ID=73436217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910426098.XA Active CN111988705B (en) 2019-05-21 2019-05-21 Audio processing method, device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN111988705B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115426582B (en) * 2022-11-06 2023-04-07 江苏米笛声学科技有限公司 Earphone audio processing method and device

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6271780B1 (en) * 1998-10-08 2001-08-07 Cirrus Logic, Inc. Gain ranging analog-to-digital converter with error correction
CN101924525B (en) * 2009-06-11 2016-06-22 应美盛股份有限公司 High performance voice frequency amplifying circuit
US20130129117A1 (en) * 2011-11-21 2013-05-23 Henrik Thomsen Audio amplification circuit
CN103138687B (en) * 2011-11-22 2017-04-12 应美盛股份有限公司 Distortion inhibition in high level audio frequency amplifying circuit
CN108231085A (en) * 2016-12-14 2018-06-29 杭州海康威视数字技术股份有限公司 A kind of sound localization method and device
CN107040843B (en) * 2017-03-06 2021-05-18 联想(北京)有限公司 Method for acquiring same sound source through two microphones and acquisition equipment
CN107293305A (en) * 2017-06-21 2017-10-24 惠州Tcl移动通信有限公司 It is a kind of to improve the method and its device of recording quality based on blind source separation algorithm
CN108717178A (en) * 2018-04-12 2018-10-30 福州瑞芯微电子股份有限公司 A kind of sound localization method and device based on neural network
CN108737615A (en) * 2018-06-27 2018-11-02 努比亚技术有限公司 microphone reception method, mobile terminal and computer readable storage medium
CN108877823B (en) * 2018-07-27 2020-12-18 三星电子(中国)研发中心 Speech enhancement method and device
CN109655816A (en) * 2018-11-30 2019-04-19 恩平市声动电子科技有限公司 Obtain the method, device and equipment terminal of sound source position

Also Published As

Publication number Publication date
CN111988705A (en) 2020-11-24

Similar Documents

Publication Publication Date Title
CN102056036B (en) Reproducing device, headphone and reproducing method
CN108159702B (en) Multi-player voice game processing method and device
US20200186912A1 (en) Audio headset device
WO2022033556A1 (en) Electronic device and speech recognition method therefor, and medium
CN103456305A (en) Terminal and speech processing method based on multiple sound collecting units
CN110349582B (en) Display device and far-field voice processing circuit
WO2021244056A1 (en) Data processing method and apparatus, and readable medium
EP3915277A1 (en) Signal processing device, system and method for processing audio signals
US10607625B2 (en) Estimating a voice signal heard by a user
US20240096343A1 (en) Voice quality enhancement method and related device
CN111540370A (en) Audio processing method and device, computer equipment and computer readable storage medium
CN115482830A (en) Speech enhancement method and related equipment
CN111081275B (en) Terminal processing method and device based on sound analysis, storage medium and terminal
US11589173B2 (en) Hearing aid comprising a record and replay function
CN111988705B (en) Audio processing method, device, terminal and storage medium
CN105979469B (en) recording processing method and terminal
CN116343756A (en) Human voice transmission method, device, earphone, storage medium and program product
CN112599144A (en) Audio data processing method, audio data processing apparatus, medium, and electronic device
CN112420063A (en) Voice enhancement method and device
CN114220454B (en) Audio noise reduction method, medium and electronic equipment
CN113299309A (en) Voice translation method and device, computer readable medium and electronic equipment
CN113409809B (en) Voice noise reduction method, device and equipment
CN115835079A (en) Transparent transmission mode switching method and switching device
CN116320144B (en) Audio playing method, electronic equipment and readable storage medium
CN114093380B (en) Voice enhancement method, electronic equipment, chip system and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant