CN110021307A - Audio verification method, apparatus, storage medium and electronic device - Google Patents

Info

Publication number
CN110021307A
Authority
CN
China
Prior art keywords
processor
audio
audio signal
noise reduction
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910273077.9A
Other languages
Chinese (zh)
Other versions
CN110021307B (en)
Inventor
陈岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910273077.9A
Publication of CN110021307A
Application granted
Publication of CN110021307B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165 Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The embodiments of the present application disclose an audio verification method, an apparatus, a storage medium, and an electronic device. The electronic device includes a processor, a dedicated voice recognition chip, and two microphones, and the power consumption of the dedicated voice recognition chip is lower than that of the processor. While the processor is in a dormant state, the low-power dedicated voice recognition chip verifies an external audio signal, and the processor is woken up if the verification passes. The processor then denoises two external audio signals to obtain a noise-reduced audio signal and verifies the noise-reduced audio signal to obtain a corresponding verification result. In this way, interference from external noise can be excluded, so that the audio signal is verified more accurately.

Description

Audio verification method, apparatus, storage medium and electronic device
Technical field
The present application relates to the field of voice processing technology, and in particular to an audio verification method, an apparatus, a storage medium, and an electronic device.
Background
At present, with audio verification, a user can control an electronic device by speaking a voice instruction when it is inconvenient to operate the device directly. In real usage environments, however, various kinds of noise are often present, which makes it difficult for the electronic device to verify the input audio signal accurately.
Summary of the invention
The embodiments of the present application provide an audio verification method, an apparatus, a storage medium, and an electronic device that can improve the accuracy with which the electronic device verifies audio signals.
In a first aspect, an embodiment of the present application provides an audio verification method applied to an electronic device. The electronic device includes a processor, a dedicated voice recognition chip, and two microphones, and the power consumption of the dedicated voice recognition chip is lower than that of the processor. The audio verification method includes:
When the processor is in a dormant state, acquiring an external audio signal through either microphone, and providing the audio signal to the dedicated voice recognition chip;
Verifying the audio signal through the dedicated voice recognition chip, waking up the processor when the verification passes, and controlling the dedicated voice recognition chip to sleep after the processor is woken up;
Acquiring two external audio signals through the two microphones, and providing the two audio signals to the processor;
Denoising the two audio signals through the processor to obtain a noise-reduced audio signal, and verifying the noise-reduced audio signal to obtain a verification result.
In a second aspect, an embodiment of the present application provides an audio verification apparatus applied to an electronic device. The electronic device includes a processor, a dedicated voice recognition chip, and two microphones. The audio verification apparatus includes:
A first acquisition module, configured to acquire an external audio signal through either microphone when the processor is in a dormant state, and to provide the audio signal to the dedicated voice recognition chip;
A first verification module, configured to verify the audio signal through the dedicated voice recognition chip, to wake up the processor when the verification passes, and to control the dedicated voice recognition chip to sleep after the processor is woken up;
A second acquisition module, configured to acquire two external audio signals through the two microphones, and to provide the two audio signals to the processor;
A second verification module, configured to denoise the two audio signals through the processor to obtain a noise-reduced audio signal, and to verify the noise-reduced audio signal to obtain a verification result.
In a third aspect, an embodiment of the present application provides a storage medium storing a computer program. When the computer program runs on an electronic device that includes a processor, a dedicated voice recognition chip, and two microphones, it causes the electronic device to execute the steps of the audio verification method provided by the embodiments of the present application.
In a fourth aspect, an embodiment of the present application further provides an electronic device. The electronic device includes an audio collection unit, a processor, a dedicated voice recognition chip, two microphones, and a screen, and the power consumption of the dedicated voice recognition chip is lower than that of the processor, wherein:
The audio collection unit is configured to acquire an external audio signal through either microphone when the processor is in a dormant state, and to provide the audio signal to the dedicated voice recognition chip;
The dedicated voice recognition chip is configured to verify the audio signal, to wake up the processor when the verification passes, and to sleep after the processor is woken up;
The audio collection unit is further configured to acquire two external audio signals through the two microphones after the processor is woken up, and to provide the two audio signals to the processor;
The processor is configured to denoise the two audio signals to obtain a noise-reduced audio signal, and to verify the noise-reduced audio signal to obtain a verification result.
In the embodiments of the present application, the electronic device includes a processor, a dedicated voice recognition chip, and two microphones, and the power consumption of the dedicated voice recognition chip is lower than that of the processor. While the processor is in a dormant state, the low-power dedicated voice recognition chip verifies an external audio signal, and the processor is woken up if the verification passes. The processor then denoises two external audio signals to obtain a noise-reduced audio signal and verifies the noise-reduced audio signal to obtain a corresponding verification result. In this way, interference from external noise can be excluded, so that the audio signal is verified more accurately.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the audio verification method provided by the embodiments of the present application.
Fig. 2 is a schematic diagram of the positions of the two microphones in the embodiments of the present application.
Fig. 3 is a schematic diagram of noise suppression performed on the two audio signals acquired by the two microphones in the embodiments of the present application.
Fig. 4 is a schematic flowchart of training the voiceprint feature extraction model in the embodiments of the present application.
Fig. 5 is a schematic diagram of a spectrogram extracted in the embodiments of the present application.
Fig. 6 is another schematic flowchart of the audio verification method provided by the embodiments of the present application.
Fig. 7 is a schematic structural diagram of the audio verification apparatus provided by the embodiments of the present application.
Fig. 8 is a schematic structural diagram of the electronic device provided by the embodiments of the present application.
Detailed description
Referring to the drawings, in which like reference numerals denote like components, the principles of the present application are illustrated as implemented in a suitable computing environment. The following description is based on the illustrated specific embodiments of the present application and should not be regarded as limiting other specific embodiments not detailed herein.
The embodiments of the present application first provide an audio verification method. The method may be executed by the electronic device provided by the embodiments of the present application. The electronic device includes a processor, a dedicated voice recognition chip, and two microphones, and the power consumption of the dedicated voice recognition chip is lower than that of the processor. The electronic device may be any device equipped with a processor and having processing capability, such as a smartphone, tablet computer, handheld computer, laptop, or desktop computer.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of the audio verification method provided by the embodiments of the present application. The audio verification method is applied to the electronic device provided by the present application, which includes a processor, a dedicated voice recognition chip, and two microphones. As shown in Fig. 1, the flow of the audio verification method may be as follows:
In 101, when the processor is in a dormant state, an external audio signal is acquired through either microphone and provided to the dedicated voice recognition chip.
It should be noted that the dedicated voice recognition chip in the embodiments of the present application is a special-purpose chip designed for voice recognition, such as a digital signal processing chip designed for voice or an application-specific integrated circuit chip designed for voice, and it has lower power consumption than a general-purpose processor. Any two of the dedicated voice recognition chip, the processor, and the audio collection unit are communicatively connected through a communication bus (such as an I2C bus) to exchange data.
In the embodiments of the present application, the processor sleeps when the screen of the electronic device is in the screen-off state, and the dedicated voice recognition chip sleeps when the screen is in the screen-on state. In addition, the two microphones included in the electronic device may be built-in microphones or external microphones (wired or wireless).
When the processor is in a dormant state (and the dedicated voice recognition chip is awake), the electronic device collects external sound through either microphone. If the microphone is an analog microphone, an analog audio signal is collected, and it must be converted from analog to digital to obtain a digitized audio signal for subsequent processing. For example, after collecting an external analog audio signal through the microphone, the electronic device can sample it at a sampling frequency of 16 kHz to obtain a digital audio signal.
A person of ordinary skill in the art will understand that if the microphone included in the electronic device is a digital microphone, a digitized audio signal is collected directly and no analog-to-digital conversion is needed.
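As a rough illustration of the analog-to-digital step, the sketch below samples a stand-in "analog" waveform at 16 kHz and quantizes it to integers. The `analog` function, the 16-bit depth, and the helper name `sample_and_quantize` are assumptions made for illustration; the text only specifies the 16 kHz sampling frequency.

```python
import math

SAMPLE_RATE_HZ = 16_000  # sampling frequency given in the text


def sample_and_quantize(analog, duration_s, sample_rate_hz=SAMPLE_RATE_HZ, bits=16):
    """Sample a continuous-time function and quantize it to signed integers.

    `analog` maps time in seconds to an amplitude in [-1.0, 1.0].
    The bit depth is an assumption; the source does not specify it.
    """
    full_scale = 2 ** (bits - 1) - 1
    n_samples = int(round(duration_s * sample_rate_hz))
    samples = []
    for n in range(n_samples):
        value = analog(n / sample_rate_hz)
        value = max(-1.0, min(1.0, value))  # clip to the converter's input range
        samples.append(int(round(value * full_scale)))
    return samples


def analog(t):
    # Stand-in for the microphone voltage: a 440 Hz tone at half amplitude.
    return 0.5 * math.sin(2 * math.pi * 440 * t)


pcm = sample_and_quantize(analog, duration_s=0.02)
```

With these assumed values, 20 milliseconds of audio corresponds to 320 samples, which matches the 20 ms frame length used later in the description.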
After collecting the external audio signal, the electronic device provides the collected audio signal to the dedicated voice recognition chip.
In 102, the audio signal is verified through the dedicated voice recognition chip, the processor is woken up when the verification passes, and the dedicated voice recognition chip is controlled to sleep after the processor is woken up.
In the embodiments of the present application, after providing the collected external audio signal to the dedicated voice recognition chip, the electronic device verifies the audio signal with a first verification algorithm running on the dedicated voice recognition chip and obtains a verification result. The verification includes, but is not limited to, verifying the text feature and/or the voiceprint feature of the audio signal.
Put simply, verifying the text feature of the audio signal means checking whether the audio signal contains a preset wake-up word. As long as the audio signal contains the preset wake-up word, the text feature verification passes, regardless of who spoke the word. For example, suppose the audio signal contains the preset wake-up word set by a preset user (for example, the owner of the electronic device, or another user authorized by the owner to use it), but the word is spoken by user A rather than by the preset user. When the dedicated voice recognition chip verifies only the text feature of the audio signal with the first verification algorithm, the verification passes.
Verifying both the text feature and the voiceprint feature of the audio signal means checking whether the audio signal contains the preset wake-up word spoken by the preset user. If it does, the text feature and voiceprint feature verification passes; otherwise it fails. For example, if the audio signal contains the preset wake-up word set by the preset user and the word is spoken by the preset user, the verification passes. If instead the audio signal contains the preset wake-up word spoken by a user other than the preset user, or does not contain the preset wake-up word spoken by any user, the text feature and voiceprint feature verification fails.
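The difference between the two verification modes can be sketched as a small decision function. The `AudioFeatures` type and its two boolean fields are hypothetical stand-ins for the output of wake-word and voiceprint models, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class AudioFeatures:
    """Hypothetical result of upstream feature extraction (not from the source)."""
    contains_wake_word: bool       # text feature: the preset wake-up word was spoken
    speaker_is_preset_user: bool   # voiceprint feature: the speaker matches the preset user


def verify(features: AudioFeatures, check_voiceprint: bool) -> bool:
    """Return True when verification passes.

    check_voiceprint=False: text feature only (any speaker may pass).
    check_voiceprint=True: text feature and voiceprint feature must both match.
    """
    if not features.contains_wake_word:
        return False
    if check_voiceprint and not features.speaker_is_preset_user:
        return False
    return True


# User A (not the preset user) says the preset wake-up word.
user_a = AudioFeatures(contains_wake_word=True, speaker_is_preset_user=False)
text_only_result = verify(user_a, check_voiceprint=False)
text_and_voiceprint_result = verify(user_a, check_voiceprint=True)
```

In this sketch user A passes the text-only check but fails the combined check, mirroring the two examples given in the text.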
In the embodiments of the present application, when the dedicated voice recognition chip verifies the audio signal successfully, the electronic device sends a preset interrupt signal to the processor over the communication connection between the dedicated voice recognition chip and the processor in order to wake up the processor. After waking up the processor, the electronic device puts the dedicated voice recognition chip to sleep and switches the screen from the screen-off state to the screen-on state.
It should be noted that if the audio signal fails verification, the electronic device continues to provide the external audio signals acquired through either microphone to the dedicated voice recognition chip for verification until the verification passes.
In 103, two external audio signals are acquired through the two microphones, and the two audio signals are provided to the processor.
After waking up the processor, the electronic device synchronously acquires two external audio signals of the same duration through the two microphones and provides the two acquired audio signals to the processor.
From the description above, a person of ordinary skill in the art will understand that the two audio signals provided to the processor are likewise digitized audio signals.
In 104, the two audio signals are denoised through the processor to obtain a noise-reduced audio signal, and the noise-reduced audio signal is verified to obtain a verification result.
In the embodiments of the present application, after providing the two acquired audio signals to the processor, the electronic device denoises them with a dual-microphone noise reduction algorithm run by the processor to obtain the noise-reduced audio signal. The embodiments of the present application do not restrict which dual-microphone noise reduction algorithm is used; a person of ordinary skill in the art can choose one according to actual needs, including but not limited to a dual-microphone beamforming noise reduction algorithm or a dual-microphone blind source separation noise reduction algorithm.
After obtaining the noise-reduced audio signal, the electronic device verifies it with a second verification algorithm run by the processor and obtains a verification result. This verification includes, but is not limited to, verifying the text feature and/or the voiceprint feature of the noise-reduced audio signal. When the noise-reduced audio signal passes verification, the electronic device can further execute the operation corresponding to the noise-reduced audio signal, including but not limited to unlocking the screen or starting a voice assistant.
It should be noted that the first verification algorithm run by the dedicated voice recognition chip and the second verification algorithm run by the processor may be the same or different; the embodiments of the present application do not restrict this.
As can be seen from the above, in the embodiments of the present application, while the processor is in a dormant state, the electronic device uses the low-power dedicated voice recognition chip to verify an external audio signal and wakes up the processor if the verification passes. The processor then denoises two external audio signals to obtain a noise-reduced audio signal and verifies it to obtain a corresponding verification result. In this way, interference from external noise can be excluded, so that the audio signal is verified more accurately.
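The overall flow of steps 101 to 104 can be sketched as follows. This is a minimal illustration, not the device firmware: the function name `run_verification_flow` and all callback parameters are hypothetical, and the verifiers and the denoiser used in the example are trivial stand-ins.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Signal = List[float]


def run_verification_flow(
    single_mic_signals: Iterable[Signal],
    capture_two_mics: Callable[[], Tuple[Signal, Signal]],
    chip_verify: Callable[[Signal], bool],
    denoise: Callable[[Signal, Signal], Signal],
    processor_verify: Callable[[Signal], bool],
) -> Optional[bool]:
    """Two-stage verification: low-power chip first, then the processor.

    Returns the processor's verification result, or None if the chip never
    passed any single-microphone signal (the processor stays dormant).
    """
    for audio in single_mic_signals:          # step 101: processor dormant, one microphone
        if not chip_verify(audio):            # step 102: first-stage verification
            continue                          # keep listening until a signal passes
        # Verification passed: the processor is woken and the chip goes to sleep.
        first, second = capture_two_mics()    # step 103: two synchronized signals
        clean = denoise(first, second)        # step 104: dual-microphone noise reduction
        return processor_verify(clean)        # step 104: second-stage verification
    return None


# Stand-in callbacks: a loudness threshold as "verification", averaging as "denoising".
result = run_verification_flow(
    single_mic_signals=[[0.0, 0.0], [0.4, 0.5]],
    capture_two_mics=lambda: ([0.4, 0.6], [0.2, 0.4]),
    chip_verify=lambda audio: max(audio) > 0.3,
    denoise=lambda a, b: [(x + y) / 2 for x, y in zip(a, b)],
    processor_verify=lambda audio: max(audio) > 0.3,
)
```

The point of the structure is that the expensive two-microphone path runs only after the inexpensive first-stage check has passed.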
In one embodiment, "denoising the two audio signals through the processor to obtain a noise-reduced audio signal" includes:
(1) Representing the two audio signals as a vector through the processor to obtain an audio vector;
(2) Performing blind source separation on the audio vector through the processor to obtain a voice signal, and setting the voice signal as the noise-reduced audio signal.
In the embodiments of the present application, the electronic device can use the processor to denoise the two audio signals by blind source separation and thereby obtain the noise-reduced audio signal.
The electronic device first represents the two audio signals as a vector through the processor to obtain the audio vector. For example, assuming the two acquired audio signals are $x_1$ and $x_2$, vectorizing them through the processor yields the audio vector $x = [x_1, x_2]^T$.
Assuming the speech component in the audio vector $x$ is $s_1$ and the noise component is $s_2$, the speech component, the noise component, and the audio vector are related through a separation coefficient $w$ that is used for blind source separation.
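The relation itself is not reproduced in this text. Under the usual linear blind source separation model, which is an assumption here rather than something the source confirms, it could be written with $w$ as a $2 \times 2$ separation matrix applied to the audio vector:

```latex
x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix},
\qquad
\begin{bmatrix} s_1 \\ s_2 \end{bmatrix} = w\,x
```

Read this way, each separated component is a weighted combination of the two microphone signals, and finding $w$ is the task of the blind source separation algorithm.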
When obtaining the noise signal and the voice signal by blind source separation of the audio vector, the electronic device first obtains the separation coefficient used for blind source separation of the audio vector. Based on that separation coefficient, it then separates the audio vector through the processor into a noise signal (the noise component of the audio vector) and a voice signal (the speech component of the audio vector), and sets the separated voice signal as the noise-reduced audio signal.
It should be noted that blind source separation of the audio vector through the processor yields two audio signals. Because the outputs of blind source separation are ambiguous as to which is which, the electronic device can use an endpoint detection algorithm run by the processor to identify which of the two separated audio signals is the voice signal and which is the noise signal.
In one embodiment, "performing blind source separation on the audio vector through the processor to obtain a voice signal" includes:
(1) Dividing the audio vector into multiple audio frames through the processor;
(2) Obtaining, through the processor, the separation coefficient used for blind source separation of each audio frame;
(3) Based on each separation coefficient, performing blind source separation on the corresponding audio frame through the processor to obtain a sub-voice signal;
(4) Merging the sub-voice signals of the audio frames through the processor to obtain the voice signal.
In the embodiments of the present application, when obtaining the noise signal and the voice signal by blind source separation of the audio vector, the electronic device first divides the audio vector into multiple audio frames through the processor, where all the audio frames have the same length.
For example, when dividing the audio vector into frames through the processor, the electronic device uses a frame length of 20 milliseconds and obtains multiple audio frames, denoted $x_m$ for $m = 1, \dots, M$, where $M$ is the number of frames.
After dividing the audio vector into multiple audio frames, the electronic device obtains, through the processor, the separation coefficient used for blind source separation of each audio frame; the separation coefficient for frame $x_m$ can be denoted $w_m$.
After obtaining the separation coefficient for each audio frame, the electronic device performs blind source separation on the corresponding audio frame through the processor based on that coefficient. That is, the m-th audio frame $x_m$ is separated with its corresponding separation coefficient $w_m$ into the sub-voice signal and the sub-noise signal of that frame.
After completing blind source separation of every audio frame, the electronic device merges, through the processor, the sub-voice signals of the audio frames in their chronological order to obtain the voice signal, and likewise merges the sub-noise signals of the audio frames to obtain the noise signal.
It should be noted that blind source separation of any audio frame through the processor yields two sub-audio signals. Because the outputs of blind source separation are ambiguous as to which is which, the electronic device can use an endpoint detection algorithm run by the processor to identify which of the two separated sub-audio signals is the sub-voice signal and which is the sub-noise signal.
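The framing, per-frame separation, and merging steps can be sketched as below. The sketch assumes a 16 kHz sampling rate (so a 20 ms frame holds 320 samples), treats each separation coefficient as a 2x2 matrix, drops any trailing partial frame, and simply treats the first output as voice; in the text that last decision is made by endpoint detection. All function names are hypothetical.

```python
SAMPLE_RATE_HZ = 16_000
FRAME_MS = 20
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 320 samples per frame


def split_frames(x1, x2, frame_len=FRAME_LEN):
    """Split the two-channel audio vector into equal-length frames.

    Trailing samples that do not fill a whole frame are dropped (an assumption).
    """
    n_frames = min(len(x1), len(x2)) // frame_len
    return [
        (x1[m * frame_len:(m + 1) * frame_len], x2[m * frame_len:(m + 1) * frame_len])
        for m in range(n_frames)
    ]


def separate_frame(frame, w):
    """Apply a 2x2 separation matrix w to one two-channel frame."""
    f1, f2 = frame
    out_a = [w[0][0] * a + w[0][1] * b for a, b in zip(f1, f2)]
    out_b = [w[1][0] * a + w[1][1] * b for a, b in zip(f1, f2)]
    return out_a, out_b


def separate_and_merge(x1, x2, coefficients):
    """Separate each frame with its own coefficient and concatenate in time order."""
    voice, noise = [], []
    for frame, w in zip(split_frames(x1, x2), coefficients):
        sub_voice, sub_noise = separate_frame(frame, w)  # assumed output order
        voice.extend(sub_voice)
        noise.extend(sub_noise)
    return voice, noise


# Two frames of constant signals separated with identity coefficients.
x1 = [1.0] * 640
x2 = [0.5] * 640
identity = ((1.0, 0.0), (0.0, 1.0))
voice, noise = separate_and_merge(x1, x2, [identity, identity])
```

With identity coefficients the outputs equal the inputs, which makes the sketch easy to check before real coefficients are plugged in.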
In one embodiment, "obtaining, through the processor, the separation coefficient used for blind source separation of each audio frame" includes:
(1) Whitening the current audio frame through the processor;
(2) Setting the separation coefficient of the previous audio frame as the initial separation coefficient of the current audio frame, and, based on the whitened current audio frame and the initial separation coefficient, iterating through the processor to obtain the separation coefficient used for blind source separation of the current audio frame.
It should be noted that in the embodiments of the present application, the electronic device obtains the separation coefficients frame by frame through the processor for the audio frames produced by framing. The "current audio frame" does not refer to one particular audio frame; it refers to whichever audio frame is having its separation coefficient obtained at the moment, and can be any audio frame. For example, if the electronic device is currently obtaining the separation coefficient of the first audio frame, the first audio frame is the current audio frame.
When obtaining the separation coefficient for each audio frame through the processor, the electronic device can whiten the current audio frame through the processor so that the correlation between the different components of the current audio frame is reduced. Assuming the current audio frame is the m-th audio frame $x_m$, whitening produces the whitened m-th audio frame from $x_m$ using the matrix $V$, the matrix $D^{-1/2}$, and the matrix transpose $T$,
where $V$ denotes the covariance matrix corresponding to the m-th audio frame, and $D^{-1/2}$ denotes the inverse square root matrix of the covariance matrix $V$.
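The whitening formula itself is not reproduced in this text. The sketch below assumes the common eigendecomposition-based transform $z_m = V D^{-1/2} V^T x_m$, in which $V$ holds the eigenvectors and $D$ the eigenvalues of the frame's 2x2 covariance matrix; this reading is an assumption and may differ from the source's formula. The function name `whiten_frame` and the `eps` guard are hypothetical.

```python
import math


def whiten_frame(x1, x2, eps=1e-12):
    """Whiten a two-channel frame so its channels are uncorrelated with unit variance.

    Assumes z = V D^(-1/2) V^T x, with V the eigenvectors and D the eigenvalues
    of the frame's 2x2 covariance matrix (not confirmed by the source).
    """
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    # Entries of the symmetric covariance matrix [[a, b], [b, c]].
    a = sum(v * v for v in c1) / n
    b = sum(u * v for u, v in zip(c1, c2)) / n
    c = sum(v * v for v in c2) / n
    # Closed-form eigendecomposition of a symmetric 2x2 matrix via a rotation angle.
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    d1 = a * cos_t * cos_t + 2.0 * b * cos_t * sin_t + c * sin_t * sin_t
    d2 = a * sin_t * sin_t - 2.0 * b * cos_t * sin_t + c * cos_t * cos_t
    k1 = 1.0 / math.sqrt(max(d1, eps))  # eps guards against a zero eigenvalue
    k2 = 1.0 / math.sqrt(max(d2, eps))
    # Whitening matrix P = V diag(k1, k2) V^T with V = [[cos, -sin], [sin, cos]].
    p11 = k1 * cos_t * cos_t + k2 * sin_t * sin_t
    p12 = (k1 - k2) * cos_t * sin_t
    p22 = k1 * sin_t * sin_t + k2 * cos_t * cos_t
    z1 = [p11 * u + p12 * v for u, v in zip(c1, c2)]
    z2 = [p12 * u + p22 * v for u, v in zip(c1, c2)]
    return z1, z2
```

After this transform the two channels of the frame have approximately unit variance and zero cross-correlation, which is the property the iteration on the separation coefficient relies on.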
After whitening the current audio frame, the electronic device sets the separation coefficient of the previous audio frame as the initial separation coefficient of the current audio frame and, based on the whitened current audio frame and the initial separation coefficient, iterates through the processor to obtain the separation coefficient used for blind source separation of the current audio frame.
Each iteration updates the separation coefficient of the m-th audio frame from $w_{m,n-1}$ to $w_{m,n}$,
where $n$ denotes the n-th iteration and takes values in $[1, N]$, and $N$ is the total number of iterations, which a person of ordinary skill in the art can set empirically according to actual needs; for example, $N$ is set to 10 in the embodiments of the present application, i.e., 10 iterations. $w_{m,n-1}$ denotes the separation coefficient of the m-th audio frame after the (n-1)-th iteration. When $n$ is 1, it is the initial separation coefficient, that is, the converged separation coefficient obtained for the previous audio frame after $N$ iterations; for example, when the m-th audio frame is the second audio frame, its initial separation coefficient is the converged separation coefficient obtained by iterating $N$ times from the initial separation coefficient of the first audio frame. $E$ denotes taking the mean, $g(u) = -\exp(-au^2/2)$ is a Gaussian function in which $a$ takes an empirical value, $g'(u)$ denotes the first derivative, and $w_{m,n}$ denotes the separation coefficient of the m-th audio frame after the n-th iteration.
As described above, once the $N$ iterations starting from the initial separation coefficient are complete, the converged separation coefficient for blind source separation of the m-th audio frame is obtained.
It should be noted that when the m-th audio frame is the first audio frame, there is no previous frame, so its initial separation coefficient $w_{1,0}$ is set to a fixed initial value [the formula for this value is missing from the source text].
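The warm-started iteration can be sketched as follows. Because the update formula is not reproduced in this text, the sketch assumes the standard one-unit FastICA fixed-point update, $w \leftarrow E[z\,g(w^T z)] - E[g'(w^T z)]\,w$ followed by normalization, with the Gaussian function taken as the contrast function $G(u) = -\exp(-au^2/2)$ and $g = G'$; this may differ from the source's formula. The function names and the first-frame value `w_first` are hypothetical.

```python
import math


def iterate_separation(z1, z2, w_init, n_iter=10, a=1.0):
    """Refine a unit-norm separation vector with fixed-point iterations.

    Assumption: standard one-unit FastICA update with G(u) = -exp(-a u^2 / 2),
    g = G' and g' = G''. The source's own update formula is not available.
    """
    w1, w2 = w_init
    n = len(z1)
    for _ in range(n_iter):
        e_zg1 = e_zg2 = e_dg = 0.0
        for u1, u2 in zip(z1, z2):
            y = w1 * u1 + w2 * u2
            gauss = math.exp(-a * y * y / 2.0)
            g = a * y * gauss                    # g(y) = G'(y)
            dg = a * (1.0 - a * y * y) * gauss   # g'(y) = G''(y)
            e_zg1 += u1 * g
            e_zg2 += u2 * g
            e_dg += dg
        n1 = e_zg1 / n - (e_dg / n) * w1
        n2 = e_zg2 / n - (e_dg / n) * w2
        norm = math.hypot(n1, n2)
        if norm == 0.0:
            break  # degenerate update: keep the previous coefficient
        w1, w2 = n1 / norm, n2 / norm
    return w1, w2


def coefficients_per_frame(whitened_frames, w_first, n_iter=10):
    """Warm start: each frame starts from the previous frame's converged result."""
    coefficients = []
    w = w_first  # hypothetical initial value for the first frame
    for z1, z2 in whitened_frames:
        w = iterate_separation(z1, z2, w, n_iter=n_iter)
        coefficients.append(w)
    return coefficients
```

The sketch tracks a single unit vector; for two whitened channels the second row of a full separation matrix would be the orthogonal unit vector `(-w2, w1)`. Starting each frame from the previous frame's result is what the text describes, and it means adjacent frames, whose mixing conditions are usually similar, need few iterations to converge.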
In one embodiment, "denoising the two audio signals through the processor to obtain a noise-reduced audio signal" includes:
(1) Transforming the respective current audio frames of the two audio signals from the time domain to the frequency domain through the processor, and extracting in the frequency domain the sub-audio signal arriving from the desired direction of each current audio frame, obtaining two sub-audio signals, where the desired directions of the two current audio frames are opposite;
(2) Dividing the two sub-audio signals into frequency bands through the processor, and performing beamforming in each of the resulting sub-bands according to the corresponding beamforming filter coefficients, obtaining multiple beamformed signals;
(3) In each of the sub-bands, obtaining through the processor, according to the corresponding beamforming filter coefficients and the respective autocorrelation coefficients of the two sub-audio signals, multiple gain factors used for noise suppression of the respective beamformed signals;
(4) Performing noise suppression on the beamformed signals through the processor according to the respective gain factors, combining the noise-suppressed beamformed signals across bands, and converting the result to the time domain to obtain the noise-suppressed current audio frame;
(5) Obtaining the noise-reduced audio signal through the processor from the noise-suppressed current audio frames.
It should be noted that in the embodiments of the present application the two microphones are arranged back to back, meaning that their sound pickup holes face opposite directions. For example, referring to Fig. 2, the electronic device includes two microphones: microphone 1, arranged on the lower side of the electronic device, and microphone 2, arranged on the upper side of the electronic device, where the pickup hole of microphone 1 faces downward and the pickup hole of microphone 2 faces upward. In addition, the two microphones of the electronic device may be non-directional (in other words, omnidirectional) microphones.
It should be noted that after collecting two audio signals of the same duration through the two microphones, the electronic device frames each of the two audio signals through the processor, dividing them into the same number of audio frames so that noise suppression can be performed frame by frame.
For example, referring to Fig. 3, denote the two collected audio signals as audio signal 1 and audio signal 2. The electronic device may divide audio signal 1 into n audio frames of 20 milliseconds each, and likewise divide audio signal 2 into n audio frames of 20 milliseconds each. It then performs noise suppression using the first audio frame of audio signal 1 and the first audio frame of audio signal 2 to obtain the first noise-suppressed audio frame, using the second audio frames of both signals to obtain the second noise-suppressed audio frame, and so on, up to the n-th audio frames of both signals, which yield the n-th noise-suppressed audio frame. From these noise-suppressed audio frames, a complete noise-suppressed audio signal, i.e. the noise-reduced audio signal, can be obtained.
It should be noted that "current audio frame" does not refer to one particular audio frame; it refers to the audio frames being used for noise suppression at the current moment. For example, if noise suppression is currently being performed on the 5th audio frames of the two audio signals, those 5th audio frames are the current audio frames; if it is being performed on the 6th audio frames, those 6th audio frames are the current audio frames, and so on.
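The framing and pairing described in this example can be sketched as below. The 16 kHz sample rate is the example sampling frequency mentioned elsewhere in this description; dropping a trailing partial frame and the function names are assumptions made only for this illustration.

```python
from typing import List, Sequence, Tuple


def split_into_frames(
    signal: Sequence[float], sample_rate_hz: int = 16000, frame_ms: int = 20
) -> List[List[float]]:
    """Split a signal into consecutive frames of frame_ms milliseconds."""
    frame_len = sample_rate_hz * frame_ms // 1000  # 320 samples at 16 kHz
    n_frames = len(signal) // frame_len  # assumption: a trailing partial frame is dropped
    return [list(signal[i * frame_len:(i + 1) * frame_len]) for i in range(n_frames)]


def pair_frames(
    signal_1: Sequence[float],
    signal_2: Sequence[float],
    sample_rate_hz: int = 16000,
    frame_ms: int = 20,
) -> List[Tuple[List[float], List[float]]]:
    """Pair the k-th frame of signal 1 with the k-th frame of signal 2."""
    frames_1 = split_into_frames(signal_1, sample_rate_hz, frame_ms)
    frames_2 = split_into_frames(signal_2, sample_rate_hz, frame_ms)
    return list(zip(frames_1, frames_2))
```

Each resulting pair would then be handed to the frame-by-frame noise suppression, and the suppressed frames concatenated in order.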
In the embodiments of the present application, the electronic device transforms, through the processor, the current audio frame of each of the two audio signals from the time domain to the frequency domain, and extracts in the frequency domain the sub-audio signal arriving from the respective desired direction (the desired direction of the microphone) in each of the two current audio frames, obtaining two sub-audio signals. The desired directions of the two microphones are opposite: the desired direction of the microphone closer to the target sound source is the direction toward the target sound source, and the desired direction of the microphone farther from the target sound source is the direction away from the target sound source.
For example, when the electronic device collects sound while its owner is on a call, the owner is the target sound source. Denote the two microphones of the electronic device as microphone 1 and microphone 2. If microphone 1 is closer to the owner, the desired direction of microphone 1 is the direction toward the owner and the desired direction of microphone 2 is the direction away from the owner.
As described above, a person of ordinary skill in the art will appreciate that, of the two sub-audio signals the electronic device extracts from the two current audio frames, one carries more of the "target sound" while the other carries more "noise".
After extracting the two sub-audio signals from the two current audio frames, the electronic device divides both sub-audio signals into frequency bands using the same band division scheme, obtaining multiple sub-bands. Then, for each sub-band, it performs beamforming according to the beamforming filter coefficients corresponding to that sub-band, obtaining the beamformed signal of that sub-band. In this way, for the multiple sub-bands obtained by division, the electronic device obtains a corresponding number of beamformed signals.
For example, the electronic device divides the two sub-audio signals into frequency bands using the same band division scheme, obtaining i sub-bands, and performs beamforming in each of the i sub-bands according to the corresponding beamforming filter coefficients, obtaining i beamformed signals.
After obtaining the multiple beamformed signals, the electronic device performs, through the processor, an autocorrelation calculation on each of the two sub-audio signals in every sub-band, obtaining the autocorrelation coefficient of each sub-audio signal in each sub-band. Then, for each sub-band, according to the beamforming filter coefficients corresponding to that sub-band and the autocorrelation coefficients of the two sub-audio signals in that sub-band, it obtains the gain factor used for noise suppression of that sub-band's beamformed signal. In this way, for the multiple beamformed signals, the electronic device obtains the corresponding gain factors used respectively for their noise suppression.
After obtaining the multiple gain factors, the electronic device can perform noise suppression, through the processor, on each beamformed signal according to its gain factor, obtaining multiple noise-suppressed beamformed signals. The electronic device then combines, through the processor, the noise-suppressed beamformed signals across bands and converts the result to the time domain, obtaining the noise-suppressed current audio frame.
At this point, for every pair of audio frames from the two audio signals, the electronic device has obtained a corresponding noise-suppressed audio frame; it then splices these audio frames together to obtain the aforementioned noise-reduced audio signal.
In one embodiment, "verifying the noise-reduced audio signal" includes:
(1) performing, by the processor, endpoint detection on the noise-reduced audio signal, and dividing the noise-reduced audio signal into multiple sub-noise-reduced audio signals according to the endpoint detection result;
(2) invoking, by the processor, a voiceprint feature extraction model related to a preset text to extract the voiceprint feature vector of each sub-noise-reduced audio signal;
(3) obtaining, by the processor, the similarity between the voiceprint feature vector of each sub-noise-reduced audio signal and a target voiceprint feature vector, the target voiceprint feature vector being the voiceprint feature vector of an audio signal in which a preset user says the preset text;
(4) verifying, by the processor, the text feature and the voiceprint feature of the noise-reduced audio signal according to the similarity corresponding to each sub-noise-reduced audio signal.
In the embodiments of the present application, because the noise-reduced audio signal is usually continuous speech, it needs to be segmented. The processor first performs endpoint detection on the noise-reduced audio signal using a preset endpoint detection algorithm, and then divides the noise-reduced audio signal into multiple sub-audio signals according to the endpoint detection result; these are referred to as sub-noise-reduced audio signals. It should be noted that the endpoint detection algorithm used by the processor is not specifically limited in the embodiments of the present application and can be chosen by a person of ordinary skill in the art according to actual needs; for example, in the embodiments of the present application the processor uses a VAD (Voice Activity Detection) algorithm to perform endpoint detection on the noise-reduced audio signal. In addition, when dividing the noise-reduced audio signal into multiple sub-noise-reduced audio signals, the processor groups, according to the endpoint detection result, the audio data corresponding to adjacent endpoints whose time interval is less than a preset duration (for example, 200 milliseconds) into one sub-noise-reduced audio signal.
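One possible reading of the grouping rule above is that voiced segments separated by a gap shorter than the preset duration are merged into a single sub-noise-reduced audio signal. The sketch below follows that reading; the segment representation as (start, end) pairs in milliseconds and the function name are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # hypothetical: (start_ms, end_ms) of a voiced region reported by VAD


def merge_close_segments(segments: Sequence[Segment], max_gap_ms: int = 200) -> List[Segment]:
    """Merge voiced segments whose gap to the previous segment is below max_gap_ms."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < max_gap_ms:
            # Gap is short: treat it as a pause inside the same utterance.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With a 200 millisecond threshold, short pauses between the syllables of a wake word would stay inside one segment, while a longer silence starts a new one.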
It should be noted that in the embodiments of the present application a voiceprint feature extraction model related to the preset text (for example, a preset wake word) is also trained in advance. For example, the embodiments of the present application train a voiceprint feature extraction model based on a convolutional neural network. Referring to Fig. 4, audio signals of many people (for example, 200 people) saying the preset wake word can be collected in advance. Endpoint detection is then performed on these audio signals to segment out the part containing the preset wake word. The segmented audio signals are preprocessed and windowed, and then Fourier transformed (for example, with a short-time Fourier transform). The energy density of the transformed audio signals is calculated to generate grayscale spectrograms (as shown in Fig. 5, where the horizontal axis represents time, the vertical axis represents frequency, and the gray value represents the energy value). Finally, a convolutional neural network is trained on the generated spectrograms to produce the voiceprint feature extraction model related to the preset text. In addition, in the embodiments of the present application, the spectrogram of an audio signal in which the preset user says the preset wake word (i.e. the preset text) is also extracted and input into the previously trained voiceprint feature extraction model; after passing through the model's multiple convolutional layers, pooling layers and fully connected layers, a corresponding feature vector is output, which is referred to as the target voiceprint feature vector.
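The windowing, short-time Fourier transform and energy-density steps above can be illustrated with the following minimal sketch. The Hamming window, the frame length of 64 samples and the hop of 32 samples are assumptions chosen to keep the example small; the description does not specify them, and a naive DFT is used instead of an FFT for self-containment.

```python
import cmath
import math
from typing import List, Sequence


def gray_spectrogram(
    signal: Sequence[float], frame_len: int = 64, hop: int = 32
) -> List[List[float]]:
    """Return energy values indexed as [time_frame][frequency_bin]."""
    # Hamming window (assumed; the description only says the signal is windowed).
    window = [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]
    rows: List[List[float]] = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row: List[float] = []
        for k in range(frame_len // 2 + 1):  # non-negative frequencies only
            coeff = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(coeff) ** 2)  # energy of bin k in this frame
        rows.append(row)
    return rows
```

Mapping each energy value to a gray level then gives an image of the kind shown in Fig. 5, which is what the convolutional network would consume.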
Correspondingly, after dividing the noise-reduced audio signal into multiple sub-noise-reduced audio signals, the processor extracts the spectrogram of each sub-noise-reduced audio signal. How the spectrogram is extracted is not repeated here; refer to the related description above. After extracting the spectrograms, the processor inputs each of them into the previously trained voiceprint feature extraction model to obtain the voiceprint feature vector of each sub-noise-reduced audio signal.
After extracting the voiceprint feature vector of each sub-noise-reduced audio signal, the processor obtains the similarity between each of these vectors and the target voiceprint feature vector, and then verifies the text feature and the voiceprint feature of the noise-reduced audio signal according to the similarity corresponding to each sub-noise-reduced audio signal. For example, the processor can determine whether there is a sub-noise-reduced audio signal whose voiceprint feature vector reaches a preset similarity with the target voiceprint feature vector (an empirical value chosen by a person of ordinary skill in the art according to actual needs, for example 75%); if so, it determines that the text feature and voiceprint feature verification of the noise-reduced audio signal passes.
In one embodiment, "verifying, by the processor, the text feature and the voiceprint feature of the noise-reduced audio signal according to the similarity corresponding to each sub-noise-reduced audio signal" includes:
verifying, by the processor, the text feature and the voiceprint feature of the noise-reduced audio signal according to the similarity corresponding to each sub-noise-reduced audio signal and a preset recognition function;
where the recognition function is $\gamma_n = \gamma_{n-1} + f(l_n)$, $\gamma_n$ denotes the recognition function state value corresponding to the n-th sub-noise-reduced audio signal, $\gamma_{n-1}$ denotes the recognition function state value corresponding to the (n-1)-th sub-noise-reduced audio signal, a is the correction value of the recognition function, b is the preset similarity, and $l_n$ is the similarity between the voiceprint feature vector of the n-th sub-noise-reduced audio signal and the target voiceprint feature vector;
when there is a $\gamma_n$ greater than a preset recognition function state value, the processor determines that the text feature and voiceprint feature verification of the noise-reduced audio signal passes.
It should be noted that the value of a in the recognition function can be an empirical value chosen by a person of ordinary skill in the art according to actual needs; for example, a can be set to 1.
In addition, the value of b in the recognition function is positively correlated with the recognition rate of the voiceprint feature extraction model; the value of b is determined from the recognition rate of the model actually obtained by training.
In addition, the preset recognition function state value can also be an empirical value chosen by a person of ordinary skill in the art according to actual needs; the larger its value, the higher the accuracy of verifying the first audio signal.
Thus, with this recognition function, even when the first audio signal includes information other than the preset wake word (for example, the preset wake word is "little Ou little Ou" and the text corresponding to the first audio signal is "little Ou little Ou, how is the weather today"), it can still be recognized accurately.
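The accumulation $\gamma_n = \gamma_{n-1} + f(l_n)$ can be sketched as below. The form of $f$ is not reproduced in this text, so the step function used here (add a when the similarity exceeds b, subtract a otherwise) is purely an assumption for illustration, as are the starting value of 0 and the state threshold of 2; none of these should be read as the patent's definition.

```python
from typing import Callable, Sequence


def assumed_step(l_n: float, a: float = 1.0, b: float = 0.75) -> float:
    """Hypothetical f: reward a similarity above b, penalize one at or below b."""
    return a if l_n > b else -a


def verify_with_recognition_function(
    similarities: Sequence[float],
    state_threshold: float = 2.0,  # assumed preset recognition function state value
    f: Callable[[float], float] = assumed_step,
    gamma_0: float = 0.0,  # assumed starting state
) -> bool:
    """Pass as soon as any running state value gamma_n exceeds the threshold."""
    gamma = gamma_0
    for l_n in similarities:
        gamma = gamma + f(l_n)  # gamma_n = gamma_{n-1} + f(l_n)
        if gamma > state_threshold:
            return True
    return False
```

Under these assumptions, several consecutive high-similarity segments are needed before verification passes, so one isolated match inside a longer sentence does not trigger it by itself, while a run of matches does.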
In one embodiment, "obtaining, by the processor, the similarity between the voiceprint feature vector of each sub-noise-reduced audio signal and the target voiceprint feature vector" includes:
calculating, by the processor, the similarity between the voiceprint feature vector of each sub-noise-reduced audio signal and the target voiceprint feature vector according to a dynamic time warping algorithm;
or calculating, by the processor, the feature distance between the voiceprint feature vector of each sub-noise-reduced audio signal and the target voiceprint feature vector as the similarity.
In the embodiments of the present application, when obtaining the similarity between the voiceprint feature vector of each sub-noise-reduced audio signal and the target voiceprint feature vector, the processor can calculate it according to a dynamic time warping algorithm.
Alternatively, the processor can calculate the feature distance between the voiceprint feature vector of each sub-noise-reduced audio signal and the target voiceprint feature vector as the similarity. Which feature distance is used to measure the similarity between two vectors is not specifically limited in the embodiments of the present application; for example, the Euclidean distance can be used to measure the similarity between the voiceprint feature vector of a sub-noise-reduced audio signal and the target voiceprint feature vector.
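Both options can be sketched in a few lines. The code below is a generic textbook version of dynamic time warping over scalar sequences and of the Euclidean distance, not the specific computation of the embodiments; the mapping from a distance to a similarity score in `distance_to_similarity` is an added assumption, since a smaller distance means a closer match.

```python
import math
from typing import List, Sequence


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Classic dynamic time warping cost with absolute difference as local cost."""
    n, m = len(u), len(v)
    cost: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(u[i - 1] - v[j - 1])
            # Extend the cheapest of: insertion, deletion, match.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def distance_to_similarity(distance: float) -> float:
    """Assumed mapping: distance 0 gives similarity 1, larger distances approach 0."""
    return 1.0 / (1.0 + distance)
```

Dynamic time warping tolerates sequences of different lengths or speaking rates, whereas the Euclidean distance requires vectors of equal length, which fits the fixed-size output of a feature extraction model.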
Fig. 6 is another schematic flowchart of the audio verification method provided by the embodiments of the present application. The audio verification method is applied to the electronic device provided by the present application, which includes a processor, a dedicated speech recognition chip and two microphones. As shown in Fig. 6, the flow of the audio verification method provided by the embodiments of the present application can be as follows:
In 201, when the processor is in the sleep state, the electronic device collects an external audio signal through either microphone and provides the audio signal to the dedicated speech recognition chip.
It should be noted that the dedicated speech recognition chip in the embodiments of the present application is a special-purpose chip designed for speech recognition, such as a digital signal processing chip designed for speech or an application-specific integrated circuit chip designed for speech, and it has lower power consumption than a general-purpose processor. The dedicated speech recognition chip, the processor and the audio collection unit establish communication connections between any two of them through a communication bus (such as an I2C bus) to exchange data.
In the embodiments of the present application, the processor sleeps when the screen of the electronic device is in the screen-off state, and the dedicated speech recognition chip sleeps when the screen is in the screen-on state. In addition, the two microphones of the electronic device can be built-in microphones or external microphones (which can be wired or wireless).
When the processor is in the sleep state (and the dedicated speech recognition chip is awake), the electronic device collects external sound through either microphone. If the microphone is an analog microphone, an analog audio signal is collected, which must then undergo analog-to-digital conversion to obtain a digitized audio signal for subsequent processing. For example, after collecting the external analog audio signal through the microphone, the electronic device can sample it at a sampling frequency of 16 kHz to obtain a digital audio signal.
A person of ordinary skill in the art will understand that if the microphone of the electronic device is a digital microphone, a digitized audio signal is collected directly and no analog-to-digital conversion is needed.
After collecting the external audio signal, the electronic device provides the collected audio signal to the dedicated speech recognition chip.
In 202, the electronic device verifies the audio signal through the dedicated speech recognition chip, wakes the processor when the verification passes, and puts the dedicated speech recognition chip to sleep after the processor has been woken.
In the embodiments of the present application, after providing the collected external audio signal to the dedicated speech recognition chip, the electronic device verifies the audio signal with a first verification algorithm running on the dedicated speech recognition chip and obtains a verification result. This includes, but is not limited to, verifying the text feature and/or the voiceprint feature of the audio signal.
Put plainly, verifying the text feature of the audio signal means checking whether the audio signal includes the preset wake word; as long as it does, the text feature verification passes, no matter who said the wake word. For example, if the audio signal includes the preset wake word set by the preset user (for example, the owner of the electronic device, or another user whom the owner has authorized to use it), but the wake word was spoken by user A rather than the preset user, the dedicated speech recognition chip will still pass the verification when it verifies the text feature of the audio signal with the first verification algorithm.
Verifying both the text feature and the voiceprint feature of the audio signal means checking whether the audio signal includes the preset wake word spoken by the preset user. If it does, the text feature and voiceprint feature verification passes; otherwise it fails. For example, if the audio signal includes the preset wake word set by the preset user and the wake word was spoken by the preset user, the text feature and voiceprint feature verification of the audio signal passes. As another example, if the audio signal includes the preset wake word spoken by a user other than the preset user, or does not include the preset wake word spoken by any user, the text feature and voiceprint feature verification of the audio signal fails.
In the embodiments of the present application, when the dedicated speech recognition chip's verification of the audio signal passes, the electronic device sends a preset interrupt signal to the processor over the communication connection between the dedicated speech recognition chip and the processor to wake the processor. After waking the processor, the electronic device puts the dedicated speech recognition chip to sleep and, at the same time, switches the screen from the screen-off state to the screen-on state.
It should be noted that if the audio signal fails verification, the electronic device continues to provide external audio signals collected through either microphone to the dedicated speech recognition chip for verification until verification passes.
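The first-stage control flow of 201 and 202 can be summarized with the sketch below. It models only the state changes described above; `DeviceState`, `run_first_stage` and the `chip_verify` callable are hypothetical names, and the real interrupt signalling over the bus is represented by a simple flag change.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class DeviceState:
    processor_awake: bool = False  # processor sleeps while the screen is off
    chip_awake: bool = True        # dedicated chip listens while the processor sleeps
    screen_on: bool = False


def run_first_stage(
    audio_chunks: Iterable[object],
    chip_verify: Callable[[object], bool],
    state: DeviceState,
) -> bool:
    """Feed chunks to the chip until one verifies, then hand over to the processor."""
    for chunk in audio_chunks:
        if not chip_verify(chunk):
            continue  # verification failed: keep collecting and verifying
        state.processor_awake = True  # stands in for the preset interrupt signal
        state.chip_awake = False      # chip goes to sleep once the processor is awake
        state.screen_on = True        # screen switches from off to on
        return True
    return False
```

Keeping the always-on check on the low-power chip and reserving the processor for the second, more expensive verification is the power-saving idea behind this split.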
In 203, the electronic device collects two external audio signals through the two microphones and provides the two audio signals to the processor.
After waking the processor, the electronic device synchronously collects two external audio signals of the same duration through the two microphones and provides the two collected audio signals to the processor.
From the related description above, a person of ordinary skill in the art will understand that the two audio signals provided to the processor here are likewise digitized audio signals.
In 204, the electronic device represents the two audio signals as a vector through the processor, obtaining an audio vector.
In the embodiments of the present application, the electronic device can denoise the two audio signals through the processor by means of blind source separation to obtain the noise-reduced audio signal.
The electronic device first represents the two audio signals as a vector through the processor, obtaining an audio vector. For example, assuming the two collected audio signals are $x_1$ and $x_2$, the processor combines them into a single audio vector (the formula for this vector is not reproduced in this text).
In 205, the electronic device frames the audio vector through the processor to obtain multiple audio frames, and obtains, through the processor, the separation coefficient used for blind source separation of each audio frame.
The electronic device frames the audio vector through the processor to obtain multiple audio frames, where every audio frame obtained by framing has the same length.
For example, when framing the audio vector through the processor, the electronic device divides the audio vector into multiple audio frames with a frame length of 20 milliseconds, denoted $x_m$, where m is the frame number.
In 206, the electronic device performs, based on each separation coefficient, blind source separation on the corresponding audio frame through the processor to obtain a sub-speech signal.
After framing the audio vector into multiple audio frames, the electronic device obtains, through the processor, the separation coefficient used for blind source separation of each audio frame; the separation coefficient for $x_m$ can be denoted $w_m$.
After obtaining the separation coefficient for each audio frame, the electronic device performs, based on each separation coefficient, blind source separation on the corresponding audio frame through the processor to obtain a sub-noise signal and a sub-speech signal. This is expressed by a formula (not reproduced in this text) in which $x_m$ denotes the m-th audio frame, $w_m$ denotes the separation coefficient corresponding to the m-th audio frame, and the two outputs are the sub-speech signal and the sub-noise signal separated from the m-th audio frame.
It should be noted that blind source separation of any audio frame through the processor yields two sub-audio signals. Because the order of the output signals of blind source separation is indeterminate, the electronic device can use an endpoint detection algorithm run by the processor to identify which of the two separated sub-audio signals is the sub-speech signal and which is the sub-noise signal.
In 207, the electronic device merges the sub-speech signals of all audio frames through the processor to obtain the noise-reduced audio signal.
After completing blind source separation of every audio frame, the electronic device merges, through the processor, the sub-speech signals of the audio frames in their temporal order to obtain the noise-reduced audio signal, and merges the sub-noise signals of the audio frames to obtain the noise signal.
In 208, the electronic device verifies the noise-reduced audio signal through the processor and obtains a verification result.
After obtaining the noise-reduced audio signal by denoising the two audio signals through the processor, the electronic device verifies the noise-reduced audio signal with a second verification algorithm run by the processor and obtains a verification result. This includes, but is not limited to, verifying the text feature and/or the voiceprint feature of the noise-reduced audio signal. When verification of the noise-reduced audio signal passes, the electronic device can further perform the operation corresponding to the noise-reduced audio signal, including but not limited to unlocking the screen, starting the voice assistant, and so on.
It should be noted that the first verification algorithm run by the dedicated speech recognition chip and the second verification algorithm run by the processor can be the same or different; the embodiments of the present application do not specifically limit this.
Please refer to Fig. 7, which is a schematic structural diagram of the audio verification apparatus provided by the embodiments of the present application. The audio verification apparatus can be applied to an electronic device that includes a processor, a dedicated speech recognition chip and two microphones. The audio verification apparatus may include a first acquisition module 401, a first verification module 402, a second acquisition module 403 and a second verification module 404, where:
the first acquisition module 401 is configured to collect, when the processor is in the sleep state, an external audio signal through either microphone and provide the audio signal to the dedicated speech recognition chip;
the first verification module 402 is configured to verify the audio signal through the dedicated speech recognition chip, wake the processor when the verification passes, and put the dedicated speech recognition chip to sleep after the processor has been woken;
the second acquisition module 403 is configured to collect two external audio signals through the two microphones and provide the two audio signals to the processor;
the second verification module 404 is configured to obtain a noise-reduced audio signal by denoising the two audio signals through the processor, and verify the noise-reduced audio signal to obtain a verification result.
In one embodiment, when obtaining the noise-reduced audio signal by denoising the two audio signals through the processor, the second verification module 404 can be configured to:
represent the two audio signals as a vector through the processor, obtaining an audio vector;
perform blind source separation on the audio vector through the processor to obtain a speech signal, and set the speech signal as the noise-reduced audio signal.
In one embodiment, when obtaining the speech signal by blind source separation of the audio vector through the processor, the second verification module 404 can be configured to:
frame the audio vector through the processor to obtain multiple audio frames;
obtain, through the processor, the separation coefficient used for blind source separation of each audio frame;
perform, based on each separation coefficient, blind source separation on the corresponding audio frame through the processor to obtain a sub-speech signal;
merge the sub-speech signals of all audio frames through the processor to obtain the speech signal.
In one embodiment, when obtaining, through the processor, the separation coefficient used for blind source separation of each audio frame, the second verification module 404 can be configured to:
whiten the current audio frame through the processor;
set the separation coefficient corresponding to the previous audio frame as the initial separation coefficient of the current audio frame and, based on the whitened current audio frame and the initial separation coefficient, iterate through the processor to obtain the separation coefficient used for blind source separation of the current audio frame.
In one embodiment, when obtaining the noise-reduced audio signal by denoising the two audio signals through the processor, the second verification module 404 can be configured to:
transform, through the processor, the current audio frame of each of the two audio signals from the time domain to the frequency domain, and extract in the frequency domain the sub-audio signal arriving from the respective desired direction in each of the two current audio frames, obtaining two sub-audio signals, where the desired directions corresponding to the two current audio frames are opposite;
divide the two sub-audio signals into frequency bands through the processor, and perform beamforming in each of the resulting sub-bands according to the corresponding beamforming filter coefficients, obtaining multiple beamformed signals;
obtain, through the processor, in each of the sub-bands, according to the corresponding beamforming filter coefficients and the respective autocorrelation coefficients of the two sub-audio signals, multiple gain factors used respectively for noise suppression of the multiple beamformed signals;
perform noise suppression, through the processor, on the multiple beamformed signals according to the respective gain factors, combine the noise-suppressed beamformed signals across bands and convert the result to the time domain, obtaining the noise-suppressed current audio frame;
obtain, through the processor, the noise-reduced audio signal from the noise-suppressed current audio frames.
In one embodiment, when verifying the noise-reduced audio signal, the second verification module 404 may be configured to:
Perform endpoint detection on the noise-reduced audio signal by the processor, and divide the noise-reduced audio signal into multiple sub noise-reduced audio signals according to the endpoint detection result;
Call, by the processor, a voiceprint feature extraction model related to the preset text to extract the voiceprint feature vector of each sub noise-reduced audio signal;
Obtain, by the processor, the similarity between the voiceprint feature vector of each sub noise-reduced audio signal and a target voiceprint feature vector, where the target voiceprint feature vector is the voiceprint feature vector of an audio signal in which a preset user speaks the preset text;
Verify, by the processor, the text feature and voiceprint feature of the noise-reduced audio signal according to the similarity corresponding to each sub noise-reduced audio signal.
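The endpoint detection and segmentation step can be sketched as follows. The source does not say which endpoint detection algorithm is used; the short-time energy threshold below, the function names, the frame length, and the threshold value are all assumptions for illustration.

```python
def detect_endpoints(samples, frame_len=160, threshold=0.01):
    """Return (start, end) sample indices of segments whose frame energy exceeds threshold."""
    segments = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold and start is None:
            start = i * frame_len                      # speech onset
        elif energy <= threshold and start is not None:
            segments.append((start, i * frame_len))    # speech offset
            start = None
    if start is not None:                              # signal ended while still active
        segments.append((start, n_frames * frame_len))
    return segments

def split_by_endpoints(samples, frame_len=160, threshold=0.01):
    """Cut the signal into sub-signals at the detected endpoints."""
    return [samples[s:e] for s, e in detect_endpoints(samples, frame_len, threshold)]

# Usage: two bursts separated by silence yield two sub-signals.
signal = [0.0] * 320 + [0.5] * 480 + [0.0] * 160 + [0.5] * 160 + [0.0] * 160
endpoints = detect_endpoints(signal)
pieces = split_by_endpoints(signal)
```

Each resulting piece would then be passed to the voiceprint feature extraction model, which is outside the scope of this sketch.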
In one embodiment, when verifying, by the processor, the text feature and voiceprint feature of the noise-reduced audio signal according to the similarity corresponding to each sub noise-reduced audio signal, the second verification module 404 may be configured to:
Verify, by the processor, the text feature and voiceprint feature of the noise-reduced audio signal according to the similarity corresponding to each sub noise-reduced audio signal and a preset recognition function;
Where the recognition function is $\gamma_n = \gamma_{n-1} + f(l_n)$, $\gamma_n$ denotes the recognition function state value corresponding to the n-th sub noise-reduced audio signal, $\gamma_{n-1}$ denotes the recognition function state value corresponding to the (n-1)-th sub noise-reduced audio signal, $a$ is the correction value of the recognition function, $b$ is the preset similarity, and $l_n$ is the similarity between the voiceprint feature vector of the n-th sub noise-reduced audio signal and the target voiceprint feature vector;
When there is a $\gamma_n$ greater than the preset recognition function state value, determine by the processor that the text feature and voiceprint feature of the noise-reduced audio signal pass verification.
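The accumulation performed by the recognition function can be sketched as below. The form of f is not given in this text, so the step function used here (add the correction value a when the similarity exceeds the preset similarity b, subtract it otherwise) is purely an assumption, as are the function names, the starting state value, and the numeric defaults.

```python
def step_score(similarity, a, b):
    """Assumed form of f(l_n): +a above the preset similarity b, -a otherwise."""
    return a if similarity > b else -a

def verify_by_recognition_function(similarities, a=1.0, b=0.6,
                                   state_threshold=2.5, initial_state=0.0):
    """Accumulate gamma_n = gamma_(n-1) + f(l_n) and pass once any gamma_n exceeds the threshold."""
    gamma = initial_state
    for l_n in similarities:
        gamma = gamma + step_score(l_n, a, b)
        if gamma > state_threshold:
            return True
    return False

# Usage: three consecutive good matches pass; an interrupted run does not.
passed = verify_by_recognition_function([0.9, 0.8, 0.7])
rejected = verify_by_recognition_function([0.9, 0.2, 0.9])
```

Under this assumed f, a single high similarity is not enough to pass; several sub-signals have to match in sequence before the state value crosses the threshold.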
In one embodiment, when obtaining, by the processor, the similarity between the voiceprint feature vector of each sub noise-reduced audio signal and the target voiceprint feature vector, the second verification module 404 may be configured to:
Calculate, by the processor, the similarity between the voiceprint feature vector of each sub noise-reduced audio signal and the target voiceprint feature vector according to a dynamic time warping algorithm;
Alternatively, calculate, by the processor, the feature distance between the voiceprint feature vector of each sub noise-reduced audio signal and the target voiceprint feature vector as the similarity.
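Both similarity options can be sketched in a few lines. The sketch below assumes that features are sequences of equal-dimension numeric tuples, uses Euclidean distance as the local cost and as the feature distance, and maps a distance d to a similarity 1 / (1 + d); none of these choices is specified in the source, and the function names are hypothetical.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping cost between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed warping moves.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def feature_distance(vec_a, vec_b):
    """Plain Euclidean distance between two fixed-length feature vectors."""
    return math.dist(vec_a, vec_b)

def similarity_from_distance(distance):
    """Assumed mapping: smaller distance gives a similarity closer to 1."""
    return 1.0 / (1.0 + distance)

# Usage: a time-stretched copy of the same feature sequence has zero DTW cost.
reference = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
stretched = [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
dtw_similarity = similarity_from_distance(dtw_distance(reference, stretched))
point_similarity = similarity_from_distance(feature_distance((0.0,), (3.0,)))
```

Dynamic time warping tolerates differences in speaking rate, which the plain feature distance does not.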
An embodiment of the present application provides a storage medium storing an audio verification program. When the stored audio verification program is executed on the electronic device provided by the embodiments of the present application, the electronic device performs the steps of the audio verification method provided by the embodiments of the present application. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
An embodiment of the present application also provides an electronic device. Referring to Fig. 8, the electronic device includes an audio collection unit 101, a processor 102, a dedicated voice recognition chip 103, two microphones 104, and a memory 105. The power consumption of the dedicated voice recognition chip 103 is lower than that of the processor 102. Any two of the dedicated voice recognition chip 103, the processor 102, and the audio collection unit 101 are communicatively connected through a communication bus (such as an I2C bus) to exchange data.
It should be noted that the dedicated voice recognition chip 103 in the embodiments of the present application is a special-purpose chip designed for speech recognition, such as a digital signal processing chip designed for voice or an application-specific integrated circuit chip designed for voice, and it has lower power consumption than a general-purpose processor.
The processor in the embodiments of the present application is a general-purpose processor, such as a processor with the ARM architecture.
The memory 105 stores the audio verification program. It may be a high-speed random access memory or a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Accordingly, the memory 105 may further include a memory controller to provide the processor 102 and the dedicated voice recognition chip 103 with access to the memory 105.
In the embodiments of the present application, the audio collection unit 101 is configured to acquire an external audio signal through either microphone 104 when the processor 102 is in a dormant state, and to provide the audio signal to the dedicated voice recognition chip 103;
The dedicated voice recognition chip 103 is configured to verify the audio signal, wake up the processor 102 when the verification passes, and go to sleep after waking up the processor 102;
The audio collection unit 101 is further configured to acquire two external audio signals through the two microphones 104 after the processor 102 is woken up, and to provide the two audio signals to the processor 102;
The processor 102 is configured to denoise the two audio signals to obtain a noise-reduced audio signal, and to verify the noise-reduced audio signal to obtain a verification result.
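The two-stage division of labour between the low-power chip and the processor can be summarised as a small control-flow sketch. The class `DeviceState`, the function `run_wake_flow`, and the two callables standing in for the actual verifications are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass

@dataclass
class DeviceState:
    processor_awake: bool = False   # the processor starts in a dormant state
    chip_awake: bool = True         # the low-power chip listens meanwhile

def run_wake_flow(state, first_stage_ok, second_stage_ok):
    """Run one pass of the two-stage verification.

    first_stage_ok: callable standing in for the chip verifying single-microphone audio.
    second_stage_ok: callable standing in for the processor verifying the
    noise-reduced two-microphone audio.
    Returns None if the processor was not woken by this pass, else the second-stage result.
    """
    if state.processor_awake:
        return None                 # the flow only starts while the processor sleeps
    if not first_stage_ok():
        return None                 # first-stage check failed: processor stays asleep
    state.processor_awake = True    # the chip wakes the processor ...
    state.chip_awake = False        # ... and then goes to sleep itself
    return second_stage_ok()

# Usage: the first stage passes, the second stage rejects the audio.
state = DeviceState()
result = run_wake_flow(state, lambda: True, lambda: False)
idle = DeviceState()
idle_result = run_wake_flow(idle, lambda: False, lambda: True)
```

The point of the arrangement is that the more expensive two-microphone denoising and verification only run after the inexpensive first check has passed.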
In one embodiment, when denoising the two audio signals to obtain the noise-reduced audio signal, the processor 102 may be configured to:
Represent the two audio signals in vector form to obtain an audio vector;
Perform blind source separation on the audio vector to obtain a voice signal, and set the voice signal as the noise-reduced audio signal.
In one embodiment, when performing blind source separation on the audio vector to obtain the voice signal, the processor 102 may be configured to:
Divide the audio vector into frames to obtain multiple audio frames;
Obtain the separation coefficients used for blind source separation of each audio frame;
Perform blind source separation on the corresponding audio frame based on each set of separation coefficients to obtain a sub voice signal;
Merge the sub voice signals of the audio frames to obtain the voice signal.
In one embodiment, when obtaining the separation coefficients used for blind source separation of each audio frame, the processor 102 may be configured to:
Whiten the current audio frame;
Set the separation coefficients corresponding to the previous audio frame as the initial separation coefficients of the current audio frame, and, based on the whitened current audio frame and the initial separation coefficients, iterate to obtain the separation coefficients used for blind source separation of the current audio frame.
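The whitening and warm-started iteration can be sketched as follows. The source does not name the blind source separation algorithm; the symmetric FastICA update with a tanh nonlinearity used here is a swapped-in, well-known technique chosen for illustration, and the function names, iteration count, and identity starting matrix are assumptions. Choosing which separated component is the voice signal is not covered. NumPy is assumed to be available.

```python
import numpy as np

def whiten(frame):
    """Center a (2, n) frame and transform it to unit covariance."""
    centered = frame - frame.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / centered.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    whitening = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + 1e-12)) @ eigvecs.T
    return whitening @ centered

def iterate_separation(whitened, w_init, n_iter=50):
    """Symmetric FastICA iterations (tanh nonlinearity), started from w_init."""
    w = w_init.copy()
    n = whitened.shape[1]
    for _ in range(n_iter):
        g = np.tanh(w @ whitened)
        g_prime = 1.0 - g ** 2
        w = g @ whitened.T / n - np.diag(g_prime.mean(axis=1)) @ w
        # Symmetric decorrelation keeps the rows of w orthonormal.
        u, _, vt = np.linalg.svd(w)
        w = u @ vt
    return w

def separate(frames):
    """Separate each frame, reusing the previous frame's coefficients as the start."""
    w = np.eye(2)                    # assumed start for the very first frame
    pieces = []
    for frame in frames:
        z = whiten(frame)
        w = iterate_separation(z, w)  # warm start from the previous frame
        pieces.append(w @ z)
    return np.concatenate(pieces, axis=1)

# Usage: two synthetic sources mixed onto two channels, processed as two frames.
rng = np.random.default_rng(0)
sources = np.vstack([np.sign(rng.standard_normal(4000)), rng.uniform(-1, 1, 4000)])
mixed = np.array([[1.0, 0.5], [0.3, 1.0]]) @ sources
separated = separate([mixed[:, :2000], mixed[:, 2000:]])
```

Starting each frame from the previous frame's coefficients is what lets the iteration converge quickly when the acoustic scene changes slowly between frames.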
In one embodiment, when denoising the two audio signals to obtain the noise-reduced audio signal, the processor 102 may be configured to:
Transform the respective current audio frames of the two audio signals from the time domain to the frequency domain, and extract in the frequency domain the sub-audio signal arriving from the respective desired direction in each of the two current audio frames, obtaining two sub-audio signals, where the desired directions corresponding to the two current audio frames are opposite;
Divide the two sub-audio signals into frequency bands, and perform beamforming in each of the resulting sub-bands according to the corresponding beamforming filter coefficients, obtaining multiple beamformed signals;
Obtain multiple gain factors used respectively for noise suppression of the multiple beamformed signals, according to the corresponding beamforming filter coefficients in the multiple sub-bands and the respective autocorrelation coefficients of the two sub-audio signals;
Perform noise suppression on the multiple beamformed signals according to the respective gain factors, combine the frequency bands of the noise-suppressed beamformed signals, and convert the result to the time domain, obtaining the noise-suppressed current audio frame;
Obtain the noise-reduced audio signal from the noise-suppressed current audio frame.
In one embodiment, when verifying the noise-reduced audio signal, the processor 102 may be configured to:
Perform endpoint detection on the noise-reduced audio signal, and divide the noise-reduced audio signal into multiple sub noise-reduced audio signals according to the endpoint detection result;
Call a voiceprint feature extraction model related to the preset text to extract the voiceprint feature vector of each sub noise-reduced audio signal;
Obtain the similarity between the voiceprint feature vector of each sub noise-reduced audio signal and a target voiceprint feature vector, where the target voiceprint feature vector is the voiceprint feature vector of an audio signal in which a preset user speaks the preset text;
Verify the text feature and voiceprint feature of the noise-reduced audio signal according to the similarity corresponding to each sub noise-reduced audio signal.
In one embodiment, when verifying the text feature and voiceprint feature of the noise-reduced audio signal according to the similarity corresponding to each sub noise-reduced audio signal, the processor 102 may be configured to:
Verify the text feature and voiceprint feature of the noise-reduced audio signal according to the similarity corresponding to each sub noise-reduced audio signal and a preset recognition function;
Where the recognition function is $\gamma_n = \gamma_{n-1} + f(l_n)$, $\gamma_n$ denotes the recognition function state value corresponding to the n-th sub noise-reduced audio signal, $\gamma_{n-1}$ denotes the recognition function state value corresponding to the (n-1)-th sub noise-reduced audio signal, $a$ is the correction value of the recognition function, $b$ is the preset similarity, and $l_n$ is the similarity between the voiceprint feature vector of the n-th sub noise-reduced audio signal and the target voiceprint feature vector;
When there is a $\gamma_n$ greater than the preset recognition function state value, the processor 102 determines that the text feature and voiceprint feature of the noise-reduced audio signal pass verification.
In one embodiment, when obtaining the similarity between the voiceprint feature vector of each sub noise-reduced audio signal and the target voiceprint feature vector, the processor 102 may be configured to:
Calculate the similarity between the voiceprint feature vector of each sub noise-reduced audio signal and the target voiceprint feature vector according to a dynamic time warping algorithm;
Alternatively, calculate the feature distance between the voiceprint feature vector of each sub noise-reduced audio signal and the target voiceprint feature vector as the similarity.
It should be noted that the electronic device provided by the embodiments of the present application and the audio verification method in the foregoing embodiments belong to the same concept. Any of the methods provided in the audio verification method embodiments can run on the electronic device; for the specific implementation process, see the audio verification method embodiments, which is not repeated here.
It should be noted that, for the audio verification method of the embodiments of the present application, a person of ordinary skill in the art will understand that all or part of the process of implementing the method can be completed by a computer program controlling the relevant hardware. The computer program may be stored in a computer-readable storage medium, for example in the memory of the electronic device, and executed by the processor and the dedicated voice recognition chip of the electronic device; its execution may include the process of the audio verification method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory, a random access memory, or the like.
The audio verification method, storage medium, and electronic device provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. A person skilled in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (10)

1. An audio verification method applied to an electronic device, characterized in that the electronic device comprises a processor, a dedicated voice recognition chip, and two microphones, the power consumption of the dedicated voice recognition chip being lower than that of the processor, the audio verification method comprising:
When the processor is in a dormant state, acquiring an external audio signal through either microphone, and providing the audio signal to the dedicated voice recognition chip;
Verifying the audio signal by the dedicated voice recognition chip, waking up the processor when the verification passes, and controlling the dedicated voice recognition chip to sleep after the processor is woken up;
Acquiring two external audio signals through the two microphones, and providing the two audio signals to the processor;
Denoising the two audio signals by the processor to obtain a noise-reduced audio signal, and verifying the noise-reduced audio signal to obtain a verification result.
2. The audio verification method according to claim 1, characterized in that denoising the two audio signals by the processor to obtain the noise-reduced audio signal comprises:
Representing the two audio signals in vector form by the processor to obtain an audio vector;
Performing blind source separation on the audio vector by the processor to obtain a voice signal, and setting the voice signal as the noise-reduced audio signal.
3. The audio verification method according to claim 2, characterized in that performing blind source separation on the audio vector by the processor to obtain the voice signal comprises:
Dividing the audio vector into frames by the processor to obtain multiple audio frames;
Obtaining, by the processor, the separation coefficients used for blind source separation of each audio frame;
Performing blind source separation on the corresponding audio frame by the processor based on each set of separation coefficients to obtain a sub voice signal;
Merging the sub voice signals of the audio frames by the processor to obtain the voice signal.
4. The audio verification method according to claim 3, characterized in that obtaining, by the processor, the separation coefficients used for blind source separation of each audio frame comprises:
Whitening the current audio frame by the processor;
Setting the separation coefficients corresponding to the previous audio frame as the initial separation coefficients of the current audio frame, and, based on the whitened current audio frame and the initial separation coefficients, iterating by the processor to obtain the separation coefficients used for blind source separation of the current audio frame.
5. The audio verification method according to any one of claims 1 to 4, characterized in that verifying the noise-reduced audio signal comprises:
Performing endpoint detection on the noise-reduced audio signal by the processor, and dividing the noise-reduced audio signal into multiple sub noise-reduced audio signals according to the endpoint detection result;
Calling, by the processor, a voiceprint feature extraction model related to a preset text to extract the voiceprint feature vector of each sub noise-reduced audio signal;
Obtaining, by the processor, the similarity between the voiceprint feature vector of each sub noise-reduced audio signal and a target voiceprint feature vector, the target voiceprint feature vector being the voiceprint feature vector of an audio signal in which a preset user speaks the preset text;
Verifying, by the processor, the text feature and voiceprint feature of the noise-reduced audio signal according to the similarity corresponding to each sub noise-reduced audio signal.
6. The audio verification method according to claim 5, characterized in that verifying, by the processor, the text feature and voiceprint feature of the noise-reduced audio signal according to the similarity corresponding to each sub noise-reduced audio signal comprises:
Verifying, by the processor, the text feature and voiceprint feature of the noise-reduced audio signal according to the similarity corresponding to each sub noise-reduced audio signal and a preset recognition function;
Wherein the recognition function is $\gamma_n = \gamma_{n-1} + f(l_n)$, $\gamma_n$ denotes the recognition function state value corresponding to the n-th sub noise-reduced audio signal, $\gamma_{n-1}$ denotes the recognition function state value corresponding to the (n-1)-th sub noise-reduced audio signal, $a$ is the correction value of the recognition function, $b$ is the preset similarity, and $l_n$ is the similarity between the voiceprint feature vector of the n-th sub noise-reduced audio signal and the target voiceprint feature vector;
When there is a $\gamma_n$ greater than the preset recognition function state value, determining by the processor that the text feature and voiceprint feature of the noise-reduced audio signal pass verification.
7. The audio verification method according to claim 5, characterized in that obtaining, by the processor, the similarity between the voiceprint feature vector of each sub noise-reduced audio signal and the target voiceprint feature vector comprises:
Calculating, by the processor, the similarity between the voiceprint feature vector of each sub noise-reduced audio signal and the target voiceprint feature vector according to a dynamic time warping algorithm;
Alternatively, calculating, by the processor, the feature distance between the voiceprint feature vector of each sub noise-reduced audio signal and the target voiceprint feature vector as the similarity.
8. An audio verification apparatus applied to an electronic device, characterized in that the electronic device comprises a processor, a dedicated voice recognition chip, and two microphones, the audio verification apparatus comprising:
A first acquisition module, configured to acquire an external audio signal through either microphone when the processor is in a dormant state, and to provide the audio signal to the dedicated voice recognition chip;
A first verification module, configured to verify the audio signal by the dedicated voice recognition chip, wake up the processor when the verification passes, and control the dedicated voice recognition chip to sleep after the processor is woken up;
A second acquisition module, configured to acquire two external audio signals through the two microphones, and to provide the two audio signals to the processor;
A second verification module, configured to denoise the two audio signals by the processor to obtain a noise-reduced audio signal, and to verify the noise-reduced audio signal to obtain a verification result.
9. An electronic device, characterized in that the electronic device comprises an audio collection unit, a processor, a dedicated voice recognition chip, and two microphones, the power consumption of the dedicated voice recognition chip being lower than that of the processor, wherein:
The audio collection unit is configured to acquire an external audio signal through either microphone when the processor is in a dormant state, and to provide the audio signal to the dedicated voice recognition chip;
The dedicated voice recognition chip is configured to verify the audio signal, wake up the processor when the verification passes, and go to sleep after waking up the processor;
The audio collection unit is further configured to acquire two external audio signals through the two microphones after the processor is woken up, and to provide the two audio signals to the processor;
The processor is configured to denoise the two audio signals to obtain a noise-reduced audio signal, and to verify the noise-reduced audio signal to obtain a verification result.
10. A storage medium, characterized in that, when the computer program stored in the storage medium runs on an electronic device comprising a processor, a dedicated voice recognition chip, and two microphones, the electronic device is caused to perform the steps of the audio verification method according to any one of claims 1 to 7.
CN201910273077.9A 2019-04-04 2019-04-04 Audio verification method and device, storage medium and electronic equipment Active CN110021307B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910273077.9A CN110021307B (en) 2019-04-04 2019-04-04 Audio verification method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN110021307A true CN110021307A (en) 2019-07-16
CN110021307B CN110021307B (en) 2022-02-01

Family

ID=67190711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910273077.9A Active CN110021307B (en) 2019-04-04 2019-04-04 Audio verification method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110021307B (en)

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101828335A (en) * 2007-10-18 2010-09-08 摩托罗拉公司 Robust two microphone noise suppression system
CN101882370A (en) * 2010-06-30 2010-11-10 中山大学 Voice recognition remote controller
CN102347027A (en) * 2011-07-07 2012-02-08 瑞声声学科技(深圳)有限公司 Double-microphone speech enhancer and speech enhancement method thereof
US8131541B2 (en) * 2008-04-25 2012-03-06 Cambridge Silicon Radio Limited Two microphone noise reduction system
CN102510426A (en) * 2011-11-29 2012-06-20 安徽科大讯飞信息科技股份有限公司 Personal assistant application access method and system
CN103686962A (en) * 2013-12-05 2014-03-26 深圳市中兴移动通信有限公司 Low-power-consumption mobile terminal awakening method and device
US20140172421A1 (en) * 2011-08-10 2014-06-19 Goertek Inc. Speech enhancing method, device for communication earphone and noise reducing communication earphone
CN104598192A (en) * 2014-12-29 2015-05-06 联想(北京)有限公司 Information processing method and electronic equipment
WO2015195482A1 (en) * 2014-06-18 2015-12-23 Cypher, Llc Multi-aural mmse analysis techniques for clarifying audio signals
CN105244031A (en) * 2015-10-26 2016-01-13 北京锐安科技有限公司 Speaker identification method and device
CN105469785A (en) * 2015-11-25 2016-04-06 南京师范大学 Voice activity detection method in communication-terminal double-microphone denoising system and apparatus thereof
CN105575395A (en) * 2014-10-14 2016-05-11 中兴通讯股份有限公司 Voice wake-up method and apparatus, terminal, and processing method thereof
CN105632491A (en) * 2014-11-26 2016-06-01 三星电子株式会社 Method and electronic device for voice recognition
CN105913850A (en) * 2016-04-20 2016-08-31 上海交通大学 Text related vocal print password verification method
CN107464565A (en) * 2017-09-20 2017-12-12 百度在线网络技术(北京)有限公司 A kind of far field voice awakening method and equipment
US20180233158A1 (en) * 2017-02-13 2018-08-16 Knowles Electronics, Llc Soft-talk audio capture for mobile devices
CN108447500A (en) * 2018-04-27 2018-08-24 深圳市沃特沃德股份有限公司 The method and apparatus of speech enhan-cement
US20180330727A1 (en) * 2017-05-10 2018-11-15 Ecobee Inc. Computerized device with voice command input capability

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JUNFENG LI ET AL: "A Two-Microphone Noise Reduction Method in Highly Non-stationary Multiple-Noise-Source Environments", 《IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS,COMMUNICATIONS AND COMPUTER SCIENCES》 *
NISACHON TANGSANGIUMVISAI ET AL: "Two-microphone subband noise reduction scheme with a new noise subtraction parameter for speech quality enhancement", 《IET SIGNAL PROCESSING》 *
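张彦芳: "Research and Application of Speech Enhancement Algorithms Based on Dual Microphones", 《China Master's Theses Full-text Database, Information Science and Technology》 *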

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112291696A (en) * 2019-07-23 2021-01-29 深圳市韶音科技有限公司 Audio chip testing method, storage device and computer equipment
CN110473554A (en) * 2019-08-08 2019-11-19 Oppo广东移动通信有限公司 Audio method of calibration, device, storage medium and electronic equipment
CN110544468A (en) * 2019-08-23 2019-12-06 Oppo广东移动通信有限公司 Application awakening method and device, storage medium and electronic equipment
CN110580897A (en) * 2019-08-23 2019-12-17 Oppo广东移动通信有限公司 audio verification method and device, storage medium and electronic equipment
CN110581915A (en) * 2019-08-30 2019-12-17 Oppo广东移动通信有限公司 Stability testing method and device, storage medium and electronic equipment
CN110581915B (en) * 2019-08-30 2021-02-19 Oppo广东移动通信有限公司 Stability testing method and device, storage medium and electronic equipment
CN110689887A (en) * 2019-09-24 2020-01-14 Oppo广东移动通信有限公司 Audio verification method and device, storage medium and electronic equipment
CN110689887B (en) * 2019-09-24 2022-04-22 Oppo广东移动通信有限公司 Audio verification method and device, storage medium and electronic equipment
CN110968353A (en) * 2019-12-06 2020-04-07 惠州Tcl移动通信有限公司 Central processing unit awakening method and device, voice processor and user equipment
WO2021169711A1 (en) * 2020-02-27 2021-09-02 Oppo广东移动通信有限公司 Instruction execution method and apparatus, storage medium, and electronic device
CN111429911A (en) * 2020-03-11 2020-07-17 云知声智能科技股份有限公司 Method and device for reducing power consumption of speech recognition engine in noise scene
CN112885323A (en) * 2021-02-22 2021-06-01 联想(北京)有限公司 Audio information processing method and device and electronic equipment
CN113160850A (en) * 2021-04-27 2021-07-23 广州国音智能科技有限公司 Audio feature extraction method and device based on re-parameterization decoupling mode
CN114534476A (en) * 2022-02-22 2022-05-27 新泰市日进化工科技有限公司 Constant temperature control system and control method for tower top of acid drenching device for triazole production
CN114534476B (en) * 2022-02-22 2022-11-01 新泰市日进化工科技有限公司 Constant temperature control system and control method for tower top of acid drenching device for triazole production

Also Published As

Publication number Publication date
CN110021307B (en) 2022-02-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant