CN110189746A - Speech recognition method applied to air-ground communication - Google Patents
Speech recognition method applied to air-ground communication
- Publication number
- CN110189746A CN110189746A CN201910213205.0A CN201910213205A CN110189746A CN 110189746 A CN110189746 A CN 110189746A CN 201910213205 A CN201910213205 A CN 201910213205A CN 110189746 A CN110189746 A CN 110189746A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L2015/025—Phonemes, fenemes or fenones being the recognition units
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
Abstract
The invention discloses a speech recognition method applied to air-ground communication, comprising: establishing a triphone acoustic model for air-ground calls; applying an improved maximum a posteriori (MAP) speech enhancement algorithm to the received air-ground speech signal to be recognized, enhancing the speech and removing background noise; feeding the processed signal into the air-ground triphone acoustic model for recognition, recognizing the command text and keyword text of the controller and the pilot, and raising an alarm when the recognized controller and pilot command texts are inconsistent; and checking the recognized keyword text with a keyword detection model, raising an alarm when a preset word is detected. The method can recognize and compare the voice commands exchanged between controllers and pilots, detect sensitive vocabulary with an alarm prompt, and improve the speech recognition rate.
Description
Technical field
The present invention relates to the field of air-ground communication, and in particular to a speech recognition method applied to air-ground communication.
Background technique
Air-ground communication mainly serves the calls between controllers and pilots and is a core element of ensuring flight safety. Because controllers work under high intensity and must keep their attention highly concentrated, they easily mishear speech when the call environment is poor, and may consequently issue wrong control orders that seriously affect flight safety. Air-ground speech recognition technology can automatically recognize the calls between controllers and pilots, monitor their behavior, and raise an alarm against the danger caused by false commands, greatly helping to guarantee flight safety.
Although air-ground speech recognition is an effective way of ensuring flight safety, most of civil aviation does not yet use speech recognition: the talking mode of air-ground calls is so particular in pronunciation and intonation that current general-purpose speech recognition cannot be applied directly. In addition, air-ground communication is affected by the surrounding environment and suffers noise interference during calls, which makes air-ground dialogue hard to recognize.
Existing general-purpose speech recognition technology is therefore not suitable for civil aviation. Because air-ground calls have their own particularities in pronunciation and grammar, a dedicated air-ground acoustic model must be built around their dialogue characteristics, pronunciation, and intonation; no speech recognition technology targeted at civil aviation is currently on the market.
Speech recognition trains an acoustic model from cleanly recorded speech, then matches the signal to be recognized, after the same processing, against the trained acoustic model to obtain the recognition result. If the air-ground speech signal is constantly interfered with by the external environment, it becomes mixed with many noise components. Such noisy speech not only impairs hearing, causing auditory fatigue and reduced attention in controllers and aircrew, but also distorts the signal: the speech feature parameters change, no longer match the acoustic model, and the final recognition result is wrong. The usual solution at present is to cascade a speech enhancement algorithm at the recognition front end to improve speech intelligibility; the flowchart is shown in Fig. 1.
Hidden Markov models (HMMs) are widely used in speech signal processing. An HMM can be described by θ = {A, B, M, O, π, F}, where A is a finite set of N states, B is the observation sequence set, M is the state transition probability, O is the output observation probability matrix, π is the initial probability sequence, and F is the final state sequence. HMM-based acoustic modeling first computes the probability that a known model outputs the observed sequence using the forward-backward and recursive algorithms, then calibrates the model with the Baum-Welch algorithm under the maximum likelihood criterion, and finally decodes with the Viterbi algorithm to obtain the recognition result. HMMs achieve a high recognition rate on small-vocabulary isolated-word recognition, but their robustness drops markedly on large-vocabulary continuous speech such as air-ground calls.
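The forward-probability computation mentioned above can be sketched with a short example. This is a generic, illustrative implementation rather than the patent's own code; the model sizes and probabilities below are made up, and log-space arithmetic is used for numerical stability.

```python
import numpy as np

def forward_log_likelihood(log_pi, log_A, log_B_obs):
    """Forward algorithm: log P(observation sequence | HMM).

    log_pi:    (N,)   log initial-state probabilities
    log_A:     (N, N) log state-transition matrix
    log_B_obs: (T, N) log likelihood of each observed frame under each
               state (e.g. from the per-state GMMs of a GMM-HMM)
    """
    T, _ = log_B_obs.shape
    alpha = log_pi + log_B_obs[0]        # initialization with the first frame
    for t in range(1, T):                # recursion over frames
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B_obs[t]
    return np.logaddexp.reduce(alpha)    # termination: sum over final states
```

For a toy 2-state model the result can be checked against brute-force enumeration of all state paths, which is how the recursion is usually verified.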
Speech enhancement algorithms
Conventional methods:
Most general-purpose speech enhancement today uses improved spectral subtraction or Wiener filtering. Although these methods are simple in structure and easy to implement and can raise the signal-to-noise ratio of noisy speech, they often introduce other noise and distort the speech. They can effectively improve listening comfort for the human ear, but they are not suitable for a speech recognition front end.
Maximum a posteriori algorithm:
Compared with spectral subtraction and Wiener filtering, speech enhancement based on maximum a posteriori probability (Maximum a posteriori, MAP) can effectively remove background noise without introducing other noise interference. Assume the noisy signal is y(n) = x(n) + d(n), where x(n) is the clean speech signal and d(n) is the noise. After framing and applying a Hamming window, the Fourier transform (FFT) gives:
Y(k, τ) = X(k, τ) + D(k, τ)    (1)
where k is a frequency bin of frame τ.
Taking the speech-free segments of the signal as noise frames gives the noise power δ_d(k), from which the a posteriori SNR is computed:
γ(k, τ) = |Y(k, τ)|² / δ_d(k)    (2)
The a priori SNR of each frame is continually updated from the previous frame. Since the first frame has no previous frame to refer to, its a priori SNR is computed as:
ξ(k, 1) = a + (1 − a) max(γ(k, 1) − 1, 0)    (3)
where a is a constant, taken as 0.98.
From the second frame onward, the a priori SNR is computed as:
ξ(k, τ) = a |X̂(k, τ−1)|² / δ_d(k) + (1 − a) max(γ(k, τ) − 1, 0)    (4)
The MAP gain function G(k, τ) is obtained from the a priori and a posteriori SNRs, and the enhanced speech spectrum is finally:
X̂(k, τ) = G(k, τ) Y(k, τ)    (5)
Although spectral subtraction and Wiener filtering are simple to implement, they introduce excessive "musical noise": the SNR improves somewhat, but the actual auditory effect is not obvious, and at low SNR the speech processed by spectral subtraction or Wiener filtering can even sound worse. The MAP algorithm obtains its gain function mainly by computing the a priori and a posteriori SNRs, and estimation errors in these two quantities cause the amplitude of the enhanced speech signal to change.
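The decision-directed update described above can be sketched as follows. This is a generic illustration under stated assumptions: the patent's exact MAP gain expression is not reproduced in the text, so the standard Wiener-type gain ξ/(1+ξ) is used here as a stand-in, and the smoothing constant a = 0.98 follows the description.

```python
import numpy as np

def decision_directed_enhance(frames, noise_psd, a=0.98):
    """Spectral-gain enhancement with a decision-directed a priori SNR.

    frames:    (T, K) complex spectra Y(k, tau) of Hamming-windowed frames
    noise_psd: (K,)   noise power delta_d(k), estimated from speech-free frames
    The gain xi/(1+xi) is a Wiener-type stand-in for the MAP gain.
    """
    enhanced = np.empty_like(frames)
    prev_clean_pow = None
    for t in range(len(frames)):
        gamma = np.abs(frames[t]) ** 2 / noise_psd           # a posteriori SNR
        if prev_clean_pow is None:
            # first frame: no previous estimate to smooth with (common init)
            xi = a + (1.0 - a) * np.maximum(gamma - 1.0, 0.0)
        else:
            # decision-directed mix of last clean estimate and current SNR
            xi = a * prev_clean_pow / noise_psd + (1.0 - a) * np.maximum(gamma - 1.0, 0.0)
        gain = xi / (1.0 + xi)
        enhanced[t] = gain * frames[t]
        prev_clean_pow = np.abs(enhanced[t]) ** 2
    return enhanced
```

Because the gain lies in [0, 1), each frequency bin is attenuated and never amplified, which is what keeps the enhanced spectrum from drifting in amplitude.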
Summary of the invention
The present invention provides a speech recognition method applied to air-ground communication that can recognize and compare the voice commands between controllers and pilots, detect sensitive vocabulary with an alarm prompt, and improve the speech recognition rate.
To achieve the above object, this application provides a speech recognition method applied to air-ground communication, the method comprising:
establishing a triphone GMM-HMM acoustic model for air-ground calls;
adding an adaptive filter to the maximum a posteriori speech enhancement algorithm, and using the improved MAP speech enhancement algorithm to enhance the received air-ground speech signal to be recognized and remove background noise;
feeding the air-ground speech signal processed by the improved MAP speech enhancement algorithm into the triphone GMM-HMM acoustic model for recognition, recognizing the command text and keyword text of the controller and the pilot, and raising an alarm when the recognized controller and pilot command texts are inconsistent; and detecting the recognized keyword text with a keyword detection model, raising an alarm when a preset word is detected.
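The two alarm checks just described, command-text comparison and sensitive-word detection, can be sketched in a few lines. This is only a schematic illustration: the function name and the toy vocabulary are invented, and in the method these checks run on the texts produced by the two acoustic models.

```python
def check_transmissions(controller_text, pilot_text, sensitive_words):
    """Return the alarm prompts raised for one controller/pilot exchange."""
    alerts = []
    # Check 1: the pilot's read-back must match the controller's command.
    if controller_text.strip().lower() != pilot_text.strip().lower():
        alerts.append("command mismatch")
    # Check 2: scan both texts for preset sensitive vocabulary.
    combined = (controller_text + " " + pilot_text).lower()
    hits = [w for w in sensitive_words if w in combined]
    if hits:
        alerts.append("sensitive vocabulary: " + ", ".join(hits))
    return alerts
```

A real system would compare normalized command semantics rather than raw strings, but the alarm logic has this shape.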
Further, establishing the triphone GMM-HMM acoustic model for air-ground calls specifically includes:
collecting everyday air-ground call data at an airport;
performing feature extraction on the collected dialogue data and removing unneeded data;
labeling the audio data after feature extraction;
training the labeled audio data to obtain the triphone GMM-HMM acoustic model for air-ground calls.
Further, performing feature extraction on the collected dialogue data specifically includes:
extracting features with Mel-frequency cepstral coefficients. The dialogue audio signal is Fourier-transformed and its power spectrum computed:
E(k) = |X(k)|²    (6)
where E(k) is the speech power spectrum, X(k) is the speech spectrum, and k indexes the k-th spectral line;
the obtained speech power spectrum is passed through the Mel filter bank and weighted-summed:
S(m) = Σ_{k=0}^{L−1} E(k) H_m(k)    (7)
where S(m) is the value after the weighted sum, L is the number of spectral lines, H_m(k) is the band-pass filter of the m-th Mel filter, and M is the total number of Mel filters;
the logarithm is taken and a discrete cosine transform applied:
c(n) = Σ_{m=0}^{M−1} ln S(m) cos(πn(m + 0.5)/M)    (8)
where c(n) is the value after the discrete cosine transform and n indexes the n-th cepstral coefficient.
Further, labeling the audio data after feature extraction specifically includes:
selecting a context-dependent triphone GMM-HMM acoustic model for air-ground calls, and performing context clustering with a clustering algorithm to obtain clustered sets of tied states; force-aligning the text dictionary with the audio data, and obtaining the optimal path with the Viterbi beam algorithm to produce the optimal frame-level labels.
Further, establishing the triphone GMM-HMM acoustic model from the labeled audio data specifically includes: according to the call characteristics of air-ground communication, using different HMM topologies for silence and non-silence phonemes and randomly initializing the GMM parameters; after randomly adjusting and merging the Gaussian parameters, iterating to obtain the triphone GMM-HMM acoustic model for air-ground calls.
Further, the triphone GMM-HMM acoustic model for air-ground calls comprises a continuous speech acoustic model and a keyword acoustic model. The processed speech to be recognized is passed through the continuous speech acoustic model, recognized, and converted into text output, with an alarm prompted when the controller and pilot text commands are inconsistent; the keyword acoustic model detects whether preset sensitive words are present and, when a sensitive word is recognized, converts it into text output and prompts an alarm.
Further, an adaptive filter is added to the maximum a posteriori speech enhancement algorithm to correct the deviation of the gain function.
Further, the gain function of the adaptive filter is as follows.
Assume the noisy signal is y(n) = x(n) + d(n); after framing and applying a Hamming window, the Fourier transform (FFT) gives:
Y(k, τ) = X(k, τ) + D(k, τ)    (1)
where k is a frequency bin of frame τ, x(n) is the clean speech signal, d(n) is the noise, and n is the time index;
taking the speech-free segments as noise frames gives the noise power δ_d(k), from which the a posteriori SNR is computed:
γ(k, τ) = |Y(k, τ)|² / δ_d(k)    (2)
The a priori SNR of the first frame is computed as:
ξ(k, 1) = a + (1 − a) max(γ(k, 1) − 1, 0)    (3)
where a is a constant and γ is the a posteriori SNR;
from the second frame onward, the a priori SNR is computed as:
ξ(k, τ) = a |X̂(k, τ−1)|² / δ_d(k) + (1 − a) max(γ(k, τ) − 1, 0)    (4)
where X̂ is the estimated clean speech signal, δ_d(k) is the noise power, and ξ̂(k, τ) is the estimated SNR.
Substituting the adaptive filter gain of formula (9) into formulas (3) and (4) gives the improved a priori SNR, where G_w(k, τ) is the adaptive filter value at the current time and G_w(k, τ−1) is the adaptive filter value at the previous time.
According to the grammar, pronunciation characteristics, and noise environment of air-ground calls, the present invention provides a speech recognition method suited to the air-ground communication system. The method builds an acoustic model of air-ground call terminology, can recognize and compare the voice commands between controllers and pilots, and can detect sensitive vocabulary with an alarm prompt; combined with the noise environment of air-ground communication, an adaptive-filtering speech enhancement algorithm is provided to improve the recognition rate. The method is broadly divided into two parts: (1) a triphone GMM-HMM acoustic model is built according to the characteristics of air-ground calls, able to recognize and compare speech content and detect sensitive information; (2) considering the noise environment of air-ground calls, an adaptive filter is added to the MAP algorithm, and by continually optimizing its parameters the background noise is removed while the feature parameters of the enhanced speech signal are kept largely unchanged.
Combining the speech characteristics and noise environment of air-ground communication, the invention builds an air-ground call acoustic model. The recognition model can recognize and compare the speech content of the controller and the pilot and raise an alarm when the commands are inconsistent; through the keyword detection model, the system also alarms when a preset high-risk sensitive word is detected, safeguarding flight safety; and the adaptive filtering algorithm enhances the speech to be recognized, reduces the background noise it contains, and improves its intelligibility so that it achieves a higher recognition rate at the recognition end.
The one or more technical solutions provided by this application have at least the following technical effects or advantages:
The invention establishes an air-ground speech recognition model for flight safety. The method can recognize and compare whether the voice commands between controllers and pilots are consistent, and can also detect preset sensitive vocabulary with an alarm prompt, thereby improving flight safety.
The existing MAP algorithm is optimized by adding an adaptive filter to further improve its enhancement effect. The adaptive filter mainly works as follows: in the low-SNR region below −15 dB it improves intelligibility by introducing a corrected gain function, and in the region above 10 dB it limits the amplitude spectrum to reduce amplification distortion. This improves the recognition rate of air-ground calls and guarantees that the speech recognition system remains robust in adverse noise environments.
The invention is mainly applied in air-ground speech recognition systems and, compared with the prior art, is more effective at improving the recognition rate of air-ground calls and guaranteeing flight safety.
Description of the drawings
The drawings described here provide a further understanding of the embodiments of the invention and constitute a part of this application; they do not limit the embodiments of the invention.
Fig. 1 is a schematic flowchart of the prior-art method of improving speech intelligibility with a speech enhancement algorithm;
Fig. 2 is a schematic flowchart of the speech recognition algorithm in this application;
Fig. 3 is a schematic flowchart of the speech enhancement algorithm in this application.
Specific embodiments
To better understand the above objects, features, and advantages of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments. It should be noted that, where they do not conflict, the embodiments of this application and the features in the embodiments may be combined with each other.
Many specific details are set forth in the following description to facilitate a full understanding of the invention; however, the invention can also be implemented in other ways than those described here, and the protection scope of the invention is therefore not limited by the specific embodiments disclosed below.
The invention is divided into two parts: the recognition end and the enhancement end.
1 Recognition end
Fig. 2 is the flowchart of the speech recognizer in the embodiment of the invention; the detailed process is as follows:
(1) The data used to build the acoustic model of the invention take the everyday air-ground calls of a domestic airport as the template, and approach and tower controllers were engaged to record according to daily call rules. The male-to-female ratio is 2:1, the audio sample rate is 16 kHz, the sample precision is 16 bits, and the total recorded audio is 10 GB.
(2) Feature extraction. Since the collected data contain much redundant information, feature extraction is applied to the useful information to reduce unnecessary computation; this patent uses Mel-frequency cepstral coefficients. The signal is first Fourier-transformed and its power spectrum computed:
E(k) = |X(k)|²    (6)
which is then passed through the Mel filter bank and weighted-summed:
S(m) = Σ_{k=0}^{L−1} E(k) H_m(k)    (7)
and finally the logarithm is taken and a discrete cosine transform applied:
c(n) = Σ_{m=0}^{M−1} ln S(m) cos(πn(m + 0.5)/M)    (8)
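The three steps just listed (power spectrum, Mel filter bank, log-DCT) can be sketched for a single frame. This is a minimal, generic MFCC implementation rather than the patent's code; the filter count, cepstral order, and the simple triangular filter-bank construction are illustrative choices.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr=16000, n_filters=26, n_ceps=13):
    # (6) power spectrum E(k) = |X(k)|^2 of one Hamming-windowed frame
    spec = np.fft.rfft(frame * np.hamming(len(frame)))
    E = np.abs(spec) ** 2
    K = len(E)
    # triangular Mel filter bank H_m(k), filters equally spaced on the Mel scale
    pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2))
    bins = np.floor((len(frame) + 1) * pts / sr).astype(int)
    H = np.zeros((n_filters, K))
    for m in range(1, n_filters + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, c):
            H[m - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            H[m - 1, k] = (hi - k) / max(hi - c, 1)
    # (7) filter-bank energies S(m) as a weighted sum over spectral lines
    S = H @ E
    # (8) log followed by a DCT gives the cepstral coefficients c(n)
    logS = np.log(S + 1e-10)
    n = np.arange(n_ceps)[:, None]
    m = np.arange(n_filters)[None, :]
    return (np.cos(np.pi * n * (m + 0.5) / n_filters) * logS).sum(axis=1)
```

In practice each utterance is framed (e.g. 25 ms frames with 10 ms shift) and this computation is applied per frame.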
(3) Audio data labeling. In large-vocabulary continuous speech recognition, identical pronunciations can carry different word meanings, and the current phoneme is influenced by the phonemes before and after it, so the feature parameters across continuous speech cannot be computed well. A context-dependent phoneme model is therefore generally chosen, and context clustering is done with a clustering algorithm to obtain clustered sets of tied states. The text dictionary is first force-aligned with the audio data, the optimal path is obtained with the Viterbi beam algorithm, and finally the optimal frame-level labels are obtained.
(4) Building the triphone GMM-HMM acoustic model for air-ground calls. According to the call characteristics of air-ground communication, different HMM topologies are used for silence and non-silence phonemes, and the GMM parameters are randomly initialized. After the Gaussian parameters are randomly adjusted and merged, iteration yields the final triphone GMM-HMM acoustic model.
In Fig. 2, acoustic model 1 is the continuous speech acoustic model and acoustic model 2 is the keyword acoustic model. The processed speech to be recognized is recognized by acoustic model 1 and converted into text output, with an alarm prompted when the controller and pilot text commands are inconsistent; acoustic model 2 detects whether preset sensitive words are present and, when a sensitive word is recognized, converts it into text output and prompts an alarm.
2 Enhancement end
Fig. 3 is the flowchart of the speech enhancement algorithm in the embodiment of the invention; the invention removes background noise and improves speech intelligibility mainly by adding an adaptive filter.
An adaptive filter is added to correct the deviation of the gain function. As can be seen from formula (4), the a priori SNR of the next frame is updated from the previous frame. Since the currently computed a priori SNR is not very accurate, the estimate of the next frame's a priori SNR derived from it may be too large or too small, degrading the speech enhancement performance. To handle this, the invention adds an adaptive filter into formulas (3) and (4) to adjust the estimation range of the a priori SNR over different SNR intervals.
The gain function of the adaptive filter was determined through simulation, verification, and engineering debugging, and is as shown below.
The gain function of the adaptive filter adjusts three different SNR intervals. When the SNR computed at frequency bin k of frame τ is below −15 dB, the bin is considered to be mainly noise, and a correction deviation is introduced to remove the noise interference. When the SNR is above 10 dB, the speech component in the signal far exceeds the noise; a threshold of 0.8 is set so that no excessive gain compensation is introduced and the output amplitude of the signal does not change greatly. When the SNR lies between −15 and 10 dB, the energies of speech and noise are relatively ambiguous, and the adaptive filter is needed to further distinguish the noise component in the signal; a threshold is therefore added to the gain function in this interval to prevent its value from falling below it. Repeated simulation experiments show the effect is best when the threshold is taken as 0.8.
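The three SNR regions and the 0.8 threshold can be sketched as a correction applied to a base gain value. The exact G_w expression appears only in the patent's figures and is not reproduced in the text, so this is merely an interpretation of the described behavior; the function name and the hard 0.0/1.0 choices are assumptions.

```python
def corrected_gain(base_gain, snr_db, floor=0.8):
    """Region-dependent correction of a spectral gain value, following the
    three SNR intervals described in the text (an interpretation, not the
    patent's exact G_w formula)."""
    if snr_db < -15.0:
        return 0.0                   # noise-dominated bin: suppress it
    if snr_db > 10.0:
        return min(base_gain, 1.0)   # speech-dominated: no extra compensation
    return max(base_gain, floor)     # ambiguous region: floor the gain at 0.8
```

Applied per frequency bin, this keeps low-SNR bins from leaking noise while leaving high-SNR bins essentially untouched in amplitude.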
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is also intended to include them.
Claims (8)
1. A speech recognition method applied to ground-air communication, characterized in that the method includes:
establishing a ground-air communication triphone GMM-HMM acoustic model;
adding an adaptive filter to a maximum a posteriori probability speech enhancement algorithm, and performing speech enhancement and background-noise removal on the received ground-air communication voice signal to be recognized by means of the improved maximum a posteriori probability speech enhancement algorithm;
inputting the ground-air communication voice signal to be recognized, processed by the improved maximum a posteriori probability speech enhancement algorithm, into the ground-air communication triphone GMM-HMM acoustic model for recognition, recognizing the voice command text and keyword text of the controller and the pilot, and issuing an alarm prompt when the recognized voice command texts of the controller and the pilot are inconsistent; detecting the recognized keyword text with a keyword detection model, and issuing an alarm prompt when a preset word is detected.
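The flow of claim 1 (enhance, decode, compare commands, check keywords) can be outlined as below; `enhance` and `recognize` are hypothetical stand-ins for the patent's improved MAP enhancement and triphone GMM-HMM decoder, not real implementations:

```python
from typing import Callable, List

def monitor_exchange(controller_audio: str, pilot_audio: str,
                     enhance: Callable, recognize: Callable,
                     keywords: List[str]) -> List[str]:
    """Hypothetical sketch of the claim-1 flow: denoise each channel,
    decode it to text, then raise alarms on a controller/pilot command
    mismatch or on preset sensitive words."""
    alarms: List[str] = []
    ctrl_text = recognize(enhance(controller_audio))
    pilot_text = recognize(enhance(pilot_audio))
    if ctrl_text != pilot_text:            # read-back does not match the command
        alarms.append("command mismatch")
    for kw in keywords:                    # keyword-detection stage
        if kw in ctrl_text or kw in pilot_text:
            alarms.append("keyword: " + kw)
    return alarms
```

With identity stand-ins for both stages, "climb to 3000" against a read-back of "climb to 300" would raise only the mismatch alarm.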
2. The speech recognition method applied to ground-air communication according to claim 1, characterized in that establishing the ground-air communication triphone GMM-HMM acoustic model specifically includes:
collecting everyday dialogue data of airport ground-air communication;
performing feature extraction on the collected dialogue data and removing unwanted data;
labeling the audio data after feature extraction;
training the labeled audio data to obtain the ground-air communication triphone GMM-HMM acoustic model.
3. The speech recognition method applied to ground-air communication according to claim 2, characterized in that performing feature extraction on the collected dialogue data specifically includes:
performing feature extraction using Mel-frequency cepstral coefficients: applying a Fourier transform to the dialogue audio signal and then computing the power spectrum of the signal:
E(k) = |X(k)|² (6)
wherein E(k) is the power spectrum of the voice signal, X(k) is the transformed voice signal, and k is the k-th spectral line;
passing the obtained speech power spectrum through the Mel filter bank by weighted summation:
wherein S(m) is the value after weighted summation, L is the number of spectral lines, Hm(k) is the band-pass filter, m is the m-th Mel filter, and M is the total number of Mel filters;
taking the logarithm and then applying a discrete cosine transform:
wherein c(n) is the value after the discrete cosine transform, and n is the n-th spectral line after the discrete cosine transform.
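The chain in claim 3 (FFT → power spectrum → Mel filter-bank weighting → log → DCT) can be sketched as follows; the triangular filter-bank construction and all parameter values here are common textbook choices, not specified by the patent:

```python
import numpy as np

def mfcc_frame(frame, sample_rate=8000, n_mels=26, n_ceps=13):
    """Sketch of the claim-3 feature chain for one windowed frame.
    Filter-bank design and parameter defaults are illustrative."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2          # E(k) = |X(k)|^2
    # Triangular Mel filters Hm(k) spanning 0 .. Nyquist (textbook design).
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(0.0, hz_to_mel(sample_rate / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_mels, len(power)))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, c):                       # rising slope
            fbank[m - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):                       # falling slope
            fbank[m - 1, k] = (hi - k) / max(hi - c, 1)
    s = fbank @ power                                # S(m): weighted sum
    logs = np.log(s + 1e-10)                         # log energies
    # DCT-II of the log filter-bank energies gives the cepstra c(n).
    n = np.arange(n_mels)
    return np.array([np.sum(logs * np.cos(np.pi * q * (2 * n + 1) / (2 * n_mels)))
                     for q in range(n_ceps)])
```

A 512-sample frame at 8 kHz yields a 13-dimensional cepstral vector.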
4. The speech recognition method applied to ground-air communication according to claim 2, characterized in that labeling the audio data after feature extraction specifically includes:
selecting a context-dependent ground-air communication triphone GMM-HMM acoustic model and performing context clustering with a clustering algorithm to obtain clustered sets of particular states; force-aligning the text dictionary with the audio data, obtaining the optimal path by the Viterbi beam algorithm, and thereby obtaining frame-level labels.
5. The speech recognition method applied to ground-air communication according to claim 2, characterized in that establishing the ground-air communication triphone GMM-HMM acoustic model from the labeled audio data specifically includes: according to the dialogue characteristics of ground-air communication, adopting different HMM topologies for silent and non-silent phonemes, and randomly initializing the parameters of the GMM; after randomly adjusting and integrating the Gaussian parameters, iterating to obtain the ground-air communication triphone GMM-HMM acoustic model.
6. The speech recognition method applied to ground-air communication according to claim 1, characterized in that the ground-air communication triphone GMM-HMM acoustic model includes a continuous-speech acoustic model and a keyword acoustic model; after processing, the speech to be recognized is recognized by the continuous-speech acoustic model and converted into text output, and a prompt alarm is issued when the controller's and pilot's text commands are inconsistent; the keyword acoustic model detects whether preset sensitive words are contained, and when a sensitive word is recognized, it is converted into text output and a prompt alarm is issued.
7. The speech recognition method applied to ground-air communication according to claim 1, characterized in that an adaptive filter is added to the maximum a posteriori probability speech enhancement algorithm to correct the bias of the gain function.
8. The speech recognition method applied to ground-air communication according to claim 7, characterized in that the gain function of the adaptive filter is given by the following formula:
Assuming the signal is y(n) = x(n) + d(n), after framing and applying a Hamming window, the Fourier transform (FFT) yields:
Y(k, τ) = X(k, τ) + D(k) (1)
wherein k is the frequency bin of frame τ, x(n) is the clean speech signal, d(n) is the noise, and n is the time index;
taking the non-speech segments of the signal as noise frames, the noise power δd is obtained, and the a posteriori SNR is then calculated:
the a priori SNR of the first frame is calculated by the following formula:
wherein a is a constant and γ is the a posteriori SNR;
when the signal proceeds to the second frame, the a priori SNR is calculated as:
wherein X̂ is the estimated clean speech signal; δd(k) is the noise power; SNR(k, τ) is the estimated SNR;
substituting formula (9) into formulas (3) and (4), the improved a priori SNR is obtained:
wherein Gw(k, τ) is the adaptive filter value at the current moment and Gw(k, τ−1) is the adaptive filter value at the previous moment.
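The first- and second-frame updates in claim 8 read like the standard decision-directed a priori SNR estimator; under that assumption, a minimal sketch (the default a = 0.98 is a typical textbook value, not taken from the patent):

```python
import numpy as np

def decision_directed_snr(noisy_power, noise_power, prev_clean_power, a=0.98):
    """Sketch of the claim-8 SNR estimates, assuming the decision-directed form:
    a posteriori SNR  gamma = |Y|^2 / delta_d
    a priori SNR      xi = a * |X_hat_prev|^2 / delta_d + (1 - a) * max(gamma - 1, 0)
    All array arguments are per-frequency-bin power values for one frame."""
    gamma = noisy_power / noise_power                     # a posteriori SNR
    xi = (a * prev_clean_power / noise_power
          + (1.0 - a) * np.maximum(gamma - 1.0, 0.0))     # a priori SNR
    return gamma, xi
```

For a bin with noisy power 4, noise power 1, previous clean estimate 2, and a = 0.5, this gives gamma = 4 and xi = 0.5·2 + 0.5·3 = 2.5.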
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910213205.0A CN110189746B (en) | 2019-03-20 | 2019-03-20 | Voice recognition method applied to ground-air communication |
PCT/CN2019/111789 WO2020186742A1 (en) | 2019-03-20 | 2019-10-18 | Voice recognition method applied to ground-air communication |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910213205.0A CN110189746B (en) | 2019-03-20 | 2019-03-20 | Voice recognition method applied to ground-air communication |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110189746A true CN110189746A (en) | 2019-08-30 |
CN110189746B CN110189746B (en) | 2021-06-11 |
Family
ID=67713727
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910213205.0A Active CN110189746B (en) | 2019-03-20 | 2019-03-20 | Voice recognition method applied to ground-air communication |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110189746B (en) |
WO (1) | WO2020186742A1 (en) |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101916565A (en) * | 2010-06-24 | 2010-12-15 | 北京华安天诚科技有限公司 | Voice recognition method and voice recognition device in air traffic control system |
CN102074246B (en) * | 2011-01-05 | 2012-12-19 | 瑞声声学科技(深圳)有限公司 | Dual-microphone based speech enhancement device and method |
US20160077523A1 (en) * | 2013-07-22 | 2016-03-17 | Sikorsky Aircraft Corporation | System for controlling and communicating with aircraft |
FR3010809B1 (en) * | 2013-09-18 | 2017-05-19 | Airbus Operations Sas | METHOD AND DEVICE FOR AUTOMATIC MANAGEMENT ON BOARD AN AIRCRAFT AUDIO MESSAGE AIRCRAFT. |
US20150162001A1 (en) * | 2013-12-10 | 2015-06-11 | Honeywell International Inc. | System and method for textually and graphically presenting air traffic control voice information |
CN106297796A (en) * | 2016-03-25 | 2017-01-04 | 李克军 | A kind of pilot rehearses monitoring method and device |
CN106875948B (en) * | 2017-02-22 | 2019-10-29 | 中国电子科技集团公司第二十八研究所 | A kind of collision alert method based on control voice |
CN108986791B (en) * | 2018-08-10 | 2021-01-05 | 南京航空航天大学 | Chinese and English language voice recognition method and system for civil aviation air-land communication field |
CN109119072A (en) * | 2018-09-28 | 2019-01-01 | 中国民航大学 | Civil aviaton's land sky call acoustic model construction method based on DNN-HMM |
CN109087657B (en) * | 2018-10-17 | 2021-09-14 | 成都天奥信息科技有限公司 | Voice enhancement method applied to ultra-short wave radio station |
CN110189746B (en) * | 2019-03-20 | 2021-06-11 | 成都天奥信息科技有限公司 | Voice recognition method applied to ground-air communication |
2019
- 2019-03-20 CN CN201910213205.0A patent/CN110189746B/en active Active
- 2019-10-18 WO PCT/CN2019/111789 patent/WO2020186742A1/en active Application Filing
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020186742A1 (en) * | 2019-03-20 | 2020-09-24 | 成都天奥信息科技有限公司 | Voice recognition method applied to ground-air communication |
CN110689906A (en) * | 2019-11-05 | 2020-01-14 | 江苏网进科技股份有限公司 | Law enforcement detection method and system based on voice processing technology |
CN112309403A (en) * | 2020-03-05 | 2021-02-02 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating information |
CN111667830A (en) * | 2020-06-08 | 2020-09-15 | 中国民航大学 | Airport control decision support system and method based on controller instruction semantic recognition |
WO2021249284A1 (en) * | 2020-06-08 | 2021-12-16 | 中国民航大学 | Airport control decision support system and method based on semantic recognition of controller instruction |
CN111667830B (en) * | 2020-06-08 | 2022-04-29 | 中国民航大学 | Airport control decision support system and method based on controller instruction semantic recognition |
CN113129919A (en) * | 2021-04-17 | 2021-07-16 | 上海麦图信息科技有限公司 | Air control voice noise reduction method based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN110189746B (en) | 2021-06-11 |
WO2020186742A1 (en) | 2020-09-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110189746A (en) | A kind of method for recognizing speech applied to earth-space communication | |
Mitra et al. | Medium-duration modulation cepstral feature for robust speech recognition | |
CN108922541A (en) | Multidimensional characteristic parameter method for recognizing sound-groove based on DTW and GMM model | |
WO1998038632A1 (en) | Method and system for establishing handset-dependent normalizing models for speaker recognition | |
CN106023986B (en) | A kind of audio recognition method based on sound effect mode detection | |
CN107039035A (en) | A kind of detection method of voice starting point and ending point | |
CN111489763B (en) | GMM model-based speaker recognition self-adaption method in complex environment | |
Venkatesan et al. | Binaural classification-based speech segregation and robust speaker recognition system | |
Maganti et al. | Unsupervised speech/non-speech detection for automatic speech recognition in meeting rooms | |
Kamble et al. | Emotion recognition for instantaneous Marathi spoken words | |
Zhu et al. | Log-energy dynamic range normalization for robust speech recognition | |
Wang et al. | Robust speech recognition from ratio masks | |
Mehta et al. | Robust front-end and back-end processing for feature extraction for Hindi speech recognition | |
Singh et al. | A comparative study of recognition of speech using improved MFCC algorithms and Rasta filters | |
Chen et al. | Robust MFCCs derived from differentiated power spectrum | |
CN107039046B (en) | Voice sound effect mode detection method based on feature fusion | |
Singh et al. | A novel algorithm using MFCC and ERB gammatone filters in speech recognition | |
Shahrul Azmi et al. | Noise robustness of Spectrum Delta (SpD) features in Malay vowel recognition | |
CN106448680B (en) | A kind of missing data feature method for distinguishing speek person using perception auditory scene analysis | |
Morales et al. | Adding noise to improve noise robustness in speech recognition. | |
Chandra | Hindi vowel classification using QCN-PNCC features | |
Sailaja et al. | Text independent speaker identification with finite multivariate generalized gaussian mixture model and hierarchical clustering algorithm | |
Seyedin et al. | A new subband-weighted MVDR-based front-end for robust speech recognition | |
Fukuda et al. | Phone-duration-dependent long-term dynamic features for a stochastic model-based voice activity detection. | |
Das et al. | Integrating denoising autoencoder and vector Taylor series with auditory masking for speech recognition in noisy conditions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||