CN110189746A - Speech recognition method applied to ground-air communication - Google Patents

Speech recognition method applied to ground-air communication

Info

Publication number
CN110189746A
Authority
CN
China
Prior art keywords
earth
space communication
acoustic model
voice
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910213205.0A
Other languages
Chinese (zh)
Other versions
CN110189746B (en)
Inventor
姚元飞
王群
陈洪瑀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Spaceon Technology Co Ltd
Original Assignee
Chengdu Spaceon Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Spaceon Technology Co Ltd filed Critical Chengdu Spaceon Technology Co Ltd
Priority to CN201910213205.0A priority Critical patent/CN110189746B/en
Publication of CN110189746A publication Critical patent/CN110189746A/en
Priority to PCT/CN2019/111789 priority patent/WO2020186742A1/en
Application granted granted Critical
Publication of CN110189746B publication Critical patent/CN110189746B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/14Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142Hidden Markov Models [HMMs]
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025Phonemes, fenemes or fenones being the recognition units
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225Feedback of the input speech
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L2021/02082Noise filtering the noise being echo, reverberation of the speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Telephonic Communication Services (AREA)
  • Filters That Use Time-Delay Elements (AREA)

Abstract

The invention discloses a speech recognition method applied to ground-air communication, comprising: establishing a triphone acoustic model for ground-air calls; applying an improved maximum a posteriori probability (MAP) speech enhancement algorithm to the received ground-air speech signal to be recognized, so as to enhance the speech and remove background noise; inputting the processed ground-air speech signal into the ground-air call triphone acoustic model for recognition, obtaining the voice command text and keyword text of the controller and the pilot, and raising an alarm when the recognized controller and pilot command texts are inconsistent; and detecting the recognized keyword text with a keyword detection model, raising an alarm when a preset vocabulary item is detected. The method can recognize and compare the voice commands exchanged between the controller and the pilot, can also detect sensitive vocabulary and raise alarms, and improves the speech recognition rate.

Description

Speech recognition method applied to ground-air communication
Technical field
The present invention relates to the field of ground-air communication, and in particular to a speech recognition method applied to ground-air communication.
Background technique
Ground-air communication is mainly used for calls between controllers and pilots and is a core element of ensuring flight safety. Controllers work under heavy load and must stay highly focused; in a poor communication environment they can easily mishear an utterance and consequently issue a wrong control instruction, which seriously affects flight safety. Ground-air speech recognition technology can automatically recognize the calls between controllers and pilots, monitor their behavior, and raise alarms about dangers caused by erroneous instructions, thereby greatly improving flight safety.
Although ground-air speech recognition is an effective way to ensure flight safety, most air traffic control systems do not yet use speech recognition technology. Because the ground-air calling style has its own particularities in pronunciation, intonation and so on, general-purpose speech recognition technology cannot be applied directly. In addition, ground-air communication is affected by the surrounding environment and is disturbed by noise during the call, which makes ground-air dialogue recognition difficult.
Existing general-purpose speech recognition technology is therefore not suitable for the air traffic control domain. Since ground-air calls have their own particularities in pronunciation and grammar, a dedicated ground-air call acoustic model has to be built according to their dialogue characteristics, pronunciation and intonation; at present there is no speech recognition technology on the market specifically designed for ground-air communication.
Speech recognition requires training an acoustic model on clean recorded speech and then matching the signal to be recognized, after the same processing, against the trained acoustic model to obtain the recognition result. If the ground-air speech signal is constantly disturbed by the external environment, it will be mixed with many noise components. Such noisy speech not only impairs hearing, causing auditory fatigue and reduced attention in controllers and aircrew, but also distorts the speech signal, changing its feature parameters so that they no longer match the acoustic model, which leads to wrong recognition results. The common solution at present is to cascade a speech enhancement algorithm in front of the recognizer to improve speech intelligibility. The corresponding flow is shown in Fig. 1.
Hidden Markov models (HMMs) are widely used in speech signal processing. An HMM can be described by θ = {A, B, M, O, π, F}, where A is the finite set of N states, B is the set of observation symbols, M is the state-transition probability matrix, O is the output (observation) probability matrix, π is the initial state probability distribution, and F is the set of final states. HMM-based acoustic modeling first computes, with the forward-backward and recursive algorithms, the probability that a known model produces the output observation sequence, then calibrates the model with the Baum-Welch algorithm under the maximum-likelihood criterion, and finally decodes with the Viterbi algorithm to obtain the recognition result. HMMs achieve high accuracy for small-vocabulary isolated-word recognition, but their robustness drops noticeably for large-vocabulary continuous speech such as ground-air calls.
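As an illustration of the evaluation step mentioned above, the following is a minimal sketch of the forward algorithm for computing the probability that a discrete-observation HMM produces a given observation sequence; the toy model and variable names are illustrative and not part of the patent.

    import numpy as np

    def forward_log_prob(pi, A, B, obs):
        """Log-probability P(obs | model) for a discrete-observation HMM.

        pi: (N,) initial state probabilities
        A:  (N, N) state-transition probabilities
        B:  (N, K) emission probabilities for K discrete symbols
        obs: list of observed symbol indices
        """
        alpha = pi * B[:, obs[0]]                  # alpha_1(i) = pi_i * b_i(o_1)
        log_prob = 0.0
        for o in obs[1:]:
            scale = alpha.sum()                    # scale to avoid underflow on long sequences
            log_prob += np.log(scale)
            alpha = (alpha / scale) @ A * B[:, o]  # alpha_t(j) = sum_i alpha_{t-1}(i) a_ij * b_j(o_t)
        return log_prob + np.log(alpha.sum())

    # Toy 2-state, 3-symbol example.
    pi = np.array([0.6, 0.4])
    A = np.array([[0.7, 0.3], [0.4, 0.6]])
    B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
    print(forward_log_prob(pi, A, B, [0, 1, 2, 2]))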
Speech enhancement algorithms
Conventional methods:
Most general-purpose speech enhancement algorithms currently in use are improved spectral subtraction or Wiener filtering. Although they are structurally simple and easy to implement and can raise the signal-to-noise ratio of noisy speech, they often introduce other noise and distort the speech. These methods can effectively improve listening comfort for the human ear, but they are not suitable as a speech recognition front end.
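As a reference point for the comparison that follows, a minimal sketch of basic magnitude spectral subtraction is given below (assuming the first few frames contain only noise); it is illustrative only and is not the improved algorithm of the patent.

    import numpy as np

    def spectral_subtraction(y, frame_len=512, hop=256, noise_frames=6, floor=0.01):
        """Basic magnitude spectral subtraction; the leading frames are assumed noise-only."""
        window = np.hamming(frame_len)
        frames = [y[i:i + frame_len] * window
                  for i in range(0, len(y) - frame_len + 1, hop)]
        spectra = np.array([np.fft.rfft(f) for f in frames])
        noise_mag = np.abs(spectra[:noise_frames]).mean(axis=0)            # noise magnitude estimate
        mag = np.maximum(np.abs(spectra) - noise_mag, floor * noise_mag)   # subtract, keep a spectral floor
        enhanced = mag * np.exp(1j * np.angle(spectra))                    # reuse the noisy phase
        out = np.zeros(len(y))
        for i, frame in enumerate(np.fft.irfft(enhanced, n=frame_len)):
            out[i * hop:i * hop + frame_len] += frame                      # overlap-add resynthesis
        return out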
Maximum a posteriori probability (MAP) algorithm:
The speech enhancement algorithm based on maximum a posteriori probability (Maximum a posteriori, MAP), compared with spectral subtraction and Wiener filtering, can effectively remove background noise without introducing other noise interference. Assume the noisy signal is y(n) = x(n) + d(n), where x(n) is the clean speech signal and d(n) is the noise. After framing and applying a Hamming window, the Fourier transform (FFT) gives:
Y(k, τ) = X(k, τ) + D(k)    (1)
where k is the frequency bin of frame τ.
The segments without speech are taken as noise frames, giving the noise power δd(k). The posterior SNR is then calculated as:
γ(k, τ) = |Y(k, τ)|² / δd(k)    (2)
The prior SNR of the next frame is continually updated from the value of the previous frame. For the first frame of the signal, since there is no previous frame to refer to, the prior SNR is calculated as:
ξ(k, 1) = a + (1 − a)·max(γ(k, 1) − 1, 0)    (3)
where a is a constant, taken as 0.98.
From the second frame onward, the prior SNR is calculated as:
ξ(k, τ) = a·|X̂(k, τ − 1)|² / δd(k) + (1 − a)·max(γ(k, τ) − 1, 0)    (4)
The MAP gain function G(k, τ) is obtained from the prior and posterior SNR, and the enhanced speech signal is finally obtained as:
X̂(k, τ) = G(k, τ)·Y(k, τ)    (5)
Although spectral subtraction and Wiener filtering are simple to implement, they introduce excessive "musical noise". Even though the SNR improves somewhat, the actual listening quality does not improve noticeably; when the SNR is low, the speech processed by spectral subtraction or Wiener filtering can even sound worse. The MAP algorithm obtains its gain function mainly from the prior and posterior SNR, but both of these involve estimation errors, which cause the amplitude of the enhanced speech signal to change.
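The decision-directed update of the prior SNR described above can be sketched as follows. Since the patent's exact MAP gain expression is not reproduced in this text, a simple Wiener-type gain ξ/(1+ξ) is used here as a stand-in; the constant a = 0.98 follows the text, and the function and variable names are illustrative.

    import numpy as np

    def enhance_decision_directed(Y, noise_power, a=0.98):
        """Y: (frames, bins) complex STFT of noisy speech; noise_power: (bins,) noise power estimate.

        Returns the enhanced STFT using the decision-directed prior-SNR update (equations (2)-(4));
        the Wiener-type gain xi/(1+xi) is only a placeholder for the MAP gain function."""
        X_hat = np.zeros_like(Y)
        prev_clean_power = None
        for t in range(Y.shape[0]):
            gamma = np.abs(Y[t]) ** 2 / noise_power                            # posterior SNR, eq. (2)
            inst = np.maximum(gamma - 1.0, 0.0)
            if prev_clean_power is None:
                xi = a + (1.0 - a) * inst                                      # first frame, eq. (3)
            else:
                xi = a * prev_clean_power / noise_power + (1.0 - a) * inst     # eq. (4)
            gain = xi / (1.0 + xi)                                             # stand-in gain function
            X_hat[t] = gain * Y[t]
            prev_clean_power = np.abs(X_hat[t]) ** 2
        return X_hat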
Summary of the invention
The present invention provides a speech recognition method applied to ground-air communication that can recognize and compare the voice commands exchanged between the controller and the pilot, can detect sensitive vocabulary and raise alarms, and can improve the speech recognition rate.
To achieve the above purpose, the present application provides a speech recognition method applied to ground-air communication, the method comprising:
establishing a triphone GMM-HMM acoustic model for ground-air calls;
adding an adaptive filter to the maximum a posteriori probability speech enhancement algorithm, and using the improved MAP speech enhancement algorithm to enhance the received ground-air speech signal to be recognized and remove background noise;
inputting the ground-air speech signal processed by the improved MAP speech enhancement algorithm into the ground-air call triphone GMM-HMM acoustic model for recognition, obtaining the voice command text and keyword text of the controller and the pilot, and raising an alarm when the recognized controller and pilot command texts are inconsistent; detecting the recognized keyword text with a keyword detection model, and raising an alarm when a preset vocabulary item is detected.
Further, establishing the ground-air call triphone GMM-HMM acoustic model specifically includes:
collecting everyday ground-air call data at an airport;
performing feature extraction on the collected dialogue data and removing unneeded data;
annotating the audio data after feature extraction;
training the ground-air call triphone GMM-HMM acoustic model from the annotated audio data.
Further, performing feature extraction on the collected dialogue data specifically includes:
extracting features with Mel-frequency cepstral coefficients (MFCCs): the dialogue audio signal is Fourier transformed and its power spectrum is computed as:
E(k) = |X(k)|²    (6)
where E(k) is the speech power spectrum, X is the speech signal, and k is the k-th spectral line;
the resulting power spectrum is passed through the Mel filter bank and summed with weights:
S(m) = Σ_{k=1}^{L} H_m(k)·E(k)    (7)
where S(m) is the weighted sum, L is the number of spectral lines, H_m(k) is the band-pass filter, m is the m-th Mel filter, and M is the total number of Mel filters;
the logarithm is then taken and a discrete cosine transform is applied:
c(n) = Σ_{m=1}^{M} ln S(m)·cos(πn(m − 0.5)/M)    (8)
where c(n) is the value after the discrete cosine transform and n is the n-th spectral line after the discrete cosine transform.
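A minimal sketch of equations (6)-(8) for a single windowed frame is given below. The triangular Mel filter-bank construction follows the usual Mel-scale design, and the specific parameter values (number of filters, cepstral dimension, sample rate) are illustrative assumptions, not values specified by the patent.

    import numpy as np

    def mel_filterbank(num_filters, n_fft, sample_rate):
        """Triangular Mel filter bank H_m(k) defined on the rFFT bins."""
        def hz_to_mel(f):
            return 2595.0 * np.log10(1.0 + f / 700.0)
        def mel_to_hz(m):
            return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
        mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2), num_filters + 2)
        bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
        H = np.zeros((num_filters, n_fft // 2 + 1))
        for m in range(1, num_filters + 1):
            l, c, r = bins[m - 1], bins[m], bins[m + 1]
            H[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
            H[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
        return H

    def mfcc_frame(frame, sample_rate=16000, num_filters=26, num_ceps=13):
        """MFCCs of one windowed frame, following equations (6)-(8)."""
        n_fft = len(frame)
        E = np.abs(np.fft.rfft(frame)) ** 2                             # eq. (6): power spectrum
        S = mel_filterbank(num_filters, n_fft, sample_rate) @ E         # eq. (7): weighted sums
        m = np.arange(1, num_filters + 1)
        n = np.arange(num_ceps).reshape(-1, 1)
        c = np.log(S + 1e-10) @ np.cos(np.pi * n * (m - 0.5) / num_filters).T   # eq. (8): DCT
        return c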
Further, annotating the audio data after feature extraction specifically includes:
selecting a context-dependent ground-air call triphone GMM-HMM acoustic model and performing context clustering with a clustering algorithm to obtain clustered sets of tied states; force-aligning the text dictionary with the audio data, obtaining the optimal path with a Viterbi beam search, and thereby obtaining the optimal frame-level labels.
Further, establishing the ground-air call triphone GMM-HMM acoustic model from the annotated audio data specifically includes: according to the call characteristics of ground-air communication, using different HMM topologies for silence phonemes and non-silence phonemes, and randomly initializing the GMM parameters; after random adjustment and integration of the Gaussian parameters, iterating until the ground-air call triphone GMM-HMM acoustic model is obtained.
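As an illustration of this training step, the sketch below fits one GMM-HMM per phone unit with the third-party hmmlearn library (an assumption; the patent does not name a toolkit), using random arrays in place of real labelled MFCC segments. Silence units could simply be given a different number of states.

    import numpy as np
    from hmmlearn.hmm import GMMHMM   # assumed toolkit, not specified by the patent

    def train_phone_models(segments, n_states=3, n_mix=4, n_iter=20):
        """segments: dict mapping a phone label to a list of (frames, features) MFCC arrays.

        Returns one GMM-HMM per phone, re-estimated with Baum-Welch (EM)."""
        models = {}
        for phone, seqs in segments.items():
            X = np.vstack(seqs)                       # stack all segments of this phone
            lengths = [len(s) for s in seqs]          # per-segment frame counts
            model = GMMHMM(n_components=n_states, n_mix=n_mix,
                           covariance_type="diag", n_iter=n_iter, random_state=0)
            model.fit(X, lengths)
            models[phone] = model
        return models

    # Toy example with random 13-dimensional "MFCC" segments.
    rng = np.random.default_rng(0)
    toy = {"a": [rng.normal(size=(40, 13)) for _ in range(5)],
           "sil": [rng.normal(size=(30, 13)) for _ in range(5)]}
    models = train_phone_models(toy)
    print({p: m.score(np.vstack(toy[p])) for p, m in models.items()})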
Further, the ground-air call triphone GMM-HMM acoustic model includes a continuous-speech acoustic model and a keyword acoustic model. After preprocessing, the speech to be recognized is recognized by the continuous-speech acoustic model and converted into text output, and an alarm is raised when the controller's and pilot's text commands are inconsistent; the keyword acoustic model detects whether the speech contains preset sensitive vocabulary, and when a sensitive word is recognized it is converted into text output and an alarm is raised.
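A minimal sketch of the comparison and alarm logic described above is given below; the text normalization and the example sensitive-word list are assumptions made for illustration, not contents of the patent.

    def normalize(text):
        """Crude normalization so that superficial differences do not trigger false alarms."""
        return " ".join(text.lower().split())

    def check_call(controller_text, pilot_text, sensitive_words):
        """Return a list of alarms for one controller/pilot exchange."""
        alarms = []
        if normalize(controller_text) != normalize(pilot_text):
            alarms.append("command mismatch between controller and pilot")
        for word in sensitive_words:
            if word in normalize(controller_text) or word in normalize(pilot_text):
                alarms.append("sensitive word detected: " + word)
        return alarms

    # Illustrative example (the phraseology and word list are made up).
    print(check_call("climb to flight level 90", "climb to flight level 80", ["mayday", "pan pan"]))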
Further, an adaptive filter is added to the maximum a posteriori probability speech enhancement algorithm to correct the deviation of the gain function.
Further, the gain function of the adaptive filter is defined as follows:
assume the noisy signal is y(n) = x(n) + d(n); after framing and applying a Hamming window, the Fourier transform (FFT) gives:
Y(k, τ) = X(k, τ) + D(k)    (1)
where k is the frequency bin of frame τ, x(n) is the clean speech signal, d(n) is the noise, and n is the time index;
the segments without speech are taken as noise frames, giving the noise power δd(k), and the posterior SNR is then calculated as:
γ(k, τ) = |Y(k, τ)|² / δd(k)    (2)
the prior SNR of the first frame is calculated as:
ξ(k, 1) = a + (1 − a)·max(γ(k, 1) − 1, 0)    (3)
where a is a constant and γ is the posterior SNR;
from the second frame onward, the prior SNR is calculated as:
ξ(k, τ) = a·|X̂(k, τ − 1)|² / δd(k) + (1 − a)·max(γ(k, τ) − 1, 0)    (4)
where X̂ is the estimated clean speech signal, δd(k) is the noise power, and max(γ(k, τ) − 1, 0) is the estimated signal-to-noise ratio;
the gain function G_w(k, τ) of the adaptive filter is a piecewise function of the estimated SNR defined over three intervals (formula (9)); substituting formula (9) into formulas (3) and (4) gives the improved prior SNR, where G_w(k, τ) is the adaptive filter value at the current time and G_w(k, τ − 1) is the adaptive filter value at the previous time.
According to the grammatical and pronunciation characteristics and the noise environment of ground-air calls, the present invention provides a speech recognition method suitable for ground-air communication systems. The method builds an acoustic model of ground-air call terminology, can recognize and compare the voice commands exchanged between the controller and the pilot, and can also detect sensitive vocabulary and raise alarms; combined with the noise environment of ground-air communication, an adaptive-filtering speech enhancement algorithm is provided to improve the speech recognition rate. The method is divided into two parts: (1) a triphone GMM-HMM acoustic model is established according to the characteristics of ground-air calls, which can recognize and compare speech content and detect sensitive information; (2) according to the noise environment of ground-air calls, an adaptive filter is added to the MAP algorithm, and by continuously optimizing its parameters the background noise is removed while the feature parameters of the enhanced speech signal are kept largely unchanged.
Combining the speech characteristics and noise environment of ground-air communication, the present invention establishes a ground-air call acoustic model. This recognition model can recognize and compare the speech content of the controller and the pilot and raise an alarm when their commands are inconsistent; through the keyword detection model, the system also raises an alarm when a preconfigured high-risk sensitive word is detected, ensuring flight safety; the adaptive filtering algorithm enhances the speech to be recognized, reducing the background noise it contains and improving its intelligibility, so that the speech to be recognized achieves a higher recognition rate at the recognition end.
One or more technical solutions provided by the present application have at least the following technical effects or advantages:
The present invention establishes a ground-air speech recognition model for flight safety. The method can recognize and compare whether the voice commands exchanged between the controller and the pilot are consistent, and can also detect preset sensitive vocabulary and raise alarms, thereby improving flight safety.
The existing MAP algorithm is optimized: an adaptive filter is added to further improve its enhancement effect. The main functions of this adaptive filter are as follows: in the low-SNR interval below −15 dB, a corrective gain function is introduced to improve intelligibility; in the interval above 10 dB, the amplitude spectrum is limited to reduce amplification distortion. This improves the recognition rate of ground-air calls and ensures that the speech recognition system remains robust in harsh noise environments.
The present invention is mainly applied to ground-air speech recognition systems. Compared with the prior art, it improves the recognition rate of ground-air calls and better ensures flight safety.
Detailed description of the invention
The accompanying drawings described herein are provided for a further understanding of the embodiments of the present invention and constitute a part of the present application; they do not limit the embodiments of the present invention.
Fig. 1 is a schematic flowchart of the prior-art method of improving speech intelligibility with a speech enhancement algorithm;
Fig. 2 is a schematic flowchart of the speech recognition algorithm in the present application;
Fig. 3 is a schematic flowchart of the speech enhancement algorithm in the present application.
Specific embodiment
To better understand the objects, features and advantages of the present invention, the present invention is further described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where they do not conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth to facilitate a full understanding of the present invention; however, the present invention may also be implemented in other ways than those described here, and therefore the protection scope of the present invention is not limited by the specific embodiments disclosed below.
The present invention is divided into two parts: the recognition end and the enhancement end.
1 Recognition end
Fig. 2 is the flowchart of the speech recognition algorithm in the embodiment of the present invention. The detailed process is as follows:
(1) The data used to build the acoustic model of the present invention take the everyday ground-air calls of a domestic airport as templates, and approach and tower controllers were engaged to record them according to the daily calling rules. The male-to-female ratio of the speakers is 2:1, the audio sampling rate is 16 kHz, the sampling precision is 16 bits, and the total recorded audio amounts to 10 GB.
(2) Feature extraction. Since the collected data contain much redundant information, features must be extracted from the useful information in the data to reduce unnecessary computation; this patent uses Mel-frequency cepstral coefficients for feature extraction. The signal is first Fourier transformed and its power spectrum computed:
E(k) = |X(k)|²    (6)
The power spectrum is then passed through the Mel filter bank and summed with weights:
S(m) = Σ_{k=1}^{L} H_m(k)·E(k)    (7)
Finally the logarithm is taken and a discrete cosine transform is applied:
c(n) = Σ_{m=1}^{M} ln S(m)·cos(πn(m − 0.5)/M)    (8)
(3) Audio data annotation. In large-vocabulary continuous speech recognition, the same pronunciation may correspond to different words, and the current phoneme is influenced by the preceding and following phonemes, so the feature parameters across continuous speech cannot be computed well in isolation. Context-dependent phoneme models are therefore generally chosen, and context clustering is performed with a clustering algorithm to obtain clustered sets of tied states. The text dictionary is first force-aligned with the audio data, the optimal path is obtained with a Viterbi beam search, and finally the optimal frame-level labels are obtained.
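The frame-level labelling described here can be illustrated with a small forced-alignment sketch: given per-frame log-likelihoods of the states of the transcript's left-to-right state sequence, a Viterbi pass assigns each frame to a state. The toy inputs are illustrative and the beam pruning used in practice is omitted.

    import numpy as np

    def force_align(frame_loglik):
        """frame_loglik: (T, S) log-likelihood of each frame under each state of the
        transcript's left-to-right state sequence. Returns one state index per frame."""
        T, S = frame_loglik.shape
        score = np.full((T, S), -np.inf)
        back = np.zeros((T, S), dtype=int)
        score[0, 0] = frame_loglik[0, 0]               # alignment must start in the first state
        for t in range(1, T):
            for s in range(S):
                stay = score[t - 1, s]
                advance = score[t - 1, s - 1] if s > 0 else -np.inf
                back[t, s] = s if stay >= advance else s - 1
                score[t, s] = max(stay, advance) + frame_loglik[t, s]
        states = [S - 1]                               # alignment must end in the last state
        for t in range(T - 1, 0, -1):
            states.append(back[t, states[-1]])
        return states[::-1]

    # Toy example: 4 frames, 2 states.
    print(force_align(np.log(np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.1, 0.9]]))))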
(4) Establishing the ground-air call triphone GMM-HMM acoustic model. According to the call characteristics of ground-air communication, different HMM topologies are used for silence phonemes and non-silence phonemes, and the GMM parameters are randomly initialized. After random adjustment and integration of the Gaussian parameters, the training iterates until the final triphone GMM-HMM acoustic model is obtained.
In Fig. 2, acoustic model 1 is the continuous-speech acoustic model and acoustic model 2 is the keyword acoustic model. After preprocessing, the speech to be recognized is recognized by acoustic model 1 and converted into text output, and an alarm is raised when the controller's and pilot's text commands are inconsistent; acoustic model 2 detects whether the speech contains preset sensitive vocabulary, and when a sensitive word is recognized it is converted into text output and an alarm is raised.
2 Enhancement end
Fig. 3 is the flowchart of the speech enhancement algorithm in the embodiment of the present invention. The present invention mainly removes background noise and improves speech intelligibility by adding an adaptive filter.
An adaptive filter is added to correct the deviation of the gain function. As can be seen from formula (4), the prior SNR of the next frame is updated from the previous frame. Because the currently computed prior SNR is not very accurate, the estimate of the next frame's prior SNR derived from it may be too large or too small, which degrades the enhancement performance. To address this, the present invention adds an adaptive filter to formulas (3) and (4) to adjust the estimation range of the prior SNR in different SNR intervals.
Through simulation, verification and engineering debugging, the gain function of the adaptive filter is determined as a piecewise function of the estimated SNR over three intervals (formula (10)).
Substituting formula (10) into formulas (3) and (4) gives the improved prior SNR.
As formula (10) shows, the gain function of the adaptive filter makes adjustments over three different SNR intervals. When the computed SNR at frequency bin k of frame τ is below −15 dB, the bin is considered to consist mainly of noise, and a corrective deviation is introduced to remove the noise interference. When the SNR is above 10 dB, the speech component of the signal is much larger than the noise component; a threshold of 0.8 is then set so that no excessive gain compensation is introduced and the output amplitude of the signal does not change much. When the SNR lies between −15 dB and 10 dB, the speech and noise energies are relatively indistinguishable, and the adaptive filter is needed to further separate the noise component in the signal; a threshold is therefore added to the gain function in this interval to prevent the gain from falling below it. Repeated simulation experiments show that a threshold value of 0.8 gives the best effect.
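Since the exact expression of formula (10) is not reproduced in this text, the sketch below only illustrates the three-interval behaviour described above (attenuate below −15 dB, floor the gain at 0.8 in the middle interval, cap the compensation at 0.8 above 10 dB); the particular values inside each branch are assumptions, not the patent's formula.

    import numpy as np

    def adaptive_gain(snr_db, low=-15.0, high=10.0, floor=0.8):
        """Piecewise gain over three SNR intervals, illustrating the behaviour of formula (10).

        Below `low` dB the bin is treated as noise and strongly attenuated; between `low`
        and `high` dB the gain tracks the SNR but is not allowed to drop below `floor`;
        above `high` dB the gain is held at `floor` so no excessive compensation is added.
        The exact expressions are assumptions."""
        snr_db = np.asarray(snr_db, dtype=float)
        mid = np.clip((snr_db - low) / (high - low), 0.0, 1.0)      # 0..1 ramp across the middle interval
        gain = np.where(snr_db < low, 0.1,                          # mainly noise: corrective attenuation
               np.where(snr_db > high, floor,                       # strong speech: cap compensation
                        np.maximum(mid, floor)))                    # in between: floored gain
        return gain

    print(adaptive_gain([-20.0, -5.0, 0.0, 15.0]))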
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (8)

1. A speech recognition method applied to ground-air communication, characterized in that the method comprises:
establishing a triphone GMM-HMM acoustic model for ground-air calls;
adding an adaptive filter to the maximum a posteriori probability speech enhancement algorithm, and using the improved MAP speech enhancement algorithm to enhance the received ground-air speech signal to be recognized and remove background noise;
inputting the ground-air speech signal processed by the improved MAP speech enhancement algorithm into the ground-air call triphone GMM-HMM acoustic model for recognition, obtaining the voice command text and keyword text of the controller and the pilot, and raising an alarm when the recognized controller and pilot command texts are inconsistent; detecting the recognized keyword text with a keyword detection model, and raising an alarm when a preset vocabulary item is detected.
2. The speech recognition method applied to ground-air communication according to claim 1, characterized in that establishing the ground-air call triphone GMM-HMM acoustic model specifically includes:
collecting everyday ground-air call data at an airport;
performing feature extraction on the collected dialogue data and removing unneeded data;
annotating the audio data after feature extraction;
training the ground-air call triphone GMM-HMM acoustic model from the annotated audio data.
3. The speech recognition method applied to ground-air communication according to claim 2, characterized in that performing feature extraction on the collected dialogue data specifically includes:
extracting features with Mel-frequency cepstral coefficients: the dialogue audio signal is Fourier transformed and its power spectrum is computed as:
E(k) = |X(k)|²    (6)
where E(k) is the speech power spectrum, X is the speech signal, and k is the k-th spectral line;
the resulting power spectrum is passed through the Mel filter bank and summed with weights:
S(m) = Σ_{k=1}^{L} H_m(k)·E(k)    (7)
where S(m) is the weighted sum, L is the number of spectral lines, H_m(k) is the band-pass filter, m is the m-th Mel filter, and M is the total number of Mel filters;
the logarithm is then taken and a discrete cosine transform is applied:
c(n) = Σ_{m=1}^{M} ln S(m)·cos(πn(m − 0.5)/M)    (8)
where c(n) is the value after the discrete cosine transform and n is the n-th spectral line after the discrete cosine transform.
4. The speech recognition method applied to ground-air communication according to claim 2, characterized in that annotating the audio data after feature extraction specifically includes:
selecting a context-dependent ground-air call triphone GMM-HMM acoustic model and performing context clustering with a clustering algorithm to obtain clustered sets of tied states; force-aligning the text dictionary with the audio data, obtaining the optimal path with a Viterbi beam search, and obtaining the optimal frame-level labels.
5. The speech recognition method applied to ground-air communication according to claim 2, characterized in that establishing the ground-air call triphone GMM-HMM acoustic model from the annotated audio data specifically includes: according to the call characteristics of ground-air communication, using different HMM topologies for silence phonemes and non-silence phonemes, and randomly initializing the GMM parameters; after random adjustment and integration of the Gaussian parameters, iterating until the ground-air call triphone GMM-HMM acoustic model is obtained.
6. The speech recognition method applied to ground-air communication according to claim 1, characterized in that the ground-air call triphone GMM-HMM acoustic model includes a continuous-speech acoustic model and a keyword acoustic model; after preprocessing, the speech to be recognized is recognized by the continuous-speech acoustic model and converted into text output, and an alarm is raised when the controller's and pilot's text commands are inconsistent; the keyword acoustic model detects whether the speech contains preset sensitive vocabulary, and when a sensitive word is recognized it is converted into text output and an alarm is raised.
7. The speech recognition method applied to ground-air communication according to claim 1, characterized in that an adaptive filter is added to the maximum a posteriori probability speech enhancement algorithm to correct the deviation of the gain function.
8. The speech recognition method applied to ground-air communication according to claim 7, characterized in that the gain function of the adaptive filter is defined as follows:
assume the noisy signal is y(n) = x(n) + d(n); after framing and applying a Hamming window, the Fourier transform (FFT) gives:
Y(k, τ) = X(k, τ) + D(k)    (1)
where k is the frequency bin of frame τ, x(n) is the clean speech signal, d(n) is the noise, and n is the time index;
the segments without speech are taken as noise frames, giving the noise power δd(k), and the posterior SNR is then calculated as:
γ(k, τ) = |Y(k, τ)|² / δd(k)    (2)
the prior SNR of the first frame is calculated as:
ξ(k, 1) = a + (1 − a)·max(γ(k, 1) − 1, 0)    (3)
where a is a constant and γ is the posterior SNR;
from the second frame onward, the prior SNR is calculated as:
ξ(k, τ) = a·|X̂(k, τ − 1)|² / δd(k) + (1 − a)·max(γ(k, τ) − 1, 0)    (4)
where X̂ is the estimated clean speech signal, δd(k) is the noise power, and max(γ(k, τ) − 1, 0) is the estimated signal-to-noise ratio;
the gain function G_w(k, τ) of the adaptive filter is a piecewise function of the estimated SNR defined over three intervals (formula (9)); substituting formula (9) into formulas (3) and (4) gives the improved prior SNR, where G_w(k, τ) is the adaptive filter value at the current time and G_w(k, τ − 1) is the adaptive filter value at the previous time.
CN201910213205.0A 2019-03-20 2019-03-20 Voice recognition method applied to ground-air communication Active CN110189746B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910213205.0A CN110189746B (en) 2019-03-20 2019-03-20 Voice recognition method applied to ground-air communication
PCT/CN2019/111789 WO2020186742A1 (en) 2019-03-20 2019-10-18 Voice recognition method applied to ground-air communication

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910213205.0A CN110189746B (en) 2019-03-20 2019-03-20 Voice recognition method applied to ground-air communication

Publications (2)

Publication Number Publication Date
CN110189746A true CN110189746A (en) 2019-08-30
CN110189746B CN110189746B (en) 2021-06-11

Family

ID=67713727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910213205.0A Active CN110189746B (en) 2019-03-20 2019-03-20 Voice recognition method applied to ground-air communication

Country Status (2)

Country Link
CN (1) CN110189746B (en)
WO (1) WO2020186742A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110689906A (en) * 2019-11-05 2020-01-14 江苏网进科技股份有限公司 Law enforcement detection method and system based on voice processing technology
CN111667830A (en) * 2020-06-08 2020-09-15 中国民航大学 Airport control decision support system and method based on controller instruction semantic recognition
WO2020186742A1 (en) * 2019-03-20 2020-09-24 成都天奥信息科技有限公司 Voice recognition method applied to ground-air communication
CN112309403A (en) * 2020-03-05 2021-02-02 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN113129919A (en) * 2021-04-17 2021-07-16 上海麦图信息科技有限公司 Air control voice noise reduction method based on deep learning

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101916565A (en) * 2010-06-24 2010-12-15 北京华安天诚科技有限公司 Voice recognition method and voice recognition device in air traffic control system
CN102074246B (en) * 2011-01-05 2012-12-19 瑞声声学科技(深圳)有限公司 Dual-microphone based speech enhancement device and method
US20160077523A1 (en) * 2013-07-22 2016-03-17 Sikorsky Aircraft Corporation System for controlling and communicating with aircraft
FR3010809B1 (en) * 2013-09-18 2017-05-19 Airbus Operations Sas METHOD AND DEVICE FOR AUTOMATIC MANAGEMENT ON BOARD AN AIRCRAFT AUDIO MESSAGE AIRCRAFT.
US20150162001A1 (en) * 2013-12-10 2015-06-11 Honeywell International Inc. System and method for textually and graphically presenting air traffic control voice information
CN106297796A (en) * 2016-03-25 2017-01-04 李克军 A kind of pilot rehearses monitoring method and device
CN106875948B (en) * 2017-02-22 2019-10-29 中国电子科技集团公司第二十八研究所 A kind of collision alert method based on control voice
CN108986791B (en) * 2018-08-10 2021-01-05 南京航空航天大学 Chinese and English language voice recognition method and system for civil aviation air-land communication field
CN109119072A (en) * 2018-09-28 2019-01-01 中国民航大学 Civil aviaton's land sky call acoustic model construction method based on DNN-HMM
CN109087657B (en) * 2018-10-17 2021-09-14 成都天奥信息科技有限公司 Voice enhancement method applied to ultra-short wave radio station
CN110189746B (en) * 2019-03-20 2021-06-11 成都天奥信息科技有限公司 Voice recognition method applied to ground-air communication

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020186742A1 (en) * 2019-03-20 2020-09-24 成都天奥信息科技有限公司 Voice recognition method applied to ground-air communication
CN110689906A (en) * 2019-11-05 2020-01-14 江苏网进科技股份有限公司 Law enforcement detection method and system based on voice processing technology
CN112309403A (en) * 2020-03-05 2021-02-02 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN111667830A (en) * 2020-06-08 2020-09-15 中国民航大学 Airport control decision support system and method based on controller instruction semantic recognition
WO2021249284A1 (en) * 2020-06-08 2021-12-16 中国民航大学 Airport control decision support system and method based on semantic recognition of controller instruction
CN111667830B (en) * 2020-06-08 2022-04-29 中国民航大学 Airport control decision support system and method based on controller instruction semantic recognition
CN113129919A (en) * 2021-04-17 2021-07-16 上海麦图信息科技有限公司 Air control voice noise reduction method based on deep learning

Also Published As

Publication number Publication date
CN110189746B (en) 2021-06-11
WO2020186742A1 (en) 2020-09-24

Similar Documents

Publication Publication Date Title
CN110189746A (en) A kind of method for recognizing speech applied to earth-space communication
Mitra et al. Medium-duration modulation cepstral feature for robust speech recognition
CN108922541A (en) Multidimensional characteristic parameter method for recognizing sound-groove based on DTW and GMM model
WO1998038632A1 (en) Method and system for establishing handset-dependent normalizing models for speaker recognition
CN106023986B (en) A kind of audio recognition method based on sound effect mode detection
CN107039035A (en) A kind of detection method of voice starting point and ending point
CN111489763B (en) GMM model-based speaker recognition self-adaption method in complex environment
Venkatesan et al. Binaural classification-based speech segregation and robust speaker recognition system
Maganti et al. Unsupervised speech/non-speech detection for automatic speech recognition in meeting rooms
Kamble et al. Emotion recognition for instantaneous Marathi spoken words
Zhu et al. Log-energy dynamic range normalization for robust speech recognition
Wang et al. Robust speech recognition from ratio masks
Mehta et al. Robust front-end and back-end processing for feature extraction for Hindi speech recognition
Singh et al. A comparative study of recognition of speech using improved MFCC algorithms and Rasta filters
Chen et al. Robust MFCCs derived from differentiated power spectrum
CN107039046B (en) Voice sound effect mode detection method based on feature fusion
Singh et al. A novel algorithm using MFCC and ERB gammatone filters in speech recognition
Shahrul Azmi et al. Noise robustness of Spectrum Delta (SpD) features in Malay vowel recognition
CN106448680B (en) A kind of missing data feature method for distinguishing speek person using perception auditory scene analysis
Morales et al. Adding noise to improve noise robustness in speech recognition.
Chandra Hindi vowel classification using QCN-PNCC features
Sailaja et al. Text independent speaker identification with finite multivariate generalized gaussian mixture model and hierarchical clustering algorithm
Seyedin et al. A new subband-weighted MVDR-based front-end for robust speech recognition
Fukuda et al. Phone-duration-dependent long-term dynamic features for a stochastic model-based voice activity detection.
Das et al. Integrating denoising autoencoder and vector Taylor series with auditory masking for speech recognition in noisy conditions

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant