CN110415683A - Air traffic control voice instruction recognition method based on deep learning - Google Patents

Air traffic control voice instruction recognition method based on deep learning

Info

Publication number
CN110415683A
Authority
CN
China
Prior art keywords
voice
data
deep learning
network model
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910619285.XA
Other languages
Chinese (zh)
Inventor
王耀彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Matu Information Technology Co Ltd
Original Assignee
Shanghai Matu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Matu Information Technology Co Ltd filed Critical Shanghai Matu Information Technology Co Ltd
Priority to CN201910619285.XA priority Critical patent/CN110415683A/en
Publication of CN110415683A publication Critical patent/CN110415683A/en
Pending legal-status Critical Current

Links

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/04 - Segmentation; Word boundary detection
    • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 - Training
    • G10L15/08 - Speech classification or search
    • G10L15/16 - Speech classification or search using artificial neural networks
    • G10L15/26 - Speech to text systems

Abstract

The invention discloses an air traffic control (ATC) voice instruction recognition method based on deep learning, comprising the following steps: obtaining a voice signal to be recognized and converting it into 16-bit, 16 kHz PCM audio data; establishing a deep network model; training the deep network model with training data to obtain a speech recognition engine; performing voice segmentation on the audio data; and feeding the valid audio fragments obtained by segmentation into the speech recognition engine, which outputs a text recognition result. The deep network model uses convolution modules as the feature extractor, processes the extracted features through a reshape layer and a fully connected layer, performs sequence learning with gated recurrent units, and finally performs classification learning and decision through fully connected layers to obtain the prediction result. With an artificial-intelligence deep learning engine at its core, the present invention has strong domain applicability and accent generalization ability, depends relatively little on data volume, and significantly outperforms general-purpose speech recognition systems on ATC speech.

Description

Air traffic control voice instruction recognition method based on deep learning
The present invention relates to the field of speech processing technology, and in particular to a deep-learning-based voice recognition method for the air traffic control field.
Background art
With the rapid growth of civil aviation, large numbers of aircraft and flights are added every year. However, there has long been a shortage of air traffic control (ATC) personnel, conservatively estimated in the thousands. Even though ATC authorities have implemented a series of measures, such as the "4+1" training scheme, a large loss of controllers persists. At the same time, problems such as the inexperience of new staff and the shortage of training time and resources prevent personnel from being used effectively. The shortage of ATC professionals leads to overloaded controllers and leaves air traffic with latent safety and efficiency problems. Air traffic control in China is still high-intensity mental labor centered on controllers' subjective decisions; aircraft movements increase sharply as civil aviation flourishes, and the understaffed ATC system can currently only rely on controllers performing concentrated, high-intensity work for long periods, so human error is unavoidable. According to statistics, human error accounts for 80% of all aviation accidents and has become the major factor affecting aviation safety. Taking the October 11, 2016 Hongqiao Airport passenger aircraft collision incident as an example, a tower controller's forgetting an aircraft's movement caused that serious accident (a runway incursion). It is therefore necessary to introduce a speech recognition system to record, in real time, the instructions sent by controllers and the read-backs of pilots, so as to reduce misunderstanding and forgetting.
In 2016, Guilin Jingzhun Measurement and Control Technology Co., Ltd. performed speech recognition using a pre-trained controller speech database; this method is limited by the existing speech database, gives poor results for voice information that cannot exactly match its rules, and its recognition accuracy is not high. In 2018, the 15th Research Institute of China Electronics Technology Group built an acoustic model based on continuous hidden Markov models (CHMM) for speech recognition; the accuracy of this model falls short of neural network models. The Civil Aviation University of China used a feature-enhanced DNN-HMM model that further reduced the error rate, but a DNN is prone to over-fitting and easily trapped in local optima, and its recognition accuracy is still inferior to a CNN-GRU neural network model.
Summary of the invention
In view of the above shortcomings of the prior art, the present invention aims to construct a speech recognition system dedicated to air traffic control instructions. The system is built with deep learning technology and based on an artificial-intelligence speech recognition engine and an external information correction system; it can recognize with high accuracy the large number of specialized terms, special pronunciations and place names in ATC speech, achieving higher ATC speech recognition accuracy.
To achieve the above effect, the technical solution provided by the present invention is as follows.
The present invention comprises the following steps:
S1: obtaining the voice signal to be recognized, comprising at least one of a real-time voice signal and a historical voice signal, and converting it into 16-bit, 16 kHz PCM audio data;
S2: performing voice segmentation on the audio data to obtain processed valid audio fragments;
S3: establishing a deep network model;
S4: training the deep network model with training data to obtain a speech recognition engine;
S5: feeding the valid audio fragments into the speech recognition engine and outputting a text recognition result.
The voice segmentation comprises the following steps:
S2.1: inputting the audio data and performing a fast Fourier transform (FFT) on each audio frame (1024 sampling points per frame) to obtain a spectrum sequence M(x), retaining only the part x = 1-256;
S2.2: setting an adjustment threshold f = -30 dB, whose size can be adjusted according to the actual situation; bins with M(x) > f are recorded as 1, otherwise as 0, forming a new sequence M0(x);
S2.3: setting a voice threshold v = 0.2, whose size can be adjusted according to the actual situation, and summing M0; if sum(M0)/256 > v, the frame is considered an active frame, i.e. it contains voice;
S2.4: if more than 8 consecutive frames are active, the audio of those consecutive active frames is taken as a valid audio fragment.
The deep network model uses one or more structurally identical convolution modules as the feature extractor, each convolution module comprising two convolutional layers and one pooling layer; the extracted feature data are processed by a reshape layer and a fully connected layer; sequence learning is performed with gated recurrent units (GRU); and classification learning and decision are performed with at least two fully connected layers to obtain the prediction result. Voice data passing through the convolution modules, the gated recurrent units and the fully connected layers yields a prediction result, realizing one complete forward-propagation pass.
The deep network model is also provided with dropout layers at the junctions between modules, and the gated recurrent unit uses a GRU neural network comprising a forward sequence learning module and a reverse sequence learning module.
In the air traffic control voice instruction recognition method based on deep learning, the deep network model is trained with training data to obtain the speech recognition engine, comprising the following specific steps:
S4.1: obtaining raw ATC command audio data;
S4.2: annotating the voice data: the audio data obtained in S4.1 are annotated with text to obtain training data, the obtained training data comprising voice data and annotation data;
S4.3: dividing the training data into groups of voice-data/annotation-data pairs;
S4.4: training the deep network model established in S3 through the back-propagation algorithm using the Adadelta optimizer, forming a well-adapted speech recognition deep learning network.
Audio data are input to the speech recognition engine, and the trained recognition engine converts the audio data into a text result for output.
The text result can be output, saved or used by other applications.
The invention has the following advantages:
The convolutional layers used by the deep network model of the present invention have the advantages of local perception and weight sharing, and can effectively extract data features with a relatively small number of parameters; the pooling layers have the advantages of feature invariance and feature dimensionality reduction, further compressing the data and parameter count and preventing model over-fitting. Data pass through four convolution modules, which extract data features layer by layer from low level to high level and learn to recognize the data and its information, conforming to the way humans extract information.
Secondly, the gated recurrent unit (GRU) included therein is a special recurrent neural network (RNN) simulating the human memory system; it consists of an update gate and a reset gate, which respectively control how much of the previous state is remembered and forgotten, thereby realizing the learning of language sequences, i.e. relating context. The GRU of the speech recognition engine of the present invention performs sequence learning in both the forward and backward directions, further guaranteeing the model's understanding of context. After the sequence learning by the GRU, the data enter two fully connected layers for classification learning and decision, yielding a more accurate prediction result. In addition, dropout layers are provided at the module junctions to prevent the model from over-fitting.
Because of the professional nature, regional differences and personnel complexity of air traffic control, ATC speech contains a large number of technical terms, unique place names, mixed Chinese and English, and accent differences, which pose a huge challenge to speech recognition systems. The present system builds a speech recognition engine based on artificial intelligence technology for the recognition of ATC speech. Compared with traditional speech recognition engines, the recognition accuracy of an artificial-intelligence-based speech recognition engine is qualitatively improved (by roughly 30%-60%), the model structure is greatly simplified, and training and usage efficiency are high.
With an artificial-intelligence deep learning engine at its core, the present invention realizes a complete, specialized speech recognition system targeted at the particularities of ATC speech. Compared with the general-purpose speech recognition systems of major Internet companies, the speech recognition engine in this system is trained entirely on real ATC speech; it has strong domain applicability and accent generalization ability, is highly scene-specific, depends relatively little on data volume, and significantly outperforms general-purpose speech recognition systems on ATC speech.
Brief description of the drawings
Fig. 1 is a flow diagram of the voice instruction recognition method of the present invention;
Fig. 2 is a schematic diagram of the connections of the deep network model of the present invention;
Fig. 3 is a flow diagram of the speech recognition engine training process.
Detailed description of the embodiments
The embodiments of the present invention are explained clearly and completely below with reference to the accompanying drawings. The described embodiments are only a part of the embodiments of the present invention, not all of them; based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope protected by the present invention.
As shown in Figure 1, an air traffic control voice instruction recognition method based on deep learning specifically uses the following steps.
S1: obtaining the voice signal to be recognized and converting it into 16-bit, 16 kHz PCM audio data.
The voice signal to be recognized is read in the form of at least one of a real-time voice stream and a historical voice stream. A historical voice stream refers to a stored audio file read out after conversion into a byte string whose format follows 16-bit, 16 kHz PCM. A real-time voice stream refers to an analog audio signal converted into digital information by a device such as a sound card; the digital signal is likewise a continuous byte string in 16-bit, 16 kHz PCM format.
PCM means pulse code modulation (Pulse Code Modulation). There are two ways to obtain PCM: one is to convert the analog audio signal into a digital byte-string signal through a sound card or audio capture card; the other is to convert other audio formats to PCM format. Here the conversion is performed on a Linux system using the ffmpeg tool, executing the following command:
ffmpeg -i "inputfile" -f wav -acodec pcm_s16le -ar 16000 "outputfile.wav"
The above command converts other types of audio files into 16-bit, 16 kHz PCM audio data.
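Besides the ffmpeg route, the 16-bit, 16 kHz PCM byte string described above can be produced and parsed directly with Python's standard wave module. A minimal sketch (the file name and the silent test clip are illustrative, not part of the patent):

```python
import struct
import wave

SAMPLE_RATE = 16000   # 16 kHz, as required by the recognizer
SAMPLE_WIDTH = 2      # 16-bit samples, i.e. 2 bytes each

def write_pcm_wav(path, samples):
    """Write a list of 16-bit integer samples as a 16 kHz mono PCM WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))

def read_pcm_bytes(path):
    """Read a stored WAV file back into the continuous byte string the engine consumes."""
    with wave.open(path, "rb") as w:
        assert w.getsampwidth() == SAMPLE_WIDTH and w.getframerate() == SAMPLE_RATE
        return w.readframes(w.getnframes())

# Round-trip a short silent clip.
write_pcm_wav("clip.wav", [0] * SAMPLE_RATE)  # 1 second of silence
data = read_pcm_bytes("clip.wav")
print(len(data))  # 32000 bytes: 16000 samples x 2 bytes
```

Either route yields the same byte-string format, so historical files and real-time streams can share one ingestion path.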
S2: carrying out phonetic segmentation to the audio data, effective audio fragment that obtains that treated.
The concrete manner is as follows:
The input audio data undergo a fast Fourier transform (FFT) per audio frame (1024 sampling points per frame), yielding a spectrum sequence M(x) of which only the part x = 1-256 is retained. An adjustment threshold f = -30 dB is set, whose size can be adjusted according to the actual situation; bins with M(x) > f are recorded as 1, otherwise as 0, forming a new sequence M0(x). A voice threshold v = 0.2 is set, whose size can be adjusted according to the actual situation, and M0 is summed; if sum(M0)/256 > v, the frame is considered an active frame, i.e. it contains voice. If more than 8 consecutive frames are active, the audio of those consecutive active frames is a valid audio fragment, and the fragment is passed to the speech recognition engine.
Preferably, the range of the threshold f is generally -40 dB to -10 dB, depending mainly on the noise intensity; f should be greater than the mean noise intensity. Since the noise in control voice is small, -30 dB usually distinguishes silent clips from clips containing sound effectively, thereby realizing the cutting. If f is too small, all audio is regarded as containing voice; if f is too large, all audio is regarded as containing no voice, and the cutting cannot be completed.
For the threshold v, the value range is generally between 0.1 and 0.9. Its role is to judge the beginning and end of an audio fragment. If its value is too small, any small audio fluctuation is taken as the start of audio, and even when audio activity has stopped it is difficult to regard the fragment as ended; if its value is too large, even strong audio activity is not regarded as a start, and a slight decrease in activity intensity is regarded as the end of the fragment. An appropriate value should therefore be chosen, so that small audio activity does not trigger the start of a fragment and large activity does not end one too easily.
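The cutting procedure of S2.1 to S2.4 (1024-sample frames, FFT bins 1-256 thresholded at f = -30 dB, activity ratio above v = 0.2, at least 8 consecutive active frames) can be sketched in Python as follows. The dB reference used to normalize bin magnitudes to full scale is an assumption, since the patent does not state one:

```python
import numpy as np

FRAME = 1024        # samples per frame (S2.1)
F_DB = -30.0        # adjustment threshold f (S2.2)
V = 0.2             # voice threshold v (S2.3)
MIN_FRAMES = 8      # minimum run of active frames (S2.4)

def active_frames(signal):
    """Mark a 1024-sample frame active when >20% of FFT bins 1..256 exceed -30 dB."""
    flags = []
    for start in range(0, len(signal) - FRAME + 1, FRAME):
        spectrum = np.fft.rfft(signal[start:start + FRAME])
        # Keep bins x = 1..256 and normalize magnitude to full scale (assumed reference).
        mag = np.abs(spectrum[1:257]) / (FRAME / 2)
        m0 = 20 * np.log10(np.maximum(mag, 1e-12)) > F_DB  # M0(x): 1 above f, else 0
        flags.append(m0.mean() > V)                        # sum(M0)/256 > v
    return flags

def cut_segments(flags):
    """Return (first_frame, last_frame) for every run of >= 8 consecutive active frames."""
    segments, run_start = [], None
    for i, active in enumerate(list(flags) + [False]):     # sentinel ends a final run
        if active and run_start is None:
            run_start = i
        elif not active and run_start is not None:
            if i - run_start >= MIN_FRAMES:
                segments.append((run_start, i - 1))
            run_start = None
    return segments

# 16 frames of loud broadband noise followed by 16 frames of silence.
rng = np.random.default_rng(0)
sig = np.concatenate([rng.uniform(-1, 1, 16 * FRAME), np.zeros(16 * FRAME)])
print(cut_segments(active_frames(sig)))  # one segment covering the noisy frames
```

With this normalization, loud broadband frames land well above the 20% activity ratio while silence stays far below, which matches the qualitative behavior described for f and v above.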
S3: depth network model is established.
The framework of the deep network model is shown in Figure 2. The voice data are divided into an audio part and an annotation-text part: the audio part is fed into the model as input data in the form of a spectrogram, and the annotation-text part is converted into a corresponding digital sequence according to a specific dictionary and used as the expected output value. The input data first pass sequentially through one or more structurally identical convolution modules (CNN). Preferably, this embodiment uses 4 structurally identical convolution modules, each comprising two convolutional layers and one pooling layer.
Preferably, the embodiment of a convolution module is as follows:
layer_h1 = Conv2D(32, (3, 3), use_bias=True, activation='relu', padding='same', kernel_initializer='he_normal')(input_data)  # convolutional layer
layer_h2 = Conv2D(32, (3, 3), use_bias=True, activation='relu', padding='same', kernel_initializer='he_normal')(layer_h1)  # convolutional layer
layer_h3 = MaxPooling2D(pool_size=2, strides=None, padding="valid")(layer_h2)  # pooling layer
After the feature extraction by the convolution modules, the extracted feature data are reshaped and combined by a reshape layer and a fully connected layer, then enter the gated recurrent unit (GRU) for sequence learning.
Preferably, the embodiment of the reshape layer is as follows:
layer_h13 = Reshape((200, 3200))(layer_h12)  # reshape layer
The GRU is a special recurrent neural network (RNN) simulating the human memory system; it is composed of an update gate and a reset gate, which respectively control how much of the previous state is remembered and forgotten, thereby realizing the learning of language sequences, i.e. relating context. The GRU of the speech recognition engine of the present invention performs sequence learning in both the forward and backward directions, further guaranteeing the model's understanding of context.
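The update-gate/reset-gate mechanics just described can be sketched as a single GRU time step in NumPy; the weights here are random placeholders for illustration, not trained values, and the gate convention follows one common formulation:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, W, U, b):
    """One GRU time step: z gates memory of the previous state, r gates forgetting."""
    z = sigmoid(W["z"] @ x + U["z"] @ h_prev + b["z"])              # update gate
    r = sigmoid(W["r"] @ x + U["r"] @ h_prev + b["r"])              # reset gate
    h_tilde = np.tanh(W["h"] @ x + U["h"] @ (r * h_prev) + b["h"])  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                         # blend old and new

rng = np.random.default_rng(0)
n_in, n_hid = 8, 4
W = {k: rng.normal(scale=0.1, size=(n_hid, n_in)) for k in "zrh"}
U = {k: rng.normal(scale=0.1, size=(n_hid, n_hid)) for k in "zrh"}
b = {k: np.zeros(n_hid) for k in "zrh"}

# Run a short feature sequence through the cell in the forward direction.
h = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):
    h = gru_step(x, h, W, U, b)
print(h.shape)  # (4,)
```

A bidirectional layer simply runs a second such cell over the reversed sequence and concatenates the two states, which is what the Bidirectional(GRU(...)) embodiment of this engine does.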
Preferably, the embodiment of the bidirectional GRU gated recurrent unit is as follows:
layer_h15 = Bidirectional(GRU(256, return_sequences=True, return_state=False), merge_mode='concat')(layer_h14)
After the sequence learning by the GRU, the data enter two fully connected layers for classification learning and decision, yielding the prediction result. In addition, dropout layers (discarding layers) are provided at the module junctions of the model framework to prevent model over-fitting:
layer_h6 = Dropout(0.1)(layer_h6)
Preferably, dropout layers may be provided at the junction of each module of the deep network model.
Training the model's numerous parameters requires back-propagation, whose essence is the minimization of the model's loss function. The speech recognition engine of the present invention uses the CTC (Connectionist Temporal Classification) loss function. Preferably:
The CTC loss function is expressed as follows:
L(S) = -ln ∏_{(x,z)∈S} p(z|x) = -∑_{(x,z)∈S} ln p(z|x)
where p(z|x) represents the probability of the output sequence z given the input x, and S is the training set. The loss function can be interpreted as the negative log of the product of the probabilities of outputting the correct label for each given sample. p(z|x) can further be written as
p(z|x) = ∑_{π∈B⁻¹(z)} ∏_{t=1}^{T} y^t_{π_t}
where x is the given input, i.e. the audio feature sequence transformed by the neural network into a sequence y of symbol probabilities; z is the given output, i.e. the correct symbol sequence for the audio; B is the map that collapses a frame-level path π to a label sequence; z′ is z with blanks inserted and |z′| denotes the length of the sequence z′.
For the above formula, we define the forward variable α(t, u) as the sum of the probabilities of all paths that have output the first u symbols of z′ by time t. It follows the recurrence
α(t, u) = y^t_{l′_u} · (α(t-1, u) + α(t-1, u-1) + α(t-1, u-2))
where u denotes the character position, l′_u denotes the label at position u, and the α(t-1, u-2) term is omitted when l′_u is blank or l′_u = l′_{u-2}.
Symmetrically, a backward variable β(t, u) is defined, meaning the sum of the probabilities of all "remaining" paths π′ that can still reach the end of the sequence at time T by outputting blanks or the corresponding labels; the remaining paths here refer to the paths other than those described by α(t, u). It follows the recurrence
β(t, u) = β(t+1, u)·y^{t+1}_{l′_u} + β(t+1, u+1)·y^{t+1}_{l′_{u+1}} + β(t+1, u+2)·y^{t+1}_{l′_{u+2}}
where y^{t+1}_{l′_i} denotes the probability of outputting the label l′_i at time t+1, and the u+2 term is omitted when l′_u is blank or l′_u = l′_{u+2}.
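The path probabilities above are summed over all frame-level paths that collapse to the target sequence once adjacent repeats and then blanks are removed. This many-to-one collapse (the B map of CTC) can be sketched as follows, with '-' as an illustrative blank symbol:

```python
BLANK = "-"  # illustrative blank symbol

def collapse(path):
    """CTC's many-to-one map B: merge adjacent repeated symbols, then delete blanks."""
    out = []
    prev = None
    for sym in path:
        if sym != prev:                       # merge runs of the same symbol
            out.append(sym)
        prev = sym
    return [s for s in out if s != BLANK]     # drop blanks

# Several distinct frame-level paths collapse to the same label sequence.
print(collapse(list("-aa-b-")))  # ['a', 'b']
print(collapse(list("aabbb-")))  # ['a', 'b']
print(collapse(list("a-ab")))    # ['a', 'a', 'b'] (a blank separates repeated labels)
```

This is why blanks are inserted into z to form z′: without them, genuinely repeated labels could never survive the merge step.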
The specific implementation uses the CTC loss function of Keras, as follows:
from keras import backend as K

def ctc_lambda_func(self, args):
    y_pred, labels, input_length, label_length = args
    y_pred = y_pred[:, :, :]
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)

The ctc_lambda_func function computes the CTC loss, where y_pred is the result computed by the neural network, labels denotes the correct results, input_length denotes the length of the prediction batch, and label_length denotes the length of the correct-result batch.
S4: training the deep network model with training data to obtain the speech recognition engine.
The concrete manner is as follows:
Obtaining raw ATC command audio data: the ATC speech to be recognized is obtained. Besides real raw ATC command audio data, artificially synthesized imitations of real instruction speech may also be used as raw ATC command audio data;
Annotating the voice data: the obtained speech is annotated with text to obtain training data, the obtained training data comprising voice data and annotation data. Personnel can be organized to learn professional ATC knowledge and annotate the ATC speech, realizing annotation in which each speech fragment corresponds to text in the corresponding language (for example, a fragment annotated as "China Eastern 3988, climb to 900 and maintain");
Dividing the training data into groups of pairs: preferably each group contains 10,000 pairs; the data may be divided, as training requires, into groups of similar function, the specific size of each group being determined by the function of the training data and the concrete situation;
Training the deep network model through the back-propagation algorithm using the Adadelta optimizer to obtain the trained speech recognition engine.
Preferably, the embodiment of the Adadelta optimizer is as follows:
ada_d = Adadelta(lr = 0.01, rho = 0.95, epsilon = 1e-06)
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer = ada_d)
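The Adadelta update configured above (rho = 0.95, epsilon = 1e-06) maintains running averages of squared gradients and squared updates, so per-parameter step sizes adapt without hand-tuned schedules. A minimal NumPy sketch of the update rule on a toy quadratic (not the Keras internals; lr = 1.0 here for visible movement, whereas the embodiment uses lr = 0.01):

```python
import numpy as np

def adadelta_minimize(grad_fn, x0, steps=500, rho=0.95, eps=1e-6, lr=1.0):
    """Adadelta: step size per coordinate from running averages of g^2 and dx^2."""
    x = np.asarray(x0, dtype=float)
    eg2 = np.zeros_like(x)   # running average E[g^2]
    edx2 = np.zeros_like(x)  # running average E[dx^2]
    for _ in range(steps):
        g = grad_fn(x)
        eg2 = rho * eg2 + (1 - rho) * g * g
        dx = -np.sqrt(edx2 + eps) / np.sqrt(eg2 + eps) * g   # adaptive step
        edx2 = rho * edx2 + (1 - rho) * dx * dx
        x = x + lr * dx
    return x

# Descend on f(x) = ||x - 3||^2, whose gradient is 2(x - 3).
x = adadelta_minimize(lambda v: 2 * (v - 3.0), np.array([10.0, -5.0]))
print(x)  # moved from [10, -5] toward [3, 3]
```

Note Adadelta's characteristic warm-up: early steps are tiny (edx2 starts at zero) and grow as the update history accumulates, which is why training runs for many iterations.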
S5: feeding the valid audio fragments into the speech recognition engine and outputting the text recognition result.
The text recognition result can be output, saved or used by other applications.
By optimizing the structure of the deep neural network, the method of the embodiment of the present invention uses convolutional layers with the advantages of local perception and weight sharing; compared with a fully connected DNN, they can effectively extract data features with a relatively small number of parameters, reducing the probability of over-fitting. The bidirectional GRU model not only realizes the learning of language sequences but also further guarantees the model's understanding of context. The present invention additionally uses the CTC method, removing the inconvenience of aligning speech and text and improving the efficiency of post-processing.
The preferred embodiments of this solution are shown above. It should be pointed out that, as understood by those skilled in the art, the solution is not restricted to the described embodiments; equal or approximate replacements or changes made by any person skilled in the art within the technical scope of the present disclosure, according to the technical solution and inventive concept of the present invention, should also be regarded as within the protection scope of the present invention.

Claims (6)

1. An air traffic control voice instruction recognition method based on deep learning, characterized by comprising the following steps:
S1: obtaining a voice signal to be recognized and converting it into 16-bit, 16 kHz PCM audio data;
S2: performing voice segmentation on the audio data to obtain processed valid audio fragments;
S3: establishing a deep network model;
S4: training the deep network model with training data to obtain a speech recognition engine;
S5: feeding the valid audio fragments into the speech recognition engine and outputting a text recognition result.
2. The air traffic control voice instruction recognition method based on deep learning according to claim 1, characterized in that the voice signal in step S1 comprises a real-time voice signal and/or a historical voice signal.
3. The air traffic control voice instruction recognition method based on deep learning according to claim 1, characterized in that the voice segmentation in step S2 comprises the following steps:
S2.1: inputting the audio data and performing a fast Fourier transform (FFT) on each audio frame (1024 sampling points per frame) to obtain a spectrum sequence M(x), retaining only the part x = 1-256;
S2.2: setting an adjustment threshold f = -30 dB, whose size can be adjusted according to the actual situation; bins with M(x) > f are recorded as 1, otherwise as 0, forming a new sequence M0(x);
S2.3: setting a voice threshold v = 0.2, whose size can be adjusted according to the actual situation, and summing M0; if sum(M0)/256 > v, the frame is considered an active frame, i.e. it contains voice;
S2.4: if more than 8 consecutive frames are active, the audio of those consecutive active frames is taken as a valid audio fragment.
4. The air traffic control voice instruction recognition method based on deep learning according to claim 1, characterized in that the deep network model in step S3 uses one or more structurally identical convolution modules as the feature extractor, each convolution module comprising two convolutional layers and one pooling layer; the extracted feature data are processed by a reshape layer and a fully connected layer; sequence learning is performed with a gated recurrent unit using a bidirectional GRU neural network; and the output result is obtained with at least two fully connected layers.
5. The air traffic control voice instruction recognition method based on deep learning according to claim 4, characterized in that dropout layers are provided at the module junctions of the deep network model.
6. The air traffic control voice instruction recognition method based on deep learning according to claim 1, characterized in that step S4 specifically comprises:
S4.1: obtaining ATC command audio data, i.e. the ATC speech to be recognized;
S4.2: annotating the voice data: the speech obtained in S4.1 is annotated with text to obtain training data, the training data comprising voice data and annotation data;
S4.3: dividing the training data into groups of pairs;
S4.4: training the deep network model established in S3 through the back-propagation algorithm using the Adadelta optimizer to obtain the trained speech recognition engine.
CN201910619285.XA 2019-07-10 2019-07-10 A kind of air control voice instruction recognition method based on deep learning Pending CN110415683A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910619285.XA CN110415683A (en) 2019-07-10 2019-07-10 A kind of air control voice instruction recognition method based on deep learning


Publications (1)

Publication Number Publication Date
CN110415683A true CN110415683A (en) 2019-11-05

Family

ID=68360925

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910619285.XA Pending CN110415683A (en) 2019-07-10 2019-07-10 A kind of air control voice instruction recognition method based on deep learning

Country Status (1)

Country Link
CN (1) CN110415683A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101599269A (en) * 2009-07-02 2009-12-09 China Agricultural University Voice endpoint detection method and device
CN103730118A (en) * 2012-10-11 2014-04-16 Baidu Online Network Technology (Beijing) Co., Ltd. Voice signal collecting method and mobile terminal
CN104715761A (en) * 2013-12-16 2015-06-17 Shenzhen Baike Information Technology Co., Ltd. Audio valid data detection method and system
CN107577662A (en) * 2017-08-08 2018-01-12 Shanghai Jiao Tong University Semantic understanding system and method for Chinese text
CN108282262A (en) * 2018-04-16 2018-07-13 Xidian University Intelligent signal classification method based on gated recurrent unit deep network
CN108986791A (en) * 2018-08-10 2018-12-11 Nanjing University of Aeronautics and Astronautics Chinese and English speech recognition method and system for the civil aviation air-ground communication field
CN109766523A (en) * 2017-11-09 2019-05-17 Potevio Information Technology Co., Ltd. Part-of-speech tagging method and tagging system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Jiawen: "Research on Speech Recognition Technology for Civil Aviation Air-Ground Communication", China Master's Theses Full-text Database *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110491371A (en) * 2019-08-07 2019-11-22 Beijing Youshu Intelligent Technology Co., Ltd. A kind of air traffic control instruction translation method for improving semantic information
CN110808036A (en) * 2019-11-07 2020-02-18 Nanjing University Incremental voice command word recognition method
CN110930995A (en) * 2019-11-26 2020-03-27 China Southern Power Grid Co., Ltd. Voice recognition model applied to power industry
CN110930985B (en) * 2019-12-05 2024-02-06 Ctrip Computer Technology (Shanghai) Co., Ltd. Telephone speech recognition model, method, system, device and medium
CN110930985A (en) * 2019-12-05 2020-03-27 Ctrip Computer Technology (Shanghai) Co., Ltd. Telephone speech recognition model, method, system, device and medium
CN111312228A (en) * 2019-12-09 2020-06-19 China Southern Power Grid Co., Ltd. End-to-end-based voice navigation method applied to electric power enterprise customer service
CN111063336A (en) * 2019-12-30 2020-04-24 Tianjin Zhongke Intelligent Identification Industry Technology Research Institute Co., Ltd. End-to-end voice recognition system based on deep learning
CN111627257A (en) * 2020-04-13 2020-09-04 Nanjing University of Aeronautics and Astronautics Control instruction safety rehearsal and verification method based on aircraft motion trend prejudgment
CN111627257B (en) * 2020-04-13 2022-05-03 Nanjing University of Aeronautics and Astronautics Control instruction safety rehearsal and verification method based on aircraft motion trend prejudgment
CN111667830B (en) * 2020-06-08 2022-04-29 Civil Aviation University of China Airport control decision support system and method based on controller instruction semantic recognition
CN111667830A (en) * 2020-06-08 2020-09-15 Civil Aviation University of China Airport control decision support system and method based on controller instruction semantic recognition
CN112420024A (en) * 2020-10-23 2021-02-26 Sichuan University Full-end-to-end Chinese and English mixed air traffic control voice recognition method and device
CN112420024B (en) * 2020-10-23 2022-09-09 Sichuan University Full-end-to-end Chinese and English mixed air traffic control voice recognition method and device
CN112508023A (en) * 2020-10-27 2021-03-16 Chongqing University Deep learning-based end-to-end recognition method for inkjet-printed characters on parts
CN113409787A (en) * 2021-07-08 2021-09-17 Shanghai Civil Aviation East China ATC Engineering Technology Co., Ltd. Civil aviation control voice recognition system based on artificial intelligence technology

Similar Documents

Publication Publication Date Title
CN110415683A (en) A kind of air traffic control voice instruction recognition method based on deep learning
CN107239446B (en) A kind of intelligent relation extraction method based on neural network and attention mechanism
CN109977234A (en) A kind of knowledge graph completion method based on subject keyword filtering
CN110309503A (en) A kind of subjective question scoring model and scoring method based on deep learning BERT-CNN
CN108986791A (en) Chinese and English speech recognition method and system for the civil aviation air-ground communication field
CN105957518A (en) Mongolian large vocabulary continuous speech recognition method
CN111179917B (en) Speech recognition model training method, system, mobile terminal and storage medium
CN110459208A (en) A kind of knowledge-transfer-based sequence-to-sequence speech recognition model training method
CN101645269A (en) Language recognition system and method
CN101650943A (en) Non-native speech recognition system and method thereof
CN113160798B (en) Chinese civil aviation air traffic control voice recognition method and system
CN111063336A (en) End-to-end voice recognition system based on deep learning
CN109949796A (en) A kind of end-to-end Lhasa dialect speech recognition method based on Tibetan language components
CN115206293B (en) Multi-task air traffic control voice recognition method and device based on pre-training
CN111667830A (en) Airport control decision support system and method based on controller instruction semantic recognition
CN106548775A (en) A kind of speech recognition method and system
CN104751227A (en) Method and system for constructing deep neural network
CN110334243A (en) Audio representation learning method based on multilayer temporal pooling
CN111243591B (en) Air traffic control speech recognition method introducing external data correction
CN105654947A (en) Method and system for acquiring traffic information in traffic broadcast speech
CN115240651A (en) Land-air communication speaker role identification method and device based on feature fusion
CN114944150A (en) Dual-task-based Conformer land-air communication acoustic model construction method
CN110232121B (en) Semantic network-based control instruction classification method
CN112133292A (en) End-to-end automatic voice recognition method for civil aviation land-air communication field
CN111090726A (en) NLP-based text customer service interaction method for the electric power industry

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 2019-11-05)