CN110415683A - Air traffic control voice instruction recognition method based on deep learning - Google Patents
Air traffic control voice instruction recognition method based on deep learning
- Publication number
- CN110415683A CN110415683A CN201910619285.XA CN201910619285A CN110415683A CN 110415683 A CN110415683 A CN 110415683A CN 201910619285 A CN201910619285 A CN 201910619285A CN 110415683 A CN110415683 A CN 110415683A
- Authority
- CN
- China
- Prior art keywords
- voice
- data
- deep learning
- network model
- audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
The invention discloses an air traffic control voice instruction recognition method based on deep learning, comprising the following steps: obtaining a voice signal to be recognized and converting it into 16-bit, 16 kHz PCM audio data; establishing a deep network model; training the deep network model with training data to obtain a speech recognition engine; performing voice segmentation on the audio data; and inputting the effective audio segments obtained by segmentation into the speech recognition engine, which outputs text recognition results. The deep network model uses convolution modules as feature extractors, processes the extracted feature data through a reshape layer and a fully connected layer, performs sequence learning with gated recurrent units, and finally performs classification learning and decision through fully connected layers to obtain the prediction result. With an artificial-intelligence deep learning engine at its core, the present invention has strong domain applicability and accent generalization ability, depends comparatively little on data volume, and significantly outperforms general-purpose speech recognition systems in the recognition of air traffic control speech.
Description
The present invention relates to the field of voice processing technology, and more particularly to a deep-learning-based voice instruction recognition method for the air traffic control field.
Background art
With the rapid growth of civil aviation, large numbers of aircraft and flights are added every year. However, there has long been a shortage of air traffic control personnel, conservatively estimated in the thousands. Although the relevant air traffic control authorities have implemented a series of measures, such as the "4+1" training mechanism, heavy attrition of control personnel persists. At the same time, because new recruits lack experience and training time and resources are scarce, the expected staffing benefits cannot be realized. The shortage of air traffic control professionals forces controllers to work overloaded, creating potential safety and efficiency problems in air traffic. Air traffic control in China is still high-intensity mental labor centered on the controller's subjective judgment; as civil aviation flourishes and aircraft movements increase sharply, understaffed control units can currently rely only on controllers performing prolonged, highly concentrated, high-intensity work, and human error is unavoidable. According to statistics, human error has caused about 80% of all aviation accidents and has become the major factor affecting aviation safety. Taking the October 11, 2016 Hongqiao Airport passenger aircraft collision incident as an example, the serious incident (a runway incursion) was caused precisely because the tower controller forgot the aircraft dynamics. It is therefore necessary to introduce a speech recognition system to record in real time the instructions issued by controllers and the readbacks of pilots, so as to reduce misunderstanding and forgetting.
In 2016, Guilin Jingzhun Measurement and Control Technology Co., Ltd. performed speech recognition using a pre-trained controller speech library. That method is limited by the existing speech database: recognition is poor for voice information that does not exactly match the stored patterns, and accuracy is low. In 2018, the 15th Research Institute of China Electronics Technology Group Corporation built an acoustic model based on continuous hidden Markov models (CHMM) for speech recognition; its accuracy falls short of neural network models. The Civil Aviation University of China used a feature-enhanced DNN-HMM model that further reduced the error rate, but DNNs are prone to overfitting and easily fall into local optima, so recognition accuracy still falls short of a CNN-GRU neural network model.
Summary of the invention
In view of the above shortcomings of the prior art, the present invention constructs a speech recognition system dedicated to air traffic control instructions. The system is built with deep learning technology around an artificial-intelligence speech recognition engine and an external information correction system; it can recognize with high accuracy the large number of specialized terms, special pronunciations and place names in air traffic control speech, achieving higher recognition accuracy for control speech.
To achieve this, the technical solution provided by the invention comprises the following steps:
S1: obtaining a voice signal to be recognized, comprising at least one of a real-time voice signal and a historical voice signal, and converting it into 16-bit, 16 kHz PCM audio data;
S2: performing voice segmentation on the audio data to obtain processed effective audio segments;
S3: establishing a deep network model;
S4: training the deep network model with training data to obtain a speech recognition engine;
S5: inputting the effective audio segments into the speech recognition engine and outputting text recognition results.
The voice segmentation comprises the following steps:
S2.1: inputting the audio data and performing a fast Fourier transform (FFT) on each audio frame (1024 sampling points per frame) to obtain a spectrum sequence M(x), retaining only the part x = 1–256;
S2.2: setting a threshold f = -30 dB, the value of which can be adjusted according to the actual situation; if M(x) > f, recording 1, otherwise recording 0, forming a new sequence M0(x);
S2.3: setting a voice threshold v = 0.2, the value of which can be adjusted according to the actual situation, and summing M0; if M0/256 > v, the frame is considered an active frame, i.e. contains voice;
S2.4: if a run of consecutive active frames exceeds 8 frames, the audio of those consecutive active frames is considered an effective audio segment.
The deep network model uses one or more structurally identical convolution modules as the feature extractor; each convolution module comprises two convolutional layers and one pooling layer. The extracted feature data are processed by a reshape layer and a fully connected layer, sequence learning is performed by gated recurrent units (GRU), and classification learning and decision are performed by at least two fully connected layers to obtain the prediction result. Voice data passes through the convolution modules, gated recurrent units and fully connected layers to obtain the prediction result, completing one full forward-propagation pass.
The deep network model is further provided with dropout layers at module junctions, and the gated recurrent unit uses a GRU neural network comprising a forward sequence learning module and a backward sequence learning module.
In the deep-learning-based air traffic control voice instruction recognition method, training the deep network model with training data into a speech recognition engine comprises the following specific steps:
S4.1: obtaining raw air traffic control command audio data;
S4.2: annotating the voice data: the audio data obtained in S4.1 are annotated with text to obtain training data; the obtained training data comprise voice data and annotation data;
S4.3: dividing the training data into groups of paired voice data and annotation data;
S4.4: training the deep network model established in S3 with the Adadelta optimizer via the backpropagation algorithm, forming a suitable deep learning speech recognition network.
When audio data are input into the speech recognition engine, the trained recognition engine converts the audio data into text results and outputs them. The text results can then be exported, saved or used by other applications.
The invention has the following advantages:
The convolutional layers used in the deep network model of the present invention have the advantages of local perception and weight sharing, and can effectively extract data features with a comparatively small number of parameters. The pooling layers have the advantages of feature invariance and feature dimensionality reduction, further compressing the data and parameters and preventing model overfitting. The data pass through four convolution modules, which extract data features from the bottom up and learn to recognize data and information, matching the way humans extract information.
Second, the gated recurrent unit (GRU) used therein is a special recurrent neural network (RNN) that simulates the human memory system. It consists of two parts, an update gate and a reset gate, which respectively control how much state information from the previous moment is remembered or forgotten, thereby learning language sequences, i.e. relating context. The GRU of the speech recognition engine of the present invention performs sequence learning in both the forward and backward directions, further ensuring the model's understanding of context. After sequence learning by the GRU, the data enter two fully connected layers for classification learning and decision, yielding a more accurate prediction result. In addition, dropout layers are provided at module junctions to prevent the model from overfitting.
Because air traffic control is highly specialized, regionally diverse and staffed by diverse personnel, control speech contains a large number of technical terms, unique place names, mixed Chinese and English, and accent differences, which pose a huge challenge for speech recognition systems. The present invention builds a speech recognition engine based on artificial intelligence technology for the recognition of control speech. Compared with traditional speech recognition engines, the artificial-intelligence-based engine not only improves recognition accuracy qualitatively (an improvement of roughly 30%–60%) but also greatly simplifies the model structure, with high training and usage efficiency.
With an artificial-intelligence deep learning engine at its core, the present invention realizes a complete, specialized speech recognition system targeted at the particularities of air traffic control speech. Compared with the general-purpose speech recognition systems of the major Internet companies, the speech recognition engine in this system is trained entirely on real air traffic control speech, giving it strong domain applicability and accent generalization ability; the scenario is highly specific, the dependence on data volume is low, and the recognition of control speech is significantly better than with general-purpose speech recognition systems.
Brief description of the drawings
Fig. 1 is a flow diagram of the voice instruction recognition method of the present invention;
Fig. 2 is a schematic diagram of the connection pattern of the deep network model of the present invention;
Fig. 3 is a flow diagram of the speech recognition engine training process.
Specific embodiment
The embodiments of the present invention are explained clearly and completely below in conjunction with the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, a deep-learning-based air traffic control voice instruction recognition method specifically uses the following steps:
S1: obtaining the voice signal to be recognized and converting it into 16-bit, 16 kHz PCM audio data.
The voice signal to be recognized is read in the form of at least one of a real-time voice stream and a historical voice stream. A historical voice stream refers to a stored audio file that is converted into a byte string for reading; the byte-string format follows the 16-bit, 16 kHz PCM format. A real-time voice stream refers to an analog audio signal converted into digital information by a device such as a sound card; the digital signal is likewise a continuous byte string in 16-bit, 16 kHz PCM format.
PCM stands for pulse code modulation. There are two ways to obtain PCM: first, an analog audio signal is converted into a digital byte-string signal by a sound card or audio capture card; second, another audio format is converted to PCM format. Here the conversion is performed on a Linux system with the ffmpeg tool. The conversion command is as follows:

ffmpeg -i "inputfile" -f wav -acodec pcm_s16le -ar 16000 "outputfile.wav"

The above command converts other types of audio files into 16-bit, 16 kHz PCM audio data.
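By way of illustration only (this sketch is not part of the original disclosure; the function name read_pcm_bytes and the file name are assumed), a minimal Python example of reading such a converted file back as the 16-bit, 16 kHz PCM byte string described above is as follows:

import wave

def read_pcm_bytes(path):
    # Open the stored audio file (historical voice stream) and return
    # its raw PCM payload as a byte string.
    with wave.open(path, "rb") as wav:
        assert wav.getsampwidth() == 2       # 16-bit samples
        assert wav.getframerate() == 16000   # 16 kHz sampling rate
        return wav.readframes(wav.getnframes())

pcm_data = read_pcm_bytes("outputfile.wav")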
S2: performing voice segmentation on the audio data to obtain processed effective audio segments.
The specific approach is as follows:
The audio data are input and a fast Fourier transform (FFT) is performed on each audio frame (1024 sampling points per frame) to obtain the spectrum sequence M(x), retaining only the part x = 1–256. A threshold f = -30 dB is set, the value of which can be adjusted according to the actual situation; if M(x) > f, 1 is recorded, otherwise 0, forming a new sequence M0(x). A voice threshold v = 0.2 is set, the value of which can be adjusted according to the actual situation, and M0 is summed; if M0/256 > v, the frame is considered an active frame, i.e. contains voice. If a run of consecutive active frames exceeds 8 frames, the audio of those consecutive active frames is considered an effective audio segment, and the segment is passed to the speech recognition engine.
Preferably, the threshold f generally ranges from -40 dB to -10 dB, depending mainly on the noise intensity: f should be greater than the mean noise intensity. Since the noise in control speech is small, -30 dB can usually distinguish silent segments from segments with sound effectively, enabling segmentation. When f is too small, all audio is treated as containing voice; when f is too large, all audio is treated as containing no voice, and segmentation cannot be completed.
For the threshold v, the value generally ranges between 0.1 and 0.9. Its function is to judge the start and end of an audio segment. When its value is too small, any slight audio fluctuation is taken as the start of audio, and even after audio activity has stopped it is difficult to regard the segment as ended. When its value is too large, even strong audio activity is not regarded as a start, and a slight drop in activity intensity is regarded as the end of the segment. An appropriate value should therefore be chosen so that small audio activity does not trigger a segment start, while larger activity does not end the segment too easily.
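The segmentation procedure described above can be sketched as follows (an illustrative NumPy sketch, not part of the original disclosure; it assumes 16-bit samples and takes digital full scale as the dB reference for f, which the text does not fix):

import numpy as np

FRAME = 1024        # sampling points per frame (S2.1)
F_DB = -30.0        # threshold f (S2.2)
V_THRESH = 0.2      # voice threshold v (S2.3)
MIN_RUN = 8         # minimum run of consecutive active frames (S2.4)

def frame_is_active(frame):
    # FFT of one 1024-sample frame; keep only x = 1..256.
    spectrum = np.abs(np.fft.fft(frame.astype(np.float64)))[1:257]
    # Magnitudes in dB relative to full scale (reference level assumed).
    m_db = 20.0 * np.log10(spectrum / (FRAME * 32768.0) + 1e-12)
    m0 = m_db > F_DB                   # M0(x): 1 above f, else 0
    return m0.sum() / 256.0 > V_THRESH

def effective_segments(samples):
    # Yield (start_frame, end_frame) pairs for runs of more than 8
    # consecutive active frames, i.e. effective audio segments.
    flags = [frame_is_active(samples[i:i + FRAME])
             for i in range(0, len(samples) - FRAME + 1, FRAME)]
    run = 0
    for i, active in enumerate(flags + [False]):
        if active:
            run += 1
        else:
            if run > MIN_RUN:
                yield (i - run, i)
            run = 0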
S3: establishing the deep network model.
The framework of the deep network model is shown in Fig. 2. The voice data are divided into an audio part and an annotation-text part: the audio part is passed into the model as input data in the form of a spectrogram, and the annotation text is converted into a corresponding digital sequence according to a specific dictionary and used as the desired output value. The input data first pass sequentially through one or more structurally identical convolution modules (CNN). Preferably, this embodiment uses 4 structurally identical convolution modules, each comprising two convolutional layers and one pooling layer.
Preferably, an embodiment of the convolution module is as follows:

from keras.layers import Conv2D, MaxPooling2D

layer_h1 = Conv2D(32, (3, 3), use_bias=True, activation='relu', padding='same', kernel_initializer='he_normal')(input_data)  # convolutional layer
layer_h2 = Conv2D(32, (3, 3), use_bias=True, activation='relu', padding='same', kernel_initializer='he_normal')(layer_h1)  # convolutional layer
layer_h3 = MaxPooling2D(pool_size=2, strides=None, padding='valid')(layer_h2)  # pooling layer
After feature extraction by the convolution modules, the extracted feature data are reshaped and combined by a reshape layer and a fully connected layer, and then enter the gated recurrent unit (GRU) for sequence learning.
Preferably, an embodiment of the reshape layer is as follows:

layer_h13 = Reshape((200, 3200))(layer_h12)  # reshape layer
A GRU is a special recurrent neural network (RNN) that simulates the human memory system. It consists of two parts, an update gate and a reset gate, which respectively control the memory and forgetting of the previous moment's state information, thereby learning language sequences, i.e. relating context. The GRU of the speech recognition engine of the present invention performs sequence learning in both the forward and backward directions, further ensuring the model's understanding of context.
Preferably, an embodiment of the bidirectional GRU gated recurrent unit is as follows:

layer_h15 = Bidirectional(GRU(256, return_sequences=True, return_state=False), merge_mode='concat')(layer_h14)
After sequence learning by the GRU, the data enter two fully connected layers for classification learning and decision, yielding the prediction result. In addition, dropout layers are provided at the module junctions of the model framework to prevent model overfitting:

layer_h6 = Dropout(0.1)(layer_h6)  # dropout applied at a module junction

Preferably, dropout layers can be provided at the junction of each module of the deep network model.
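Pulling the fragments above together, the following is a minimal sketch of one way to assemble the complete network in Keras. The input shape (a 1600 × 200 × 1 spectrogram), the filter counts (32, 64, 128, 128), the pool size of 1 in the fourth module (chosen so the Reshape((200, 3200)) arithmetic works out: 1600/2³ = 200 time steps and (200/2³) × 128 = 3200 features) and the output vocabulary size of 1424 are assumptions for illustration, not values fixed by the patent:

from keras.layers import (Input, Conv2D, MaxPooling2D, Dropout, Reshape,
                          Dense, GRU, Bidirectional, Activation)
from keras.models import Model

def conv_module(x, filters, pool):
    # One convolution module: two convolutional layers and one pooling layer.
    x = Conv2D(filters, (3, 3), use_bias=True, activation='relu',
               padding='same', kernel_initializer='he_normal')(x)
    x = Conv2D(filters, (3, 3), use_bias=True, activation='relu',
               padding='same', kernel_initializer='he_normal')(x)
    return MaxPooling2D(pool_size=pool, strides=None, padding='valid')(x)

input_data = Input(shape=(1600, 200, 1))   # spectrogram input (assumed size)

x = input_data
for filters, pool in ((32, 2), (64, 2), (128, 2), (128, 1)):
    x = conv_module(x, filters, pool)      # four structurally identical modules
    x = Dropout(0.1)(x)                    # dropout at each module junction

x = Reshape((200, 3200))(x)                # reshape layer: features per time step
x = Dense(128, activation='relu',
          kernel_initializer='he_normal')(x)   # fully connected layer

# Bidirectional GRU: forward and backward sequence learning.
x = Bidirectional(GRU(256, return_sequences=True, return_state=False),
                  merge_mode='concat')(x)
x = Dropout(0.1)(x)

# Two fully connected layers for classification learning and decision.
x = Dense(128, activation='relu', kernel_initializer='he_normal')(x)
x = Dense(1424, use_bias=True, kernel_initializer='he_normal')(x)
y_pred = Activation('softmax', name='softmax')(x)

model = Model(inputs=input_data, outputs=y_pred)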
Training the model's numerous parameters requires backpropagation, which is in essence the process of minimizing the model's loss function. The speech recognition engine of the present invention uses the CTC loss function. Preferably:

The CTC loss function is expressed as follows:

$$\mathrm{CTC}(S) = -\ln \prod_{(x,z) \in S} p(z \mid x) = -\sum_{(x,z) \in S} \ln p(z \mid x)$$

where p(z|x) represents the probability of the output sequence z given the input x, and S is the training set. The loss function can be interpreted as the product of the probabilities of outputting the correct labels given the samples. Here p(z|x) can be rewritten in the following form:

$$p(z \mid x) = \sum_{\pi \in \mathcal{B}^{-1}(z)} \prod_{t=1}^{T} y^{t}_{\pi_t}$$

where x is the given input, i.e. the audio feature sequence transformed by the neural network into a symbol feature sequence; y is the given output, i.e. the correct letter-symbol sequence corresponding to the audio; π ranges over the alignment paths that collapse to z under the mapping $\mathcal{B}$; and |z'| denotes the length of the padded label sequence z' (z with blanks inserted).

For the above formula, we define a forward variable α(t, u): the sum of the forward probabilities of all paths that, at output time t, have produced the first u symbols of the output sequence z. It follows the recurrence relation

$$\alpha(t, u) = y^{t}_{l'_u} \sum_{i=f(u)}^{u} \alpha(t-1, i)$$

where u denotes the character position, l'_u denotes the label at position u, and f(u) = u-1 if l'_u is a blank or equals l'_{u-2}, otherwise f(u) = u-2.

We then define a backward variable β(t, u), meaning the sum of the probabilities of all "remaining" paths π' that can reach an output at time T that is a blank or the corresponding label. Here remaining paths refer to the portions of paths other than those described by α(t, u). It follows the recurrence relation

$$\beta(t, u) = \sum_{i=u}^{g(u)} \beta(t+1, i)\, y^{t+1}_{l'_i}$$

where g(u) = u+1 if l'_u is a blank or equals l'_{u+2}, otherwise g(u) = u+2. In the above formula, $y^{t+1}_{l'_i}$ denotes the probability of outputting label l'_i at time t+1.
The specific implementation uses the CTC loss function of Keras, as follows:

from keras import backend as K

def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    y_pred = y_pred[:, :, :]
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)

The ctc_lambda_func function computes the CTC loss. Here y_pred is the result computed by the neural network, labels represents the correct results, input_length represents the lengths of the prediction batch, and label_length represents the lengths of the correct-result batch.
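A common way to attach this loss to the network during training is through a Keras Lambda layer, sketched below for illustration (not part of the original disclosure; it assumes y_pred and input_data come from the model sketch above and that label sequences are padded to an assumed maximum length of 64):

from keras.layers import Input, Lambda
from keras.models import Model

labels = Input(name='the_labels', shape=[64], dtype='float32')       # padded label sequences
input_length = Input(name='input_length', shape=[1], dtype='int64')  # network output lengths
label_length = Input(name='label_length', shape=[1], dtype='int64')  # true label lengths

# The Lambda layer evaluates the CTC loss inside the graph; its output
# name 'ctc' matches the loss dictionary used in model.compile below.
loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')(
    [y_pred, labels, input_length, label_length])

train_model = Model(inputs=[input_data, labels, input_length, label_length],
                    outputs=loss_out)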
S4: training the deep network model with training data to obtain the speech recognition engine.
The specific approach is as follows:
Obtaining raw air traffic control command audio data: the control speech to be recognized is obtained. Besides real raw control command audio data, artificially synthesized speech imitating real command speech can also be used as raw control command audio data.
Annotating the voice data: the obtained speech is annotated with text to obtain training data; the obtained training data comprise voice data and annotation data. Personnel can be organized to study professional air traffic control knowledge and annotate the control speech, so that each speech segment is annotated with the corresponding text in the corresponding language (for example, a speech segment may correspond to annotated content such as "China Eastern 3988, climb to 900 and maintain").
The training data are divided into paired groups. Preferably, each group contains 10,000 pairs; the data can be divided as needed into groups serving a common function, with the specific number per group determined by the function of the training data and the actual situation.
The deep network model is trained via the backpropagation algorithm using the Adadelta optimizer, yielding the trained speech recognition engine.
A preferred embodiment of the Adadelta optimizer is as follows:

from keras.optimizers import Adadelta

ada_d = Adadelta(lr=0.01, rho=0.95, epsilon=1e-06)
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=ada_d)
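For completeness, a minimal sketch of fitting the compiled model on the paired data (illustrative only: it assumes the compile call above is applied to the CTC-wrapped train_model from the earlier sketch, and that x_audio, y_labels, in_lens and lab_lens are hypothetical arrays holding the spectrogram inputs, padded label sequences, network output lengths and true label lengths of one data group):

import numpy as np

# The CTC loss is computed inside the graph, so the fit target is a
# dummy array; the compiled loss function simply passes y_pred through.
dummy_target = np.zeros((len(x_audio), 1))

train_model.fit(x=[x_audio, y_labels, in_lens, lab_lens],
                y=dummy_target, batch_size=16, epochs=50)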
S5: inputting the effective audio segments into the speech recognition engine and outputting text recognition results.
The text recognition results can be output, saved or used by other applications.
By optimizing the structure of the deep neural network, the method of the embodiment of the present invention uses convolutional layers, which have the advantages of local perception and weight sharing; relative to a fully connected DNN, they can effectively extract data features with a comparatively small number of parameters and reduce the probability of overfitting. The bidirectional GRU model not only realizes the learning of language sequences but also further safeguards the model's understanding of context. The present invention additionally uses the CTC method, which removes the need to align speech with text beforehand and improves the efficiency of post-processing.
Preferred embodiments of the present solution have been shown above. It should be pointed out that, as those skilled in the art will understand, the solution is not restricted to the described embodiments; any equivalent or approximate replacement or modification made by those skilled in the art within the technical scope of the present disclosure, according to the technical solution and inventive concept of the present invention, should also be regarded as falling within the protection scope of the present invention.
Claims (6)
1. An air traffic control voice instruction recognition method based on deep learning, characterized by comprising the following steps:
S1: obtaining a voice signal to be recognized and converting it into 16-bit, 16 kHz PCM audio data;
S2: performing voice segmentation on the audio data to obtain processed effective audio segments;
S3: establishing a deep network model;
S4: training the deep network model with training data to obtain a speech recognition engine;
S5: inputting the effective audio segments into the speech recognition engine and outputting text recognition results.
2. The air traffic control voice instruction recognition method based on deep learning according to claim 1, characterized in that the voice signal in step S1 comprises a real-time voice signal and/or a historical voice signal.
3. The air traffic control voice instruction recognition method based on deep learning according to claim 1, characterized in that the voice segmentation in step S2 comprises the following steps:
S2.1: inputting the audio data and performing a fast Fourier transform (FFT) on each audio frame (1024 sampling points per frame) to obtain a spectrum sequence M(x), retaining only the part x = 1–256;
S2.2: setting a threshold f = -30 dB, the value of which can be adjusted according to the actual situation; if M(x) > f, recording 1, otherwise recording 0, forming a new sequence M0(x);
S2.3: setting a voice threshold v = 0.2, the value of which can be adjusted according to the actual situation, and summing M0; if M0/256 > v, considering the frame an active frame, i.e. containing voice;
S2.4: if a run of consecutive active frames exceeds 8 frames, considering the audio of the consecutive active frames an effective audio segment.
4. The air traffic control voice instruction recognition method based on deep learning according to claim 1, characterized in that the deep network model in step S3 uses one or more structurally identical convolution modules as the feature extractor, each convolution module comprising two convolutional layers and one pooling layer; the extracted feature data are processed by a reshape layer and a fully connected layer; sequence learning is performed by a gated recurrent unit using a bidirectional GRU neural network; and the output result is obtained by at least two fully connected layers.
5. The air traffic control voice instruction recognition method based on deep learning according to claim 4, characterized in that dropout layers are provided at the module junctions of the deep network model.
6. The air traffic control voice instruction recognition method based on deep learning according to claim 1, characterized in that step S4 specifically comprises:
S4.1: obtaining air traffic control command audio data: obtaining the control speech to be recognized;
S4.2: annotating the voice data: annotating the speech obtained in S4.1 with text to obtain training data, the obtained training data comprising voice data and annotation data;
S4.3: dividing the training data into paired groups;
S4.4: training the deep network model established in S3 with the Adadelta optimizer via the backpropagation algorithm, obtaining the trained speech recognition engine.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910619285.XA CN110415683A (en) | 2019-07-10 | 2019-07-10 | Air traffic control voice instruction recognition method based on deep learning
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910619285.XA CN110415683A (en) | 2019-07-10 | 2019-07-10 | Air traffic control voice instruction recognition method based on deep learning
Publications (1)
Publication Number | Publication Date |
---|---|
CN110415683A true CN110415683A (en) | 2019-11-05 |
Family
ID=68360925
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910619285.XA Pending CN110415683A (en) | 2019-07-10 | 2019-07-10 | Air traffic control voice instruction recognition method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110415683A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101599269A (en) * | 2009-07-02 | 2009-12-09 | 中国农业大学 | Sound end detecting method and device |
CN103730118A (en) * | 2012-10-11 | 2014-04-16 | 百度在线网络技术(北京)有限公司 | Voice signal collecting method and mobile terminal |
CN104715761A (en) * | 2013-12-16 | 2015-06-17 | 深圳百科信息技术有限公司 | Audio valid data detection methods and audio valid data detection system |
CN107577662A (en) * | 2017-08-08 | 2018-01-12 | 上海交通大学 | Towards the semantic understanding system and method for Chinese text |
CN109766523A (en) * | 2017-11-09 | 2019-05-17 | 普天信息技术有限公司 | Part-of-speech tagging method and labeling system |
CN108282262A (en) * | 2018-04-16 | 2018-07-13 | 西安电子科技大学 | Intelligent clock signal sorting technique based on gating cycle unit depth network |
CN108986791A (en) * | 2018-08-10 | 2018-12-11 | 南京航空航天大学 | For the Chinese and English languages audio recognition method and system in civil aviaton's land sky call field |
Non-Patent Citations (1)
Title |
---|
Wang Jiawen: "Research on Speech Recognition Technology for Civil Aviation Ground-Air Communication", China Excellent Master's Theses Full-text Database *
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110491371A (en) * | 2019-08-07 | 2019-11-22 | 北京悠数智能科技有限公司 | A kind of blank pipe instruction translation method for improving semantic information |
CN110808036A (en) * | 2019-11-07 | 2020-02-18 | 南京大学 | Incremental voice command word recognition method |
CN110930995A (en) * | 2019-11-26 | 2020-03-27 | 中国南方电网有限责任公司 | Voice recognition model applied to power industry |
CN110930985B (en) * | 2019-12-05 | 2024-02-06 | 携程计算机技术(上海)有限公司 | Telephone voice recognition model, method, system, equipment and medium |
CN110930985A (en) * | 2019-12-05 | 2020-03-27 | 携程计算机技术(上海)有限公司 | Telephone speech recognition model, method, system, device and medium |
CN111312228A (en) * | 2019-12-09 | 2020-06-19 | 中国南方电网有限责任公司 | End-to-end-based voice navigation method applied to electric power enterprise customer service |
CN111063336A (en) * | 2019-12-30 | 2020-04-24 | 天津中科智能识别产业技术研究院有限公司 | End-to-end voice recognition system based on deep learning |
CN111627257A (en) * | 2020-04-13 | 2020-09-04 | 南京航空航天大学 | Control instruction safety rehearsal and verification method based on aircraft motion trend prejudgment |
CN111627257B (en) * | 2020-04-13 | 2022-05-03 | 南京航空航天大学 | Control instruction safety rehearsal and verification method based on aircraft motion trend prejudgment |
CN111667830B (en) * | 2020-06-08 | 2022-04-29 | 中国民航大学 | Airport control decision support system and method based on controller instruction semantic recognition |
CN111667830A (en) * | 2020-06-08 | 2020-09-15 | 中国民航大学 | Airport control decision support system and method based on controller instruction semantic recognition |
CN112420024A (en) * | 2020-10-23 | 2021-02-26 | 四川大学 | Full-end-to-end Chinese and English mixed air traffic control voice recognition method and device |
CN112420024B (en) * | 2020-10-23 | 2022-09-09 | 四川大学 | Full-end-to-end Chinese and English mixed empty pipe voice recognition method and device |
CN112508023A (en) * | 2020-10-27 | 2021-03-16 | 重庆大学 | Deep learning-based end-to-end identification method for code-spraying characters of parts |
CN113409787A (en) * | 2021-07-08 | 2021-09-17 | 上海民航华东空管工程技术有限公司 | Civil aviation control voice recognition system based on artificial intelligence technology |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110415683A (en) | Air traffic control voice instruction recognition method based on deep learning | |
CN107239446B (en) | A kind of intelligence relationship extracting method based on neural network Yu attention mechanism | |
CN109977234A (en) | A kind of knowledge mapping complementing method based on subject key words filtering | |
CN110309503A (en) | A kind of subjective item Rating Model and methods of marking based on deep learning BERT--CNN | |
CN108986791A (en) | For the Chinese and English languages audio recognition method and system in civil aviaton's land sky call field | |
CN105957518A (en) | Mongolian large vocabulary continuous speech recognition method | |
CN111179917B (en) | Speech recognition model training method, system, mobile terminal and storage medium | |
CN110459208A (en) | A kind of sequence of knowledge based migration is to sequential speech identification model training method | |
CN101645269A (en) | Language recognition system and method | |
CN101650943A (en) | Non-native speech recognition system and method thereof | |
CN113160798B (en) | Chinese civil aviation air traffic control voice recognition method and system | |
CN111063336A (en) | End-to-end voice recognition system based on deep learning | |
CN109949796A (en) | A kind of end-to-end framework Lhasa dialect phonetic recognition methods based on Tibetan language component | |
CN115206293B (en) | Multi-task air traffic control voice recognition method and device based on pre-training | |
CN111667830A (en) | Airport control decision support system and method based on controller instruction semantic recognition | |
CN106548775A (en) | A kind of audio recognition method and system | |
CN104751227A (en) | Method and system for constructing deep neural network | |
CN110334243A (en) | Audio representation learning method based on multilayer timing pond | |
CN111243591B (en) | Air control voice recognition method introducing external data correction | |
CN105654947A (en) | Method and system for acquiring traffic information in traffic broadcast speech | |
CN115240651A (en) | Land-air communication speaker role identification method and device based on feature fusion | |
CN114944150A (en) | Dual-task-based Conformer land-air communication acoustic model construction method | |
CN110232121B (en) | Semantic network-based control instruction classification method | |
CN112133292A (en) | End-to-end automatic voice recognition method for civil aviation land-air communication field | |
CN111090726A (en) | NLP-based electric power industry character customer service interaction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20191105 |