CN114512131A - Airborne air traffic control instruction intelligent voice recognition method and system - Google Patents
Airborne air traffic control instruction intelligent voice recognition method and system
- Publication number: CN114512131A (application CN202111518350.3A)
- Authority: CN (China)
- Prior art keywords: voice, traffic control, air traffic, model, pilot
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G10L 15/26: Speech to text systems
- G10L 13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme-to-phoneme translation, prosody generation or stress or intonation determination
- G10L 15/063: Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L 15/08: Speech classification or search
- G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L 2015/0631: Creating reference templates; clustering
- G10L 2015/088: Word spotting
- G10L 2015/223: Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science; Computational Linguistics; Health & Medical Sciences; Audiology, Speech & Language Pathology; Human Computer Interaction; Physics & Mathematics; Acoustics & Sound; Multimedia; Artificial Intelligence; Machine Translation
Abstract
The invention provides an intelligent voice recognition method and system for airborne air traffic control instructions. It relates to the technical field of air traffic control communication processing, enables accurate recognition of speech during the air traffic control process, solves the problem of the low speech recognition rate in the existing air traffic control field, and creates conditions for genuinely reducing pilot workload. The method comprises two processes: an air traffic control voice processing process and a pilot voice processing process. In both processes, a voice model extracts the pinyin and phonemes in the voice data to obtain an annotated voice text, and a language model then extracts keywords from the annotated text to obtain a keyword text. In the air traffic control voice processing process, the keyword text is synthesized into speech and transmitted to the pilot; in the pilot voice processing process, the keyword text triggers a response from the communication device, the navigation device and/or the checklist device. The technical scheme provided by the invention is suitable for the air traffic control process of aircraft.
Description
Technical Field
The invention relates to the intersection of air traffic control communication processing technology and artificial intelligence technology, and in particular to an intelligent voice recognition method and system for airborne air traffic control instructions.
Background
Since the invention of the computer, enabling machines to 'understand' human language, grasp its inherent meaning, and give correct answers has been the ultimate goal in the pursuit of artificial intelligence. This involves three technologies: automatic speech recognition (ASR), whose purpose is to let a machine hear human speech; natural language processing (NLP), whose purpose is to let a machine understand human intentions; and speech synthesis (SS), whose purpose is to let a machine speak. Automatic speech recognition is a typical interdisciplinary task involving pattern recognition, signal processing, physical acoustics, physiology, psychology, computer science and linguistics. For a computer to process human speech well, the collected raw human speech must be encoded and converted into voice audio data that the computer can recognize and edit. The quality of speech data acquisition and editing determines the upper limit of accuracy of the overall speech recognition.
Currently, flight crews obtain instructions in three ways. The first is the flight plan, i.e., written instructions issued by the air traffic management department; the second is electronic instructions transmitted over the air-ground data communication link established by an aviation radio station; the third is voice communication through the aviation radio station, i.e., voice instructions. At the current level of technology and equipment, voice communication remains the most convenient means of communication between the ground and flight crews, complementing the textual information of the air-ground data communication link. The existing communication modes therefore divide the work: voice communication carries instructions that bear directly on flight safety and are immediate, important, or require a direct response from the other party, while other means, such as paper or the air-ground data communication link, carry information not directly related to flight safety, such as flight plans, flight intelligence and airline operational needs.
Because air-ground communication still relies mainly on radio calls, which are easily disturbed by other equipment and even by weather changes, and because communication personnel differ in local accent and expressive ability, especially when Chinese and English are mixed, the growing air traffic pressure at today's busy airports makes misunderstandings caused by voice communication a consistently important factor in aviation accidents.
Accordingly, there is a need to develop an airborne air traffic control command intelligent speech recognition method and system to address the deficiencies of the prior art and to solve or mitigate one or more of the problems set forth above.
Disclosure of Invention
In view of the above, the invention provides an intelligent voice recognition method and system for airborne air traffic control instructions, which enable accurate voice recognition during the air traffic control process, solve the problem of the low voice recognition rate in the existing air traffic control field, and create conditions for genuinely reducing pilot workload.
In one aspect, the invention provides an intelligent voice recognition method for airborne air traffic control instructions, which comprises: an air traffic control end voice processing process and a pilot voice processing process;
the air traffic control end voice processing process comprises the following steps:
s11, collecting the air traffic control voice data of the air traffic control personnel;
s12, processing the air traffic control voice data by using the trained voice model to obtain an air traffic control voice labeling text;
s13, processing the air traffic control voice labeling text by using the trained language model, extracting a plurality of air traffic control instruction keywords and combining the keywords into an air traffic control instruction keyword text;
s14, performing voice synthesis on the air traffic control instruction keyword text, transmitting the air traffic control instruction keyword text to the pilot for readback, and transmitting the air traffic control instruction keyword text to the airborne-end display equipment for display;
the pilot voice processing process comprises the following steps:
s21, acquiring pilot voice data of a pilot;
s22, processing the pilot voice data by using the trained voice model to obtain a pilot voice labeling text;
s23, processing the pilot voice labeling text by using the trained language model, identifying a plurality of pilot instruction keywords and combining the pilot instruction keywords into a pilot instruction keyword text;
and S24, triggering a communication device, a navigation device and/or a checklist device to respond according to the content of the pilot instruction keyword text, completing the communication function with the air traffic control end, the aircraft body action adjustment function and/or the item inspection function.
As to the above-mentioned aspect and any possible implementation manner, further providing an implementation manner, the training contents of the speech model in step S12 and step S22 include: acquiring historical voice data, marking the historical voice data according to aviation professional terms and the tone of each word to obtain a voice marking text for model training, and inputting the voice marking text into an original model for training to obtain the trained voice model;
the original model is a neural network model combining a spectrogram-image convolutional neural network with a connectionist temporal classification model; with time and frequency as the two dimensions of the image, whole-sentence speech modeling is achieved through the combination of additional convolution and pooling layers.
The above-mentioned aspect and any possible implementation further provide an implementation in which the air traffic control voice annotation text in step S12 and the pilot voice annotation text in step S22 each include: a Chinese word-segmented corpus text revised for aviation terminology, a Chinese pinyin text with tone marks, and a minimal-phonetic-unit sequence decomposing the Chinese pinyin into initial (consonant) and final (vowel) phonemes.
The above-described aspect and any possible implementation further provide an implementation that the language model in steps S13 and S23 is a deep learning model based on a self-attention mechanism.
The above-described aspect and any possible implementation further provide an implementation in which the air traffic control voice data in step S11 and the pilot voice data in step S21 are audio files or voice spectrograms.
The above-described aspects and any possible implementations further provide an implementation, and the method further includes: and expanding a flight voice corpus according to the air traffic control voice data and the air traffic control voice labeling text in the air traffic control voice processing process and the pilot voice data and the pilot voice labeling text in the pilot voice processing process.
The above-mentioned aspects and any possible implementation further provide an implementation in which the specific content of expanding the flight voice corpus includes: traversing the flight voice corpus with the data to be judged as a template and judging whether identical data already exist in the corpus; if so, the expansion is terminated; otherwise, the data to be judged are supplemented into the flight voice corpus, and the process proceeds to step S12 or step S22, as sketched below.
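As an illustration, this corpus-expansion check amounts to a duplicate scan followed by an append. A minimal sketch, assuming the corpus is held as an in-memory list of entries (the list-based structure is an illustrative assumption, not the patent's storage format):

```python
def expand_corpus(corpus: list, candidate) -> bool:
    """Supplement the flight voice corpus with new data unless it already exists."""
    for entry in corpus:          # traverse with the candidate data as the template
        if entry == candidate:    # identical data already present
            return False          # terminate the expansion
    corpus.append(candidate)      # otherwise supplement the corpus
    return True                   # and proceed to step S12 or S22
```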
The above-mentioned aspect and any possible implementation manner further provide an implementation manner, in steps S12 and S22, after processing the voice data, updating the parameter configuration file of the corresponding voice model;
in steps S13 and S23, the phonetic notation text is processed and the parameter configuration file of the corresponding language model is updated.
The above-mentioned aspects and any possible implementation manner further provide an implementation manner, and the parameter configuration files of the onboard end speech model and the language model and the parameter configuration files of the air traffic control end speech model and the language model are synchronized in real time, so that the onboard end model is automatically updated.
In another aspect, the present invention provides an airborne air traffic control instruction intelligent speech recognition system, characterized in that said system is capable of implementing the steps of any of the methods described above;
the system comprises:
an on-board terminal, comprising: a voice acquisition and processing module, a voice model module, a language model module, a checklist system, a navigation response system and a communication response system; the voice acquisition and processing module, the voice model module and the language model module are connected in sequence, and the language model module is connected respectively with the checklist system, the navigation response system and the communication response system through interfaces;
an air traffic control end, comprising: a voice acquisition and processing module, a voice model module and a language model module; the voice acquisition and processing module, the voice model module and the language model module are connected in sequence;
the airborne end is in communication connection with the air traffic control end;
wherein, the speech model module and the language model module refer to modules carrying corresponding models.
Compared with the prior art, one of the above technical schemes has the following advantage or beneficial effect: the method deeply fuses a deep learning training framework, a speech-signal processing model based on an aviation-domain term dictionary, a self-attention-based contextual semantic model, and airborne-end and ground-end deployment devices; it provides dual human-in-the-loop interaction modes (between air traffic controllers and pilots over the air-ground link, and between pilots and the airborne device) and performs model deployment and software application testing on a ground server;
another of the above technical schemes has the following advantage or beneficial effect: data normalization of voice audio samples of differing formats and speech characteristics under a unified standard greatly improves the running efficiency of the deep learning model, allowing the intelligent inference module to be deployed on the performance-limited hardware of onboard computer equipment;
another of the above technical schemes has the following advantage or beneficial effect: the serial-channel design of the voice processing model and the language processing model can update the weight parameters of the lightweight model in real time according to controller-pilot voice instructions, further improving voice recognition accuracy as well as the practicality of the whole voice recognition device;
another of the above technical schemes has the following advantage or beneficial effect: the working framework of pilot-in-the-loop voice-controlled checklists and controller-pilot-in-the-loop voice-controlled communication facilitates the subsequent development of user-feedback adaptive reinforcement learning enhanced by checklist/flight-voice knowledge graphs, and subsequent access to knowledge bases such as flight-voice knowledge graphs, forming more personalized checklist-query and controller-pilot voice interaction modes.
Of course, it is not necessary for any one product in which the invention is practiced to achieve all of the above-described technical effects simultaneously.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is an exemplary illustration of an air traffic control speech text annotation provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of a technical solution of a ground server speech training model according to an embodiment of the present invention;
FIG. 3 is a flow chart of the dynamic adaptive update of the speech recognition deployment model driven by air traffic control voice instructions according to an embodiment of the present invention;
FIG. 4 is a process of dynamic adaptive update of a pilot speech command driven speech recognition deployment model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a speech recognition model inference technique according to an embodiment of the present invention;
FIG. 6 is a process for professional audio processing software editing raw audio provided by one embodiment of the present invention;
FIG. 7 is an example of a voice information panel displayed by specialized voice processing software provided by one embodiment of the present invention;
FIG. 8 is the phonetic annotation content of the phonetic audio A38_1 provided by one embodiment of the present invention;
fig. 9 is a speech spectrum diagram of a38_1 according to an embodiment of the present invention;
FIG. 10 is a block diagram of a full-sequence convolutional neural network joint acoustic modeling approach provided by an embodiment of the present invention;
FIG. 11 is a diagram illustrating a phonetic transcription sequence of the audio Pinyin provided by an embodiment of the present invention;
FIG. 12 is a schematic diagram of the convergence of fusion training on air traffic control speech and standard speech library data provided by an embodiment of the present invention;
FIG. 13 is a diagram illustrating fusion of a knowledge base of phonetic annotations provided by an embodiment of the present invention;
FIG. 14 is a flow chart of a self-attention mechanism provided by one embodiment of the present invention;
FIG. 15 is a schematic diagram of a semantic visualization based on a self-attention mechanism provided by an embodiment of the invention;
FIG. 16 illustrates the results of a preliminary test of reasoning performed by the speech recognition model according to an embodiment of the present invention;
FIG. 17 is a diagram of the final test effect of speech recognition model inference provided by one embodiment of the present invention (recognition test on 10 air traffic control speech segments).
Detailed Description
For better understanding of the technical solutions of the present invention, the following detailed descriptions of the embodiments of the present invention are provided with reference to the accompanying drawings.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method and device use the bidirectional data link and similar facilities of the airborne equipment as the information transmission channel, and use the airborne flight management computer, the aircraft's head-up display and the electronic flight bag as hardware carriers of the display unit. Air traffic control voice information and short-message information are displayed accurately on the display unit through semantic understanding, helping the pilot accurately understand the interaction information between the cockpit equipment and air traffic control personnel, thereby further reducing the pilot's workload while improving flight safety.
The method collects large-scale pilot and air traffic control voice data and uses deep learning algorithms to jointly train voice models and language models on these data, forming small weight model files that are deployed on the flight management computer or a portable mobile terminal. In the form of resident flight-voice-assistant application software, it performs continuous speech recognition on control instructions received from the pilot's microphone and on air traffic control voice commands received over the CPDLC (controller-pilot data link communications) link. The recognition method of the invention has three main stages: an air traffic control voice data regularization and sampling stage, a speech recognition model training stage, and a speech recognition model inference stage.
First, the air traffic control voice data regularization and sampling stage:
To make full use of the limited computing power of onboard computer equipment, air traffic control voice data are collected and edited with unified voice data processing consistent with a standard-corpus small speech database. The raw air traffic control voice audio acquired from the microphone (two-channel stereo, 8000 Hz/22.05 kHz sampling rate, wav format) is automatically expanded with voice annotations, so that the annotated voice corpus can be delivered in an extensible, rapid, high-throughput and low-cost manner. So that the acquired voice data can truly be added to the standard-corpus speech database for unified training, voice text labels are added to the air traffic control speech, and some aviation-domain terms in it are specified through the labels. In addition, the main feature of air traffic control speech is that Chinese and English are mixed; to avoid ambiguity or misunderstanding in air-ground communication, a phoneticization approach is usually adopted that converts the pronunciation of English letters into corresponding 'Chinese English' pinyin for labeling, and each phoneme of the voice text label is annotated with pinyin plus tone (1, 2, 3, 4), as shown in FIG. 1. Regularized sampling mainly serves the model training process: collected historical voice data are regularized and sampled to obtain a training data set that meets the requirements of voice model training. In normal use, the trained voice model processes the voice data of the controller or pilot, extracting data such as pinyin and phonemes to obtain the annotated text.
Second, the speech recognition model training stage:
the method includes the steps that a plurality of servers with deep learning frames are used on the ground to train large-scale pilot voice data and air traffic control voice data, weather messages and information such as instruction keywords issued by pilots are recognized and extracted by a model from air traffic control voice audio, retrieval keywords of navigation and communication systems and check items are recognized from the voice audio of the pilots, in the process, voice recognition is adaptive to the effects of accents of the air traffic control and the pilots and professional terms in the aviation field, voice models and language models with good robustness are obtained through training, and the trained lightweight models are deployed at a service end and an airborne end at the same time. The training phase of the speech model deployed at the ground air traffic control server is shown in the following fig. 2.
When air traffic controllers or pilots issue voice instructions, the ground server receives the processed, regularized voice data and dynamically adjusts the lightweight HDF5 weight file of the training model, updating the weights of the airborne deployment model and the ground deployment model in real time via air-ground and ground transmission respectively. The deployed speech recognition model thus adapts dynamically to the incoming voice instructions, and the recognition results inferred by the deployable model become more accurate. FIG. 3 shows the dynamic adaptive update process of the speech recognition deployment model driven by air traffic control voice instructions; similarly, FIG. 4 shows the process driven by pilot voice instructions.
Third, the speech recognition model inference stage:
the model files reside in the computing device, such as a flight management computer or electronic flight bag, in the form of the flight voice assistant's application software and communicate with the pilot's headset, communication system, navigation system, and interfaces such as the CPDLC link. The flight voice assistant application software receives real-time voice information of the pilot from a microphone of the headset of the pilot, the voice information identifies key words through a serial channel of a voice model and a language model and combines the key words into a text, similarly, the weather message information received in the uploading process of the CPDLC link identifies the key word information through the language model and combines the key word information into the text, the text contents are all displayed on an electronic flight bag to assist the pilot to make relevant decisions, and the text contents can generate voice in a voice synthesis mode and transmit the voice to the headset of the pilot, so that the flight voice assistant application software is convenient to use when the pilot is inconvenient to look down, particularly when vibration noise in a cabin is large. The pilot makes a flight operation decision according to the air traffic control command, the flight operation decision content is transmitted to the airborne deployable model through a microphone in a speaking mode, and pilot command text information generated through a serial channel of the voice model and the language model is sent to corresponding interfaces of the communication, the navigation and the checklist for related control, so that the air traffic control-pilot voice conversation process and the process of controlling the communication, the navigation and the checklist functions through the pilot voice are realized. The speech recognition model deployment phase is shown in fig. 5. Wherein, the thick line identification path is the intelligent voice recognition communication interaction process of the air traffic control-pilot on-ring, and the thin line identification path is the intelligent voice checklist interaction process of the pilot on-ring.
The input of the voice model (acoustic model) is a spectrogram/audio file and its output is Chinese pinyin/phonemes; the input of the language model is Chinese pinyin/phonemes and its output is text sentences.
Example 1:
1. Regularized sampling of Chinese-English mixed controller-pilot air traffic voice data
The collection and editing of air traffic control voice data are completed with dedicated audio processing software. Raw samples (for example, longer recordings of a China Southern takeoff-configuration-warning rejected takeoff, a China Eastern forced go-around, and go-arounds of multiple delayed flights at Capital Airport; original format two-channel stereo, 22.05 kHz sampling rate, wav) are processed and converted into trainable samples consistent with the standard-corpus speech text format (mono, 16 kHz sampling rate, wav format), as shown in FIG. 6.
In addition, other audio is converted in the same way into speech audio samples suitable for standard-corpus training (mono, 16 kHz sampling rate, wav format).
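A minimal sketch of this regularization step, assuming the librosa and soundfile libraries (file names are illustrative, not from the patent):

```python
import librosa
import soundfile as sf

def regularize_sample(src_path: str, dst_path: str, target_sr: int = 16000) -> None:
    """Convert a raw recording (e.g., stereo, 22.05 kHz wav) to mono 16 kHz wav."""
    # librosa.load downmixes to mono and resamples in a single call
    audio, _ = librosa.load(src_path, sr=target_sr, mono=True)
    sf.write(dst_path, audio, target_sr, subtype="PCM_16")

regularize_sample("raw_atc_clip.wav", "A38_1.wav")
```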
To include air traffic control speech in training, voice text labels are added to it, which makes it convenient to specify some of its jargon through the labels. Air traffic control speech is characterized by the mixing of Chinese and English, and some digits are phoneticized to avoid misunderstanding.
For a typical piece of air traffic control speech, the data are first observed:
voice file: a38_1.wav
Voice waveform: mono, 16 kHz sampling rate, 32-bit floating-point quantization, Microsoft pulse-code modulation (Microsoft PCM) format (.wav file). The waveform obtained with the audio processing software is shown in FIG. 7.
The annotation file for this voice audio is A38_1.wav.trn. To avoid the added difficulty of mixing Arabic numerals with Chinese characters, Arabic numerals are all written out in Chinese characters. The invention establishes a one-to-one correspondence, via Chinese characters, between the Arabic-numeral pronunciations of the aviation jargon actually used in controller-pilot communication and the 'Chinese English' pronunciations of English letters, as shown in Table 1, making it convenient to compile the related speech recognition dictionary templates in software. The aviation jargon description column lists the spoken forms that were altered to avoid semantic confusion during controller-pilot voice communication; these are the correct target characters for speech recognition.
TABLE 1. Phonetic annotation comparison for aviation jargon (Chinese, English and Arabic numerals; excerpt; descriptions are English glosses of the Chinese characters used for the spoken form)

| Serial number | Speech annotation | Aviation jargon description |
|---|---|---|
| 1 | 0 | Hole |
| 2 | 1 | |
| 3 | 2 | |
| 4 | 3 | |
| 5 | 4 | Four |
| 6 | 5 | Five |
| 7 | 6 | Six |
| 8 | 7 | |
| 9 | 8 | Eight |
| 10 | 9 | Nine |
| 11 | E | In accordance with |
| 12 | U | Superior |
| 13 | L | Left |
| 14 | R | Right |
Further, as shown in FIG. 8, row 1 is the correctly recognized Chinese word-segmented corpus text, row 2 is the pinyin with tone marks, and row 3 further decomposes the pinyin into a sequence of minimal phonetic units of initial/final (consonant/vowel) phonemes.
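A small sketch of reading this three-row annotation layout follows; only the file layout (text, toned pinyin, phoneme sequence) comes from the description above, while the parsing code itself is an assumption:

```python
def read_trn(path: str):
    """Read an A38_1.wav.trn-style annotation file.

    Row 1: word-segmented Chinese text
    Row 2: pinyin with tone digits (1-4)
    Row 3: initial/final phoneme sequence
    """
    with open(path, encoding="utf-8") as f:
        rows = [line.strip() for line in f if line.strip()]
    text, pinyin, phonemes = rows[0], rows[1].split(), rows[2].split()
    return text, pinyin, phonemes

text, pinyin, phonemes = read_trn("A38_1.wav.trn")
```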
Reading the file with Python 3 yields the corresponding spectrogram data, which can then be recognized using the CNN's ability to process images. The voice file A38_1.wav is loaded into the project; a Hamming window is constructed, the data are framed, each frame is windowed, and a Fourier transform is applied to obtain the audio's fbank features and hence the spectrogram of the wav file. As shown in FIG. 9, the abscissa of the spectrogram is the sampling-point time and the ordinate is frequency; the energy of each point is shown by color, with darker meaning higher energy. The spectrogram is the image data source for training the speech recognition model.
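The framing, windowing and FFT pipeline just described can be sketched as follows; the 25 ms frame (400 samples at 16 kHz) and 10 ms hop are common defaults assumed here, not values stated in the patent:

```python
import numpy as np
import scipy.io.wavfile as wav

def wav_to_spectrogram(path: str, frame_len: int = 400, hop: int = 160):
    """Turn a mono 16 kHz wav file into a log-magnitude (time, freq) image."""
    sr, signal = wav.read(path)                      # sr expected to be 16000
    signal = signal.astype(np.float32)
    window = np.hamming(frame_len)                   # Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop  # frame the data
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])    # window each frame
    spectrum = np.abs(np.fft.rfft(frames, axis=1))   # Fourier transform per frame
    return np.log(spectrum + 1e-8)                   # log-magnitude spectrogram

spec = wav_to_spectrogram("A38_1.wav")               # input image for the CNN
```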
2. Speech recognition model training
a) Overview of Speech models
Considering the input and output of the neural network training process, the data must be unified into a format that the deep learning framework can process in batches. However, the time-axis length of each sample differs, so the longest duration within a batch is chosen as the padding reference; the data in each batch then have the same shape, allowing parallel training. The acoustic model adopts a full-sequence convolutional neural network, as shown in FIG. 10.
The model is a neural network combining a CNN (convolutional neural network over spectrogram images) and CTC (connectionist temporal classification); time and frequency serve as the two dimensions of the image, and whole-sentence speech modeling is achieved by combining additional convolution and pooling layers. This makes full use of network configurations proven optimal for image modeling, forming a combined CNN + DNN + CTC acoustic modeling scheme.
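A minimal Keras sketch of this CNN + DNN + CTC structure is shown below; the layer sizes, the 200-bin frequency axis, and the pinyin vocabulary size are illustrative assumptions, not values from the patent:

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_PINYIN = 1424  # toned-pinyin dictionary size + 1 CTC blank (assumed)

def build_acoustic_model() -> tf.keras.Model:
    inp = layers.Input(shape=(None, 200, 1))              # (time, frequency, 1) image
    x = inp
    for filters in (32, 64, 128):                         # stacked conv + pooling
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(pool_size=(1, 2))(x)      # pool frequency, keep time steps
    x = layers.Reshape((-1, x.shape[2] * x.shape[3]))(x)  # flatten to (time, features)
    x = layers.Dense(256, activation="relu")(x)           # the DNN stage
    out = layers.Dense(NUM_PINYIN, activation="softmax")(x)  # per-frame pinyin posteriors
    return tf.keras.Model(inp, out)

# Training aligns the unsegmented pinyin labels with the per-frame outputs using
# CTC loss, e.g. via tf.keras.backend.ctc_batch_cost.
```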
Then the speech annotation files ending in .trn and the names of the speech files ending in .wav are read to form an audio file list wav_lst and an annotation file list. Recognition is done on row 2 of the annotation text (shown in FIG. 8), the coarser-grained toned pinyin, rather than on phoneme-level elements as in a GMM-HMM model. This yields a list label_data of 13412 pinyin strings; the resulting sequence diagram is shown in FIG. 11.
b) Fusion test of air traffic control voice and standard voice library
The list establishes a unique mapping from each toned-pinyin label to an id, turning it into a dictionary whose pinyin units do not repeat. Entries are not added to the dictionary in the order A2_0, A2_1, ..., D39_0 but after randomized shuffling. Data batches are generated by defining the batch size and the number of batches and shuffling the order of the training data; the signal time-frequency (fbank) features and label data of each batch are stored in two lists, wav_lst and label_lst, respectively. Here each batch is defined as 4 data items, and batch generation is performed over the total of 13412 items, forming input suitable for the deep learning framework.
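A sketch of this dictionary construction and batch generation, reusing the wav_to_spectrogram helper above (the generator body and padding strategy are assumptions):

```python
import random
import numpy as np

def make_vocab(label_data):
    """Map each toned pinyin to a unique, non-repeating id."""
    vocab = {}
    for pinyin_seq in label_data:             # each entry: space-separated toned pinyin
        for p in pinyin_seq.split():
            vocab.setdefault(p, len(vocab))
    return vocab

def batch_generator(wav_lst, label_lst, vocab, batch_size=4):
    order = list(range(len(wav_lst)))
    random.shuffle(order)                     # randomized, not A2_0, A2_1, ... order
    for i in range(0, len(order), batch_size):
        idx = order[i:i + batch_size]
        feats = [wav_to_spectrogram(wav_lst[j]) for j in idx]   # fbank/spectrogram features
        labels = [np.array([vocab[p] for p in label_lst[j].split()]) for j in idx]
        yield feats, labels                   # pad to the longest item in the batch before training
```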
Test results for recognizing Chinese pinyin from the speech spectrogram (training without air traffic control speech): the recognition accuracies of 10 speech audios are 0.82, 0.53, 1.0, 1.0, 0.36, 0.27, 0.83, 0.76, 0.94 and 0.5, an average accuracy of 70.1%.
Test results for recognizing Chinese pinyin from the speech spectrogram after adding air traffic control speech to training: the recognition accuracies of 10 speech audios are 0.76, 1.0, 0.63, 0.97, 0.48, 0.63, 0.92, 0.77, 0.36 and 0.69, an average accuracy of 72.1%, slightly better than the result of training without air traffic control speech.
If all the standard-corpus speech and all the air traffic control speech are trained together, the convergence curves are as shown in FIG. 12; the recognition accuracies of the 10 speech audios are 0.79, 1.0, 0.54, 1.0, 0.71, 0.75, 0.81, 0.87, 0.29 and 0.72, an average accuracy of 74.8%, which outperforms the result of training on the 10000-sample subset.
The trained weight information of the voice model is stored in model.h5 in HDF5 format; subsequently the voice model can directly load model.h5 to read the trained weights for a fast inference process without retraining.
c) Language modeling
The language model is based on a self-attention mechanism and recognizes characters from pinyin; this architecture has proven to have stronger language expression capability than other frameworks. The language database of the language model stores identifiers, Chinese pinyin sequences and character sequences in the form of tsv files, as shown in FIG. 13, where A37_1 to A38_18 are manually entered toned-pinyin labels of air traffic control speech with their corresponding Chinese characters.
Since the Chinese tsv file contains no English content, the English letters in air traffic control speech are replaced by the Chinese characters of their 'Chinese English' pronunciation; for example, the letters E and U are written with their corresponding Chinese characters. An encoder of the self-attention mechanism is built with a deep learning framework, and the test results are obtained through model inference.
The self-attention model comes from the Google team, in the paper "Attention Is All You Need" published by Ashish Vaswani et al. in 2017, as shown in FIG. 14; the invention uses the encoder part on the left, and the decoder part on the right is not involved.
The attention mechanism used by the language model can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values and final output are all numeric vectors. The output is a weighted sum of the values, with each value's weight given by the compatibility between the query and the corresponding key. FIG. 15 illustrates attention over text by shading: in 1-star reviews the darker content tends toward words of dislike, while in 5-star reviews the darker content tends toward words of liking; the self-attention mechanism finds the relationships between contextual semantics with low time consumption.
The controller-pilot language model takes the pinyin and tones obtained from speech recognition as input, treating toned pinyin as special characters, and examines the semantic relations between contexts. To extract semantic content more accurately, a multi-head mechanism is generally adopted for parallel weight computation: each head computes under its own projection, and the results of the heads are then concatenated for unified computation. The multi-head mechanism can capture the subject-predicate word-order logic of the context.
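The query/key/value computation described in the two preceding paragraphs reduces to scaled dot-product attention; a numpy sketch with toy dimensions and a single head (the multi-head version runs several such computations under different learned projections and concatenates the heads' outputs):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Output = softmax(QK^T / sqrt(d_k)) V, a weighted sum of the values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # query-key compatibility
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = w / w.sum(axis=-1, keepdims=True)            # softmax over the keys
    return weights @ V                                     # weighted sum of the values

# Self-attention over 4 toned-pinyin tokens embedded in 8 dimensions:
tokens = np.random.randn(4, 8)
out = scaled_dot_product_attention(tokens, tokens, tokens)
```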
The hdf5 format is a common cross-platform data storage format, and hdf5 files serve as the model parameter configuration files for both the voice model and the language model. During training of both models, checkpoints are set, for example one per minute; at each checkpoint the save_weights function of TensorFlow records the parameter value of every neuron in every layer of the model at that moment and stores them structurally in an hdf5 file. The voice model training parameters are stored under the "logs_am/" folder (and similarly, the language model training parameters under "logs_lm/"); after model training completes (e.g., after one hour), each folder holds 60 hdf5 files. For actual model inference, the hdf5 file of the last checkpoint is used: loading the weight files with the load_weights function in TensorFlow lets the trained voice and language models be used directly for inference without retraining. As configuration files, the hdf5 files can be transmitted over the air-ground data chain to communicate and exchange model weights, so the aircraft's onboard model can receive the latest training configuration parameters from the ground at any time and update itself to the newly trained model for inference, realizing deployment of a lightweight deep learning model. All deep learning training runs on the ground server; the aircraft only needs to obtain the hdf5 configuration file sent by the ground server over data-chain communication to update the airborne deep learning model automatically, improving inference generalization. This avoids installing GPUs or other additional training hardware in the airborne equipment and sidesteps the airworthiness problems of deploying large-scale computing equipment on the onboard computer.
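A sketch of this checkpoint-and-reload scheme with the Keras API (per-epoch checkpointing is used here for simplicity where the text describes per-minute checkpoints; the folder name follows the text, and the file names and build_acoustic_model helper from the earlier sketch are assumptions):

```python
import tensorflow as tf

# Ground side: record the layer parameters to an hdf5 file at each checkpoint.
ckpt = tf.keras.callbacks.ModelCheckpoint(
    filepath="logs_am/weights_{epoch:04d}.hdf5",  # voice model folder per the text
    save_weights_only=True)                       # parameters only, keeps the file small
# model.fit(train_batches, epochs=60, callbacks=[ckpt])

# Airborne side: rebuild the same architecture, then load the newest weight file
# received over the air-ground data chain; no retraining on board.
deployed = build_acoustic_model()
deployed.load_weights("logs_am/weights_0060.hdf5")  # last checkpoint; illustrative name
```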
3. Speech recognition model reasoning
The inference process of the speech recognition model (i.e., the serial-channel design of voice model + language model) deploys the HDF5 weight files that completed model training in step 2 on the ground air traffic control server and on airborne equipment such as the flight management computer/electronic flight bag. Under DO-160's strict requirements on onboard computer reliability and safety, more features mean more possible errors and harder verification, so onboard computers currently can only use CPUs of lower computing performance but higher reliability. To achieve real-time inference, the HDF5 weight storage adopted by the invention generally does not exceed 50 MB and suits the limited performance of the onboard computer.
The self-attention mechanism adopted by the invention focuses mainly on the relations between the pinyin within a single audio clip, because the semantic backgrounds of different audios differ greatly. For example, the A2-series texts in the standard corpus describe literary content, while the added A38-series texts are mainly air-traffic-management speech; the two therefore cannot be related contextually and semantically under an ordinary attention mechanism, and only text within the same audio can be understood in context. After the parameters of the deep learning framework are set, a multi-head mechanism is generally adopted to compute the weights in parallel: each head computes under its own projection, and the heads' results are then concatenated for unified computation, allowing the multi-head mechanism to capture the subject-predicate word-order logic of the context. Increasing the training samples from 10000 to 130000 improves the effect of language model recognition, as shown in FIG. 16.
It can be seen here that, except for one case in which the jargon word for "1" (yao) was recognized as a homophonous character ("invite"), the other aviation jargon is recognized as the corresponding correct English letters and Arabic numerals.
The voice model and the language model form mutually independent serial-channel designs: voice modeling goes from speech audio to toned Chinese pinyin, and language modeling goes from toned Chinese pinyin to character recognition. The advantage of this design is that each stage can concentrate on solving a small local problem; the drawback is that the errors of the two local stages stack and couple. Finally the two modules must be built into one channel and the model's effectiveness tested comprehensively, i.e., an end-to-end design directly from speech audio to recognized text.
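The serial channel can be summarized in a few lines, reusing the wav_to_spectrogram helper above; the decode helper, the model objects and their methods are assumed placeholders, not APIs from the patent:

```python
def recognize(wav_path, acoustic_model, language_model, id_to_pinyin):
    spec = wav_to_spectrogram(wav_path)                   # audio -> spectrogram image
    posteriors = acoustic_model.predict(spec[None, ..., None])
    pinyin = ctc_greedy_decode(posteriors, id_to_pinyin)  # assumed CTC decode helper
    return pinyin_to_text(language_model, pinyin)         # assumed pinyin -> characters step
```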
To improve the generalization ability of the trained model, the data set is divided into a training set, a validation set and a test set. Here, to test recognition ability, speech data from the training set are placed into the test set for testing; the final improvement in speech recognition is shown in FIG. 17.
The invention deeply fuses a deep learning training framework, a speech-signal processing model based on an aviation jargon dictionary, a self-attention-based contextual semantic model, and airborne-end and ground-end deployment devices; it provides dual human-in-the-loop interaction modes (between air traffic controllers and pilots over the air-ground link, and between pilots and the airborne device) and performs model deployment and software application testing on a ground server, with the following technical advantages:
1. data normalization of voice audio samples of differing formats and speech characteristics under a unified standard greatly improves the running efficiency of the deep learning model, allowing the intelligent inference module to be deployed on the performance-limited hardware of onboard computer equipment;
2. the serial-channel design of the voice processing model and the language processing model can update the weight parameters of the lightweight model in real time according to controller-pilot voice instructions, further improving the accuracy of voice recognition as well as the practicality of the whole voice recognition device;
3. the working framework of pilot-in-the-loop voice-controlled checklists and controller-pilot-in-the-loop voice-controlled communication facilitates the subsequent development of user-feedback adaptive reinforcement learning enhanced by checklist/flight-voice knowledge graphs, and subsequent access to knowledge bases such as flight-voice knowledge graphs, forming more personalized checklist-query and controller-pilot voice interaction modes.
The method and the system for intelligently recognizing the airborne air traffic control instruction provided by the embodiment of the application are introduced in detail. The above description of the embodiments is only for the purpose of helping to understand the method of the present application and its core ideas; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.
It is also noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a good or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such good or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a commodity or system that includes the element. "substantially" means within an acceptable error range, and a person skilled in the art can solve the technical problem within a certain error range to substantially achieve the technical effect.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The term "and/or" as used herein is merely an associative relationship that describes an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
Claims (10)
1. An airborne air traffic control instruction intelligent voice recognition method, characterized by comprising: an air traffic control end voice processing process and a pilot voice processing process;
the air traffic control end voice processing process comprises the following steps:
s11, collecting the air traffic control voice data of the air traffic control personnel;
s12, processing the air traffic control voice data by using the trained voice model to obtain an air traffic control voice labeling text;
s13, processing the air traffic control voice labeling text by using the trained language model, extracting a plurality of air traffic control instruction keywords and combining the keywords into an air traffic control instruction keyword text;
s14, performing voice synthesis on the air traffic control instruction keyword text, transmitting the air traffic control instruction keyword text to the pilot for readback, and transmitting the air traffic control instruction keyword text to the airborne-end display equipment for display;
the pilot voice processing process comprises the following steps:
s21, acquiring pilot voice data of a pilot;
s22, processing the pilot voice data by using the trained voice model to obtain a pilot voice labeling text;
s23, processing the pilot voice labeling text by using the trained language model, identifying a plurality of pilot instruction keywords and combining the pilot instruction keywords into a pilot instruction keyword text;
and S24, triggering a communication device, a navigation device and/or a checklist device to respond according to the content of the pilot instruction keyword text, completing the communication function with the air traffic control end, the aircraft body action adjustment function and/or the item inspection function.
2. The intelligent voice recognition method for airborne air traffic control commands according to claim 1, wherein the training contents of the voice models in steps S12 and S22 include: acquiring historical voice data, marking the historical voice data according to aviation professional terms and the tone of each word to obtain a voice marking text for model training, and inputting the voice marking text into an original model for training to obtain the trained voice model;
the original model is a neural network model combining a spectrogram-image convolutional neural network with a connectionist temporal classification model; with time and frequency as the two dimensions of the image, whole-sentence speech modeling is achieved through the combination of additional convolution and pooling layers.
3. The intelligent voice recognition method for airborne air traffic control instructions according to claim 1, wherein the air traffic control voice labeling text in step S12 and the pilot voice labeling text in step S22 each comprise: a Chinese word-segmented corpus text revised for aviation terminology, a Chinese pinyin text with tone marks, and a minimal-phonetic-unit sequence decomposing the Chinese pinyin into initial (consonant) and final (vowel) phonemes.
4. The intelligent speech recognition method for airborne air traffic control instructions according to claim 1, wherein the language model in steps S13 and S23 is a deep learning model based on a self-attention mechanism.
5. The intelligent voice recognition method for airborne air traffic control commands according to claim 1, wherein the air traffic control voice data in step S11 and the pilot voice data in step S21 are audio files or voice spectrograms.
6. The intelligent speech recognition method for airborne air traffic control instructions according to claim 1, further comprising: and expanding a flight voice corpus according to the air traffic control voice data and the air traffic control voice labeling text in the air traffic control voice processing process and the pilot voice data and the pilot voice labeling text in the pilot voice processing process.
7. The intelligent voice recognition method for airborne air traffic control instructions according to claim 6, wherein expanding the flight voice corpus specifically comprises: traversing the flight voice corpus with the data to be judged as a template to check whether identical data already exist; if so, terminating the expansion; otherwise, supplementing the data to be judged into the flight voice corpus and proceeding to step S12 or step S22.
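A minimal sketch of this expansion check; keying the corpus by a hash of the audio-annotation pair is an implementation assumption, since the claim specifies only an equality test against existing entries.

```python
import hashlib

# Assumed corpus representation: a dict keyed by a content hash, so the
# "identical data" test of claim 7 becomes a dictionary membership check.
def try_expand_corpus(corpus, audio_bytes, annotation):
    key = hashlib.sha256(audio_bytes + annotation.encode("utf-8")).hexdigest()
    if key in corpus:                          # identical data found: stop expansion
        return False
    corpus[key] = (audio_bytes, annotation)    # otherwise supplement the corpus
    return True                                # caller then proceeds to S12 or S22

corpus = {}
added = try_expand_corpus(corpus, b"\x00\x01", "shang4 sheng1 dao4 liang3 qian1")
```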
8. The intelligent voice recognition method for airborne air traffic control instructions according to claim 1, wherein:
after the voice data are processed in steps S12 and S22, the parameter configuration file of the corresponding voice model is updated;
after the voice annotation texts are processed in steps S13 and S23, the parameter configuration file of the corresponding language model is updated.
9. The intelligent voice recognition method for airborne air traffic control instructions according to claim 1, wherein the parameter configuration files of the on-board-end voice model and language model are synchronized in real time with the parameter configuration files of the air-traffic-control-end voice model and language model, so that the on-board-end models are updated automatically.
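One plausible reading of claims 8 and 9 in code: parameter configuration files are rewritten after each processing run and mirrored from the air traffic control end to the on-board end. The JSON layout, paths, and copy-on-newer policy are all assumptions; the claims only require that the files are updated and kept synchronized.

```python
import json
import shutil
from pathlib import Path

ATC_CONFIG = Path("atc_end/model_params.json")       # assumed path; rewritten after S12/S13
ONBOARD_CONFIG = Path("onboard/model_params.json")   # assumed path for the on-board copy

def update_config(path, new_params):
    """Claim 8: persist updated model parameters after a processing run."""
    path.write_text(json.dumps(new_params))

def sync_onboard():
    """Claim 9: mirror the ATC-end file to the on-board end when it is newer."""
    if (not ONBOARD_CONFIG.exists()
            or ATC_CONFIG.stat().st_mtime > ONBOARD_CONFIG.stat().st_mtime):
        shutil.copy2(ATC_CONFIG, ONBOARD_CONFIG)       # copy2 preserves the timestamp
        return json.loads(ONBOARD_CONFIG.read_text())  # reload so the models update
    return None
```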
10. An airborne air traffic control instruction intelligent voice recognition system, characterized in that the system is capable of implementing the steps of the method according to any one of claims 1-9;
the system comprises:
an on-board terminal, comprising: a voice acquisition and processing module, a voice model module, a language model module, a checklist system, a navigation response system and a communication response system; the voice acquisition and processing module, the voice model module and the language model module are connected in sequence, and the language model module is connected to the checklist system, the navigation response system and the communication response system through respective interfaces;
an air traffic control end, comprising: a voice acquisition and processing module, a voice model module and a language model module, connected in sequence;
the on-board terminal is communicatively connected with the air traffic control end.
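Finally, a skeletal sketch of the module topology of claim 10; the class and attribute names are placeholders for the claimed modules.

```python
# Placeholder classes mirroring the module wiring of claim 10.
class ATCEnd:
    def __init__(self, acquisition, voice_model, language_model):
        # voice acquisition/processing -> voice model -> language model, in sequence
        self.pipeline = [acquisition, voice_model, language_model]

class OnboardTerminal(ATCEnd):
    def __init__(self, acquisition, voice_model, language_model,
                 checklist, navigation, communication):
        super().__init__(acquisition, voice_model, language_model)
        # the language model fans out to three response systems via interfaces
        self.interfaces = {"checklist": checklist,
                           "navigation": navigation,
                           "communication": communication}
```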
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111518350.3A | 2021-12-13 | 2021-12-13 | Airborne air traffic control instruction intelligent voice recognition method and system
Publications (1)
Publication Number | Publication Date |
---|---|
CN114512131A | 2022-05-17
Family
ID=81547592
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111518350.3A (Pending) | Airborne air traffic control instruction intelligent voice recognition method and system | 2021-12-13 | 2021-12-13
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114512131A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104008750A (en) * | 2013-02-26 | 2014-08-27 | 霍尼韦尔国际公司 | System and method for correcting accent induced speech transmission problems |
CN108010523A (en) * | 2016-11-02 | 2018-05-08 | 松下电器(美国)知识产权公司 | Information processing method and recording medium |
CN109272990A (en) * | 2018-09-25 | 2019-01-25 | 江南大学 | Audio recognition method based on convolutional neural networks |
CN110175677A (en) * | 2019-04-16 | 2019-08-27 | 平安普惠企业管理有限公司 | Automatic update method, device, computer equipment and storage medium |
CN111105647A (en) * | 2019-12-16 | 2020-05-05 | 四川大学 | Control instruction issuing system and method based on air traffic control voice recognition and synthesis |
CN111063336A (en) * | 2019-12-30 | 2020-04-24 | 天津中科智能识别产业技术研究院有限公司 | End-to-end voice recognition system based on deep learning |
CN111459661A (en) * | 2020-03-16 | 2020-07-28 | 湖北亿咖通科技有限公司 | Neural network model parameter updating method |
Similar Documents
Publication | Title
---|---
CN113439301B | Method and system for machine learning
EP2040250B1 | Spoken vehicle control
US11488577B2 | Training method and apparatus for a speech synthesis model, and storage medium
Zuluaga-Gomez et al. | Automatic speech recognition benchmark for air-traffic communications
CN111883110A | Acoustic model training method, system, device and medium for speech recognition
JP7266683B2 | Information verification method, apparatus, device, computer storage medium, and computer program based on voice interaction
CN112133290A | Speech recognition method based on transfer learning and aiming at civil aviation air-land communication field
CN113160798B | Chinese civil aviation air traffic control voice recognition method and system
Šmídl et al. | Air traffic control communication (ATCC) speech corpora and their use for ASR and TTS development
CN116092342B | Automatic response and quality assessment method and system for controller simulation training
CN115910066A | Intelligent dispatching command and operation system for regional power distribution network
Kopald et al. | Applying automatic speech recognition technology to air traffic management
CN114944150A | Dual-task-based Conformer land-air communication acoustic model construction method
Helmke et al. | Readback error detection by automatic speech recognition and understanding
CN108364655A | Method of speech processing, medium, device and computing device
Shi et al. | An end-to-end conformer-based speech recognition model for mandarin radiotelephony communications in civil aviation
Zhang et al. | Research on spectrum sensing system based on composite neural network
CN113362801A | Audio synthesis method, system, device and storage medium based on Mel spectrum alignment
US11955016B2 | Interface system for flight deck communications
CN114512131A | Airborne air traffic control instruction intelligent voice recognition method and system
CN114373445B | Voice generation method and device, electronic equipment and storage medium
US20220246150A1 | System and/or method for semantic parsing of air traffic control audio
CN116229947A | Voice recognition method and voice recognition device
Pardo et al. | Automatic understanding of ATC speech: Study of prospectives and field experiments for several controller positions
Achour et al. | Review of Automatic Speech Recognition Methodologies
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |