CN113948091A - Air-ground communication voice recognition engine for civil aviation passenger plane and application method thereof - Google Patents


Info

Publication number
CN113948091A
Authority
CN
China
Prior art keywords
subsystem
voice
audio
voiceprint
air
Prior art date
Legal status
Pending
Application number
CN202111558210.9A
Other languages
Chinese (zh)
Inventor
王涛
Current Assignee
Beijing Yixuan Yunhe Data Co ltd
Shandong Benin Electronic Technology Development Co ltd
Original Assignee
Beijing Yixuan Yunhe Data Co ltd
Shandong Benin Electronic Technology Development Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Yixuan Yunhe Data Co ltd and Shandong Benin Electronic Technology Development Co ltd
Priority to CN202111558210.9A
Publication of CN113948091A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/04 Training, enrolment or model building
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G10L17/22 Interactive procedures; Man-machine interfaces

Abstract

The invention belongs to the field of artificial intelligence speech recognition and provides an air-ground communication speech recognition engine for civil aviation passenger aircraft, together with a method of applying it. The method comprises: inputting recorded audio through a recording input device and segmenting it with an air-ground communication audio segmentation subsystem; feeding the audio segments into an artificial intelligence speech recognition subsystem, which transcribes them into text; passing the transcribed text to an early-warning wake-word subsystem, which searches for wake words and retains an index of the audio containing them; and having a voiceprint confirmation subsystem use that wake-word index to search the pilot voiceprint library and return flight voice information with a matching voiceprint. In this way, the pilot and flight number associated with speech containing specific wake keywords are identified.

Description

Air-ground communication voice recognition engine for civil aviation passenger plane and application method thereof
Technical Field
The invention belongs to the field of artificial intelligence speech recognition and concerns accurate speech recognition; in particular, it relates to an air-ground communication speech recognition engine for civil aviation passenger aircraft and a method of applying it.
Background
Since the beginning of the 21st century, the civil aviation industry has grown rapidly, adding large numbers of aircraft and flights every year, and the demands on aviation safety and air traffic control have risen accordingly. Air-ground communication is the foundation of an air traffic controller's work on duty and the standard language used throughout daily operations: the controller talks directly to the pilot over the radio to issue clear instructions and guide the aircraft safely. However, air traffic control staff cannot sustain intense mental work for long periods, so human error is unavoidable. Statistics attribute 80% of aviation accidents to human error, making it a major factor in aviation safety. In one passenger aircraft conflict incident, a tower controller lost track of an aircraft's movements, leading to a serious incident (a runway incursion). Because air-ground communication matters so much to both controllers and pilots, and the accuracy of instructions directly affects air traffic safety, a speech recognition method is needed. Supporting work such as coordinated recording, it can effectively improve the monitoring performance of air traffic control safety warnings, fundamentally promote the continued reliability and accuracy of those warnings, and apply logical judgment to the monitored content so that accidents are avoided.
Regarding the demand for and development of control personnel, each control unit must build its controller team through long-term on-the-job training and teaching feedback. Both schools and units want to raise trainees' air-ground communication proficiency; speech recognition can assist trainees, standardize accurate air traffic control instructions, shorten training time, and help trainees qualify for duty sooner.
Speech recognition is now widely used in many fields, but research on and applications of speech recognition for civil aviation air-ground communication remain limited, and early work mainly targeted civil aviation controller training. In 2001, speech recognition was applied to a DRS air traffic control radar simulator using IBM VoiceType, with a reduced sample space to improve recognition performance; however, the method worked only for speaker-dependent recognition and performed poorly on other speakers. In 2017, a better recognition rate was achieved with the Kaldi speech toolkit and a DNN-HMM acoustic model, but the applicable scenarios were limited.
Standard phraseology is the main channel through which civil aviation air traffic controllers (controllers for short) transmit instructions to pilots, and its accuracy is critical to flight safety. Air-ground communication has the following characteristics:
(1) some words have special pronunciations; for example, the digit 1 is read as "yao" (幺) and the letter A as "ALPHA";
(2) because of the specialized nature of air traffic control, regional differences, and the diversity of personnel, the speech contains control vocabulary, unique regional place names, mixed Chinese and English, and varied accents, all of which pose a major challenge to speech recognition systems.
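To illustrate characteristic (1), the sketch below shows a hypothetical token normalizer that maps spoken radiotelephony forms to canonical characters before recognized text is compared or searched. It assumes whitespace-tokenized input; the mapping tables (Chinese radiotelephony digits such as 幺 for 1, 两 for 2, 拐 for 7 and 洞 for 0, English digit words, and the ICAO spelling alphabet) and the name normalize_tokens are illustrative, not taken from the patent.

```python
# Hypothetical normalization tables for ATC phraseology (illustrative only).
CN_DIGITS = {"洞": "0", "幺": "1", "两": "2", "三": "3", "四": "4",
             "五": "5", "六": "6", "拐": "7", "八": "8", "九": "9"}
EN_DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
             "five": "5", "six": "6", "seven": "7", "eight": "8",
             "nine": "9", "niner": "9"}
# ICAO spelling alphabet: each code word maps to its initial letter.
ICAO_LETTERS = {word: word[0].upper() for word in (
    "alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel",
    "india", "juliett", "kilo", "lima", "mike", "november", "oscar", "papa",
    "quebec", "romeo", "sierra", "tango", "uniform", "victor", "whiskey",
    "x-ray", "yankee", "zulu")}


def normalize_tokens(tokens):
    """Map spoken ATC tokens to canonical characters; unknown tokens pass through."""
    normalized = []
    for tok in tokens:
        key = tok.lower()
        normalized.append(CN_DIGITS.get(tok) or EN_DIGITS.get(key)
                          or ICAO_LETTERS.get(key) or tok)
    return normalized
```

For example, normalize_tokens(["CCA", "幺", "两", "拐"]) returns ["CCA", "1", "2", "7"], which can then be joined into the callsign CCA127.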
Existing speech recognition methods have the following problems:
1. Because controller speech is noisy, fast, and spoken with many accents, a general-purpose speech recognition system can recognize only 10% to 20% of its content, which makes such systems practically unusable for this task.
2. The prior art offers no solution for real-time early warning on airport air-ground communication through speech recognition.
Disclosure of Invention
To address these problems in the prior art, the invention aims to provide an air-ground communication speech recognition engine for civil aviation passenger aircraft and a method of applying it, which can recognize voice instructions and readbacks efficiently and accurately and can continuously improve accuracy.
The object of the invention is achieved by the following technical solution. The air-ground communication speech recognition engine for civil aviation passenger aircraft comprises a recording input device and a computer connected by a signal line. The computer contains at least an air-ground communication audio segmentation subsystem, an artificial intelligence speech recognition subsystem, an early-warning wake-word subsystem, and a voiceprint confirmation subsystem, connected in that logical order to form an integrated neural network model. Speech captured by the recording input device is detected and divided into audio segments by neural-network-based human voice detection; the segments are converted into binary data and passed through an interface to the artificial intelligence speech recognition subsystem.
The artificial intelligence speech recognition subsystem comprises a speech feature extraction module, a sequence learning module, and a fully connected module. Audio data enter the speech feature extraction module, which reads the audio with the soundfile library, converts it into a spectrogram, extracts MFCC features, and then expands and normalizes those features. The sequence learning module consists of a multilayer convolutional neural network followed by a four-layer bidirectional gated recurrent unit (GRU), and the fully connected module additionally contains a connectionist temporal classification (CTC) module. The processed speech features are fed into the sequence learning module, whose parameters are optimized by long training on large speech datasets using GPU computation; the tensors it outputs are then classified in the fully connected module to produce the recognized text.
The text recognized by the audio segmentation and speech recognition subsystems is passed to the early-warning wake-word subsystem, which searches it for wake words and retains an index of the audio containing them. The voiceprint confirmation subsystem has a voiceprint detection network built as a multilayer network of long short-term memory (LSTM) layers, each followed by a linear projection layer. Using the wake-word index, the voiceprint confirmation subsystem takes in the early-warning speech, searches the pilot voiceprint library, and returns flight voice information with a matching voiceprint.
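The patent says only that the MFCC features are "expanded" and "normalized". A minimal sketch under one common interpretation, appending first-order delta coefficients and then applying per-utterance cepstral mean and variance normalization, might look like this. It assumes the MFCC matrix (frames x coefficients) has already been computed from audio read with soundfile; the function names are hypothetical.

```python
import numpy as np


def deltas(feats, width=2):
    """First-order delta features using the standard regression formula.

    feats: array of shape (frames, coeffs). Edges are padded by repetition.
    """
    T = len(feats)
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    # d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2)
    num = sum(n * (padded[width + n:T + width + n] - padded[width - n:T + width - n])
              for n in range(1, width + 1))
    return num / (2 * sum(n * n for n in range(1, width + 1)))


def expand_and_normalize(mfcc, eps=1e-8):
    """Append deltas ("expansion"), then normalize each column to zero mean, unit variance."""
    expanded = np.concatenate([mfcc, deltas(mfcc)], axis=1)
    return (expanded - expanded.mean(axis=0)) / (expanded.std(axis=0) + eps)
```

With 13 MFCCs per frame this yields 26-dimensional normalized features, a typical input size for a CNN front end.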
In the engine described above, the voiceprint confirmation subsystem further has a softmax or contrast comparison-and-decision algorithm.
In the engine described above, the artificial intelligence speech recognition subsystem on the computer exposes an interface through which data collected at the airport are transmitted in real time.
In the engine described above, the artificial intelligence speech recognition subsystem and the early-warning wake-word subsystem maintain an index linking recognized text to the audio that contains wake words; the text transcribed by the speech recognition subsystem is fed into the early-warning wake-word subsystem for wake-word retrieval and detection.
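A minimal sketch of the wake-word index described here, assuming each transcript is tied to a segment identifier. The Segment type, the function name, and the example wake words (the standard distress and urgency calls MAYDAY and PAN PAN) are illustrative assumptions; the patent does not list its wake words.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    segment_id: str   # hypothetical identifier pointing back to the audio segment
    text: str         # transcript produced by the speech recognition subsystem


def build_wake_word_index(segments, wake_words):
    """Return {wake_word: [segment_id, ...]} for every wake word found in a transcript.

    Matching is a case-insensitive substring test; wake words with no hits are omitted.
    """
    index = {}
    for seg in segments:
        text = seg.text.upper()
        for word in wake_words:
            if word.upper() in text:
                index.setdefault(word, []).append(seg.segment_id)
    return index
```

The index keeps only segment identifiers, so the voiceprint stage can later fetch exactly the audio that triggered an alert.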
The method of applying the air-ground communication speech recognition engine for civil aviation passenger aircraft comprises the following steps:
(1) inputting recorded audio through the recording input device, then segmenting it with the air-ground communication audio segmentation subsystem;
(2) feeding the audio segments into the artificial intelligence speech recognition subsystem, which transcribes them into text;
(3) feeding the transcribed text into the early-warning wake-word subsystem for wake-word retrieval and detection, and retaining an index of the audio containing wake words;
(4) using the wake-word index, the voiceprint confirmation subsystem searches the pilot voiceprint library and returns flight voice information with a matching voiceprint. This identifies the pilot and flight number for speech containing the specific keywords and achieves real-time early warning for airport air-ground communication.
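Steps (1) to (4) can be sketched as a simple chain of callables. This is only an orchestration outline: the segmenter, recognizer, and voiceprint lookup are passed in as hypothetical stand-ins for the patent's neural subsystems, and wake words are assumed to be given in upper case.

```python
from typing import Callable, Dict, List, Optional, Sequence


def run_pipeline(audio,
                 segment: Callable[[object], Sequence[object]],
                 transcribe: Callable[[object], str],
                 wake_words: List[str],
                 identify_speaker: Callable[[object], Optional[str]]) -> List[Dict]:
    """Chain segmentation -> recognition -> wake-word search -> voiceprint lookup."""
    alerts = []
    for i, seg in enumerate(segment(audio)):                 # step (1): segmentation
        text = transcribe(seg)                               # step (2): speech-to-text
        hits = [w for w in wake_words if w in text.upper()]  # step (3): wake-word search
        if hits:                                             # step (4): only flagged segments
            alerts.append({"segment": i, "text": text, "wake_words": hits,
                           "speaker": identify_speaker(seg)})
    return alerts
```

Running the voiceprint lookup only on flagged segments keeps the comparatively expensive library search off the common path.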
In the application method described above, step (1) processes the recorded audio as follows:
a. noise reduction: speech enhancement is performed with a neural network model capable of enhancing air-ground communication speech;
b. silence removal: neural-network-based human voice detection splits the speech into short segments.
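The patent performs silence removal with a neural-network voice detector. As a simpler stand-in, the sketch below swaps in a classic short-time energy threshold to show how voiced regions become segments; the frame length, the -40 dB threshold, and the minimum silence gap are illustrative values, not the patent's.

```python
import numpy as np


def split_on_silence(signal, sr, frame_ms=25, threshold_db=-40.0, min_gap_frames=8):
    """Return (start, end) sample indices (end exclusive) of voiced regions.

    Energy-threshold stand-in for the neural voice activity detector in the patent.
    """
    frame = int(sr * frame_ms / 1000)
    n = len(signal) // frame
    frames = signal[: n * frame].reshape(n, frame)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    voiced = energy_db > threshold_db
    segments, start, gap = [], None, 0
    for i, v in enumerate(voiced):
        if v:
            if start is None:
                start = i          # first voiced frame of a new segment
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap_frames:   # silence long enough: close the segment
                segments.append((start * frame, (i - gap + 1) * frame))
                start, gap = None, 0
    if start is not None:               # signal ended inside a voiced region
        segments.append((start * frame, (n - gap) * frame))
    return segments
```

Requiring several consecutive silent frames before closing a segment keeps short pauses inside one transmission from splitting it.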
In step (2), each audio segment is taken as input and converted from waveform into spectrogram form. The data first pass through the speech feature extraction module, which extracts audio features at different levels while greatly compressing the data and the parameter count, improving training efficiency and helping prevent overfitting. The data then enter the sequence learning module, which, loosely modeled on human memory, controls how much state information from different time steps is remembered or forgotten in order to learn the language sequence. Finally, the data enter the fully connected module for classification and decision-making, and the connectionist temporal classification module computes the most probable output sequence, which is the recognition result.
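The final CTC step can be illustrated with best-path (greedy) decoding: take the most likely class per frame, merge repeated symbols, and drop blanks. Greedy decoding only approximates the most probable output sequence (beam search is the usual refinement), and the toy alphabet with the blank at index 0 is an assumption for illustration.

```python
import numpy as np


def ctc_greedy_decode(log_probs, alphabet, blank=0):
    """Best-path CTC decoding.

    log_probs: array (frames, classes); alphabet[k] is the symbol for class k.
    """
    best = np.argmax(log_probs, axis=1).tolist()   # most likely class per frame
    out, prev = [], None
    for k in best:
        if k != prev and k != blank:               # merge repeats, drop blanks
            out.append(alphabet[k])
        prev = k
    return "".join(out)
```

The blank symbol is what lets CTC emit genuinely doubled characters: a frame path such as C, C, blank, C decodes to "CC", not "C".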
In step (4), voiceprint detection extracts the speaker's timbre features through a multilayer LSTM network; cosine similarity scores are then computed between each verification sample and all speaker centroids; finally, a softmax or contrast loss is computed from the similarity scores and the parameters are updated by backpropagation. By 1:1 similarity comparison against target voiceprints in the pilot voiceprint library, different pilots' voiceprints are classified with a recognition rate of up to 92%, an improvement of about 30% to 60% over the recognition accuracy of traditional engines.
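The training procedure described here (cosine similarity between each sample and every speaker centroid, then a softmax or contrast loss) resembles the generalized end-to-end (GE2E) speaker verification loss. The sketch below computes the softmax variant for a batch of embeddings; the scale w and bias b (learnable in GE2E, fixed here) and the leave-one-out centroid for the sample's own speaker follow that formulation and are assumptions about this patent.

```python
import numpy as np


def ge2e_softmax_loss(emb, w=10.0, b=-5.0):
    """Mean GE2E-style softmax loss.

    emb: array (N speakers, M utterances, D), e.g. LSTM + linear projection outputs.
    Requires M >= 2 so the leave-one-out centroid is defined.
    """
    N, M, _ = emb.shape
    emb = emb / np.linalg.norm(emb, axis=2, keepdims=True)
    centroids = emb.mean(axis=1)                               # (N, D)
    loss = 0.0
    for j in range(N):
        for i in range(M):
            e = emb[j, i]
            cents = centroids.copy()
            cents[j] = (emb[j].sum(axis=0) - e) / (M - 1)      # exclude the sample itself
            cos = cents @ e / np.linalg.norm(cents, axis=1)    # e already has unit length
            s = w * cos + b                                    # scaled similarity to each speaker
            loss += -s[j] + np.log(np.exp(s).sum())            # softmax cross-entropy
    return loss / (N * M)
```

When all speakers' embeddings coincide the loss equals log N, and it approaches zero as each speaker's embeddings separate from the others.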
In step (4), the voiceprint confirmation subsystem also performs speaker discrimination and classification on the audio, and the resulting classification makes records easier to trace back.
In step (4), audio quality is logically processed and assessed using speech keywords or speech recognition, and recognition results are corrected by combining the preceding text with the air traffic control procedure.
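One way to combine recognition with surrounding context is to snap a recognized callsign to the closest flight currently known to the control system. The sketch assumes a list of active callsigns is available (for example from flight plan data) and uses Python's difflib.get_close_matches as a simple string-similarity measure; the function name and cutoff are illustrative.

```python
import difflib


def correct_callsign(recognized, active_callsigns, cutoff=0.75):
    """Return the recognized callsign, or the closest active callsign if it is unknown.

    If nothing is similar enough, the recognized text is returned unchanged.
    """
    if recognized in active_callsigns:
        return recognized
    matches = difflib.get_close_matches(recognized, active_callsigns, n=1, cutoff=cutoff)
    return matches[0] if matches else recognized
```

For instance, a misrecognized "CCA1Z7" is corrected to "CCA127" when that flight is active.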
Compared with the prior art, the air-ground communication speech recognition engine and its application method have the following beneficial effects:
The invention builds an artificial-intelligence-based speech recognition engine for recognizing air traffic control air-ground communication. Compared with traditional speech recognition engines, it not only raises recognition accuracy substantially but also has a much simpler model structure and high training and runtime efficiency. In addition, through voiceprint early warning, the solution can retrieve from the voiceprint library the voiceprint information and flight number voice information of a pilot who needs assistance, so that air traffic control staff can learn the pilot and flight details as early as possible, respond quickly with control procedures, avoid flight accidents, and better protect lives.
Drawings
FIG. 1 is a flow chart of the overall airport voice warning system of the present invention.
FIG. 2 is a block diagram of the bidirectional gated recurrent unit of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to the accompanying drawings.
As shown in FIG. 1 and FIG. 2, the air-ground communication speech recognition engine for civil aviation passenger aircraft comprises a recording input device and a computer connected by a signal line. The computer contains at least an air-ground communication audio segmentation subsystem, an artificial intelligence speech recognition subsystem, an early-warning wake-word subsystem, and a voiceprint confirmation subsystem, connected in that logical order to form an integrated neural network model. Speech captured by the recording input device is detected and divided into audio segments by neural-network-based human voice detection; the segments are converted into binary data and passed through an interface to the artificial intelligence speech recognition subsystem.
The artificial intelligence speech recognition subsystem comprises a speech feature extraction module, a sequence learning module, and a fully connected module. Audio data enter the speech feature extraction module, which reads the audio with the soundfile library, converts it into a spectrogram, extracts MFCC features, and then expands and normalizes those features. The sequence learning module consists of a multilayer convolutional neural network followed by a four-layer bidirectional gated recurrent unit (GRU), and the fully connected module additionally contains a connectionist temporal classification (CTC) module. The processed speech features are fed into the sequence learning module, whose parameters are optimized by long training on large speech datasets using GPU (graphics processing unit) computation; the tensors it outputs are then classified in the fully connected module to produce the recognized text.
The text recognized by the audio segmentation and speech recognition subsystems is passed to the early-warning wake-word subsystem, which searches it for wake words and retains an index of the audio containing them. The voiceprint confirmation subsystem has a voiceprint detection network built as a multilayer network of long short-term memory (LSTM) layers, each followed by a linear projection layer. Using the wake-word index, the voiceprint confirmation subsystem takes in the early-warning speech, searches the pilot voiceprint library, and returns flight voice information with a matching voiceprint.
The voiceprint confirmation subsystem also has a softmax or contrast comparison-and-decision algorithm.
The artificial intelligence speech recognition subsystem on the computer exposes an interface through which data collected at the airport are transmitted in real time.
The artificial intelligence speech recognition subsystem and the early-warning wake-word subsystem maintain an index linking recognized text to the audio that contains wake words; the text transcribed by the speech recognition subsystem is fed into the early-warning wake-word subsystem for wake-word retrieval and detection.
The method of applying the air-ground communication speech recognition engine for civil aviation passenger aircraft comprises the following steps:
(1) inputting recorded audio through the recording input device, then segmenting it with the air-ground communication audio segmentation subsystem;
(2) feeding the audio segments into the artificial intelligence speech recognition subsystem, which transcribes them into text;
(3) feeding the transcribed text into the early-warning wake-word subsystem for wake-word retrieval and detection, and retaining an index of the audio containing wake words;
(4) using the wake-word index, the voiceprint confirmation subsystem searches the pilot voiceprint library and returns flight voice information with a matching voiceprint. This identifies the pilot and flight number for speech containing the specific keywords and achieves real-time early warning for airport air-ground communication.
Step (1) processes the recorded audio as follows:
a. noise reduction: speech enhancement is performed with a neural network model capable of enhancing air-ground communication speech;
b. silence removal: neural-network-based human voice detection splits the speech into short segments.
In step (2), each audio segment is taken as input and converted from waveform into spectrogram form. The data first pass through the speech feature extraction module, which extracts audio features at different levels while greatly compressing the data and the parameter count, improving training efficiency and helping prevent overfitting. The data then enter the sequence learning module, which, loosely modeled on human memory, controls how much state information from different time steps is remembered or forgotten in order to learn the language sequence. Finally, the data enter the fully connected module for classification and decision-making, and the connectionist temporal classification module computes the most probable output sequence, which is the recognition result. The artificial intelligence speech recognition subsystem is the core of the whole engine and is responsible for recognizing speech and converting it into a character sequence.
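The memory-and-forgetting behavior attributed to the sequence learning module matches the gating of a GRU, the unit shown in FIG. 2. For reference, one common formulation of a GRU step, with the bidirectional output formed by concatenating forward and backward states, is given below; some libraries swap the roles of z_t and 1 - z_t in the final update, and the patent does not state which convention it uses.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) && \text{update gate} \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) && \text{reset gate} \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) && \text{candidate state} \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{new state} \\
h_t^{\mathrm{bi}} &= \left[\overrightarrow{h}_t \,;\, \overleftarrow{h}_t\right] && \text{bidirectional output}
\end{aligned}
```

The reset gate r_t decides how much of the previous state feeds the candidate, and the update gate z_t decides how much of the old state is kept versus overwritten.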
In step (4), voiceprint detection extracts the speaker's timbre features through a multilayer LSTM network; cosine similarity scores are then computed between each verification sample and all speaker centroids; finally, a softmax or contrast loss is computed from the similarity scores and the parameters are updated by backpropagation. By 1:1 similarity comparison against target voiceprints in the pilot voiceprint library, different pilots' voiceprints are classified with a recognition rate of up to 92%, an improvement of about 30% to 60% over the recognition accuracy of traditional engines.
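The 1:1 comparison against the pilot voiceprint library can be sketched as cosine scoring of a query embedding against each enrolled pilot's reference embedding, accepting the best match only above a threshold. The dictionary layout, pilot identifiers, function name, and threshold value are hypothetical.

```python
import numpy as np


def match_pilot(query, library, threshold=0.7):
    """Compare a query voiceprint with each enrolled pilot (1:1 cosine scoring).

    library: {pilot_id: reference embedding}. Returns (pilot_id, score) for the best
    match at or above threshold, otherwise (None, best_score).
    """
    q = query / np.linalg.norm(query)
    best_id, best_score = None, -1.0
    for pilot_id, ref in library.items():
        score = float(q @ (ref / np.linalg.norm(ref)))   # cosine similarity
        if score > best_score:
            best_id, best_score = pilot_id, score
    return (best_id, best_score) if best_score >= threshold else (None, best_score)
```

Rejecting matches below the threshold avoids attributing an alert to the nearest pilot when the speaker is not enrolled at all.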
In step (4), the voiceprint confirmation subsystem performs speaker discrimination and classification on the audio, and the resulting classification makes records easier to trace back.
In step (4), audio quality is logically processed and assessed using speech keywords or speech recognition, and recognition results are corrected by combining the preceding text with the air traffic control procedure.
The key point of the invention is a solution for real-time speech recognition and early warning on airport air-ground communication, designed by combining three mature speech processing techniques: a speech recognition engine, voiceprint recognition, and wake-word indexing.
Compared with the prior art, the air-ground communication speech recognition engine and its application method have the following beneficial effects:
The invention builds an artificial-intelligence-based speech recognition engine for recognizing air traffic control air-ground communication. Compared with traditional speech recognition engines, it not only raises recognition accuracy substantially but also has a much simpler model structure and high training and runtime efficiency. In addition, through voiceprint early warning, the solution can retrieve from the voiceprint library the voiceprint information and flight number voice information of a pilot who needs assistance, so that air traffic control staff can learn the pilot and flight details as early as possible, respond quickly with control procedures, avoid flight accidents, and better protect lives.
The engine can be applied in, but is not limited to, the following scenarios.
1. Instruction transcript board
Control supervisors can listen to controllers' instructions in real time, but listening is inherently limited: usually only one channel can be followed at a time, so monitoring several positions simultaneously is difficult. Monitored speech is also hard to replay and time-consuming to review. As a result, supervision is inefficient and prone to omissions, control activity lacks transparency, and safety hazards may go unnoticed.
The system can substantially improve the effectiveness and coverage of control supervision. All speech is converted to text instantly and shown in a simple, clear interface, so control transcripts can be skimmed quickly and are far easier to read. Radio calls are kept as a history that can be reviewed and checked quickly. Because supervisors no longer need to listen to audio, they can monitor several control positions at once, which makes control activity much more transparent.
2. Scene playback and speech retrieval
Incident investigations often require retrieving control recordings and transcribing them by listening; converting 15 minutes of speech into a written record takes about an hour, which is time-consuming and laborious. As artificial intelligence has matured, control speech can now be recognized in real time and converted into text records. Radio call transcripts can be queried by date and time, radio channel, flight, or keyword, with synchronized audio playback and export of results. In incident investigations, turning the original recordings into text converts the task into text-based search, improving efficiency and reducing transcription work.
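A minimal sketch of the query described above, assuming transcripts are stored as records with a timestamp, channel, flight, text, and a pointer to the source audio for synchronized playback. The record schema and function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class CallRecord:
    time: datetime
    channel: str       # radio channel / frequency label (hypothetical schema)
    flight: str
    text: str
    audio_path: str    # pointer to the source audio for synchronized playback


def query_records(records: List[CallRecord], start: Optional[datetime] = None,
                  end: Optional[datetime] = None, channel: Optional[str] = None,
                  flight: Optional[str] = None, keyword: Optional[str] = None) -> List[CallRecord]:
    """Return records matching every given filter, oldest first."""
    def keep(r: CallRecord) -> bool:
        return ((start is None or r.time >= start)
                and (end is None or r.time <= end)
                and (channel is None or r.channel == channel)
                and (flight is None or r.flight == flight)
                and (keyword is None or keyword.lower() in r.text.lower()))
    return sorted((r for r in records if keep(r)), key=lambda r: r.time)
```

Omitted filters are simply ignored, so the same function serves a broad time-window browse and a narrow keyword lookup.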
3. Runway incursion prevention
Runway incursions have various causes, including incursions by aircraft, persons, and vehicles. Erroneous controller instructions and pilot operating errors are major contributing factors. Ground-based optical detection systems are commonly used to detect incursions in real time, but they can warn only when an incursion is imminent or has already occurred. The system can check an instruction as soon as it is issued, preventing a runway incursion before the crew acts on it.
4. Instruction safety check
Errors in flight crew readbacks are a major cause of control incidents. Although each position has two controllers who check readbacks, mistakes are still missed. Through accurate speech intent recognition, the system compares each control instruction with its readback and checks whether the crew read it back correctly. Speech-recognition-based checking is an effective complement to controller readback checks, greatly reducing undetected readback errors and improving safety.
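A simplified readback comparison can check that the callsign matches and that every numeric value in the instruction reappears in the readback. This sketch assumes normalized, space-separated transcripts with digits already converted; the regular expressions and function name are illustrative and far cruder than the intent recognition the patent describes.

```python
import re

CALLSIGN = re.compile(r"\b[A-Z]{3}\d{1,4}[A-Z]?\b")   # simplified ICAO callsign, e.g. CCA127
NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")              # altitudes, headings, frequencies


def readback_mismatches(instruction: str, readback: str) -> list:
    """List discrepancies between a control instruction and the crew's readback."""
    problems = []
    c_ins, c_rb = CALLSIGN.search(instruction), CALLSIGN.search(readback)
    if not c_ins or not c_rb or c_ins.group() != c_rb.group():
        problems.append("callsign")
    # Every value the controller gave should reappear in the readback.
    missing = sorted(set(NUMBER.findall(instruction)) - set(NUMBER.findall(readback)))
    problems += [f"value {v} not read back" for v in missing]
    return problems
```

An empty list means no discrepancy was detected; word order is ignored because crews often read values back in a different order.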
5. Lost-frequency check
Because of frequency changes and similar causes, a crew may switch to the wrong frequency after a frequency-transfer instruction, in which case its readback cannot be checked. By examining the readbacks of frequency-transfer instructions, the system can determine whether a crew has left the frequency in error, so that the controller can handle the lost-contact situation promptly.
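The check can be sketched as follows: for each frequency-transfer instruction, look for a readback from the same callsign with the same frequency within a time window. The event tuple layout, the 30-second window, and the function name are assumptions for illustration; the sketch flags both missing and mismatched readbacks.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# (time, speaker, callsign, kind, frequency); kind is "transfer" or "readback"
Event = Tuple[datetime, str, str, str, str]


def unconfirmed_transfers(events: List[Event], window=timedelta(seconds=30)) -> List[str]:
    """Callsigns whose frequency-transfer instruction was not read back correctly in time."""
    flagged = []
    for t, speaker, cs, kind, freq in events:
        if speaker != "controller" or kind != "transfer":
            continue
        confirmed = any(sp == "pilot" and c == cs and k == "readback" and f == freq
                        and t < t2 <= t + window
                        for t2, sp, c, k, f in events)
        if not confirmed:
            flagged.append(cs)
    return flagged
```

A flagged callsign is a prompt for the controller to try the previous frequency or a guard frequency before contact is fully lost.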
6. Misheard instruction check
Existing similar-callsign alerts mainly remind controllers not to send an instruction to the wrong one of two similar flights, but they do not address crews mishearing: a crew may take an instruction intended for another flight as its own and read it back. This situation is very dangerous, because the crew mishears the controller's instruction and reads it back owing to the similar flight number. By examining the readback content, the system can promptly detect the misheard flight number and push it, together with the similar flights, to the controller, so that the controller can respond more quickly when a crew mistakes its flight number and reads back an instruction meant for another aircraft.
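One way to sketch the detection is to flag any readback whose callsign differs from the addressed one and then list the active callsigns that resemble it. Here Python's `difflib.SequenceMatcher` ratio stands in for a similarity measure; it is textual, not phonetic, and both it and the 0.7 threshold are assumptions rather than the patent's method. `check_misheard_callsign` is a hypothetical name.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def check_misheard_callsign(addressed: str, readback_callsign: str,
                            active_flights: List[str],
                            threshold: float = 0.7) -> Optional[List[str]]:
    """If a flight other than the addressed one reads back the instruction,
    return the active callsigns similar to the one that read back, so they
    can be pushed to the controller; return None if the readback came from
    the addressed flight."""
    if readback_callsign == addressed:
        return None
    return sorted(
        f for f in active_flights
        if f != readback_callsign
        and SequenceMatcher(None, f, readback_callsign).ratio() >= threshold
    )


active = ["CCA1234", "CCA1243", "CSN5678"]
print(check_misheard_callsign("CCA1234", "CCA1243", active))  # ['CCA1234']
```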
7. Fatigue warning
Human fatigue and mental state are difficult to monitor in real time. When control workload is high and controllers struggle to stay alert, they occasionally fall asleep on duty, which creates a serious safety hazard. The system compiles statistics on the instructions the controller issues to each aircraft, such as the number of instructions, the number of words, and instructions and words per minute, and compares them with historical records. A controller in poor condition can thus be detected in time, monitoring of dangerous behavior such as sleeping on duty is improved, and potential control safety hazards are reduced.
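The comparison with historical records could be as simple as a z-score per metric. This sketch assumes per-minute statistics are already computed and that a deviation of more than two standard deviations in either direction is worth flagging; the metric names, threshold, and `fatigue_flags` are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def fatigue_flags(current: Dict[str, float],
                  history: Dict[str, List[float]],
                  z_limit: float = 2.0) -> List[str]:
    """Flag metrics whose current value deviates strongly from history."""
    flags = []
    for metric, value in current.items():
        past = history.get(metric, [])
        if len(past) < 2:
            continue  # not enough history to estimate the spread
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            continue  # constant history: z-score undefined
        z = (value - mu) / sigma
        if abs(z) > z_limit:
            flags.append(f"{metric}: {value} vs typical {mu:.1f} (z={z:.1f})")
    return flags


history = {"instructions_per_min": [4, 5, 6, 5, 4, 6]}
print(fatigue_flags({"instructions_per_min": 1}, history))
```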
8. Call quality analysis
Call quality analysis produces quantitative measures of call quality by analyzing data such as call entries, call duration, and valid phrases. The indicators include speech rate, the number of instruction entries, the ratio of valid to invalid phraseology, the proportion of Chinese to English, and the numbers of corrections and confirmations.
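The indicators listed above reduce to counts and ratios over recognized utterances. The sketch assumes each utterance has already been annotated (duration, word count, whether it uses standard phraseology, language, and whether it is a correction or confirmation); the `Utterance` fields and `call_quality` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Utterance:
    seconds: float          # duration of the transmission
    words: int              # recognized word count
    standard_phrase: bool   # assumed flag from a phraseology checker
    language: str           # "zh" or "en"
    is_correction: bool
    is_confirmation: bool


def call_quality(utts: List[Utterance]) -> Dict[str, float]:
    """Compute simple call quality indicators for one controller session."""
    n = len(utts)
    total_s = sum(u.seconds for u in utts)
    return {
        "words_per_min": 60 * sum(u.words for u in utts) / total_s if total_s else 0.0,
        "instructions": n,
        "valid_ratio": sum(u.standard_phrase for u in utts) / n if n else 0.0,
        "zh_ratio": sum(u.language == "zh" for u in utts) / n if n else 0.0,
        "corrections": sum(u.is_correction for u in utts),
        "confirmations": sum(u.is_confirmation for u in utts),
    }


utts = [
    Utterance(6.0, 12, True, "zh", False, False),
    Utterance(4.0, 8, False, "en", True, True),
]
print(call_quality(utts))
```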
9. Instruction intent analysis
Control instructions carry different intents, and statistics on these intents can effectively reveal differences between the planned design and actual control practice. The system can count the distribution of specific intents such as altitude changes, route offsets, sector entry and exit, and transponder identification, for analysis by managers.
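Given one intent label per recognized instruction, the distribution is a normalized count. The intent label strings and `intent_distribution` below are hypothetical; the sketch only shows the counting step.

```python
from collections import Counter
from typing import Dict, Iterable


def intent_distribution(intents: Iterable[str]) -> Dict[str, float]:
    """Share of each intent label, most frequent first."""
    counts = Counter(intents)
    total = sum(counts.values())
    return {intent: n / total for intent, n in counts.most_common()}


print(intent_distribution(
    ["altitude_change", "altitude_change", "route_offset", "transponder_ident"]))
# {'altitude_change': 0.5, 'route_offset': 0.25, 'transponder_ident': 0.25}
```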
10. Conflict resolution instruction analysis
Conflict resolution instructions, such as those resolving altitude, opposite-direction, and crossing conflicts, effectively reflect whether control activities are proceeding as expected. A large number of conflict resolution instructions may indicate an air traffic control problem, such as excessive traffic or a change of plan. The system performs a statistical analysis of these instructions to indicate whether the types and numbers of conflict resolution instructions are within reasonable expectations.
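Checking whether counts are "within reasonable expectations" can be sketched as comparing each conflict type's count against an expected range. The ranges here are placeholders that a unit would derive from its own historical data; the type labels and `check_resolution_counts` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def check_resolution_counts(instruction_types: Iterable[str],
                            expected: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    """Compare the count of each conflict resolution type with its expected range."""
    counts = Counter(instruction_types)
    report = {}
    for kind, (low, high) in expected.items():
        n = counts.get(kind, 0)
        report[kind] = ("within expectation" if low <= n <= high
                        else f"{n} outside [{low}, {high}]")
    return report


expected = {"altitude": (0, 5), "opposite": (0, 3), "crossing": (0, 2)}
observed = ["altitude"] * 7 + ["crossing"]
print(check_resolution_counts(observed, expected))
# {'altitude': '7 outside [0, 5]', 'opposite': 'within expectation', 'crossing': 'within expectation'}
```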
11. Non-standard instruction analysis
The phraseology specifications for control instructions are very complex, the standard interpretations contain many points requiring attention, and each region has its own requirements. As a result, even an experienced controller can hardly follow every rule. The system analyzes a range of non-standard expression habits, including overly colloquial instructions, phrases with keywords in the wrong order, and incomplete readbacks. The results can effectively help controllers correct improper phrasing habits and improve control quality.
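Two of the habits mentioned, keywords in the wrong order and incomplete readbacks, can be sketched as a single ordered-keyword check over a tokenized transcript. The required keyword sequence is an assumed per-instruction rule, not one taken from any phraseology standard; `check_keyword_order` is hypothetical.

```python
from typing import List


def check_keyword_order(tokens: List[str], required: List[str]) -> List[str]:
    """Check that every required keyword appears, in the given order."""
    issues = []
    pos = -1
    for kw in required:
        try:
            pos = tokens.index(kw, pos + 1)  # search only after the previous keyword
        except ValueError:
            # present earlier in the utterance -> order problem; absent -> incomplete
            issues.append(f"'{kw}' out of order" if kw in tokens else f"'{kw}' missing")
    return issues


print(check_keyword_order("CCA1234 climb 6000".split(), ["climb", "6000"]))  # []
print(check_keyword_order("6000 climb CCA1234".split(), ["climb", "6000"]))
# ["'6000' out of order"]
```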
It is to be understood that the above description does not limit the present invention, and the present invention is not limited to the above examples; those skilled in the art may make modifications, alterations, additions, or substitutions within the spirit and scope of the present invention.

Claims (10)

1. An air-ground communication voice recognition engine for a civil aviation passenger plane, comprising a recording input device and a computer connected through a signal line, characterized in that the computer is provided with at least an air-ground communication audio segmentation subsystem, an artificial intelligence speech recognition subsystem, an early-warning wake-up speech subsystem, and a voiceprint confirmation subsystem, which are connected in sequence according to their logical order to form an integral neural network model; the artificial intelligence speech recognition subsystem comprises a speech feature extraction module, a sequence learning module, and a fully connected module; the sequence learning module consists of a multilayer convolutional neural network and four layers of bidirectional gated recurrent units; the fully connected module is further provided with a connectionist temporal classification module; the voiceprint confirmation subsystem is provided with a voiceprint detection network in the form of a multilayer logic network, the multilayer logic network comprising several long short-term memory network layers, each of which is followed by a linear mapping layer.
2. The air-ground communication voice recognition engine for a civil aviation passenger plane according to claim 1, wherein the voiceprint confirmation subsystem further comprises a softmax or contrast comparison decision algorithm.
3. The air-ground communication voice recognition engine for a civil aviation passenger plane according to claim 1, wherein transmission data collected by the airport in real time is received through an interface of the artificial intelligence speech recognition subsystem of the computer.
4. The air-ground communication voice recognition engine for a civil aviation passenger plane according to claim 1, wherein the artificial intelligence speech recognition subsystem maintains an index of the recognized text, and the early-warning wake-up speech subsystem maintains an index of the audio containing wake-up words.
5. An application method of an air-ground communication voice recognition engine for a civil aviation passenger plane, characterized by comprising the following steps:
(1) inputting recorded audio through the recording input device, and then segmenting the recorded audio with the air-ground communication audio segmentation subsystem;
(2) inputting the segmented audio clips into the artificial intelligence speech recognition subsystem so that they are transcribed into text;
(3) inputting the transcribed text into the early-warning wake-up speech subsystem for wake-up word retrieval and detection, and retaining the index of the audio containing wake-up words;
(4) using the index of the early-warning wake-up words, the voiceprint confirmation subsystem searches the pilot voiceprint library and returns the flight voice information with a matching voiceprint; finally, the pilot and the flight number of wake-up speech containing specific keywords are identified, achieving real-time early warning of air-ground communication at the airport.
6. The application method according to claim 5, wherein step (1) comprises processing the recorded audio as follows:
a. noise reduction of the recorded audio: performing speech enhancement with a neural network model capable of enhancing air-ground communication speech;
b. silence removal from the recorded audio: segmenting the speech into short segments through neural-network-based voice activity detection.
7. The application method of the air-ground communication voice recognition engine for a civil aviation passenger plane according to claim 5, wherein in step (2) the audio segment is taken as the data input and converted from audio form into a spectrogram; the data first pass through the speech feature extraction module, which extracts audio features at different levels while greatly compressing the data and the number of parameters, improving training efficiency and preventing overfitting; the data then enter the sequence learning module, which, by simulating the function of the human memory system, controls how much state information at different time steps is retained or forgotten, thereby learning the language sequence; finally, the data enter the fully connected module for classification learning and decision-making, and the connectionist temporal classification module computes the output sequence with the highest probability, namely the speech recognition result.
8. The application method of the air-ground communication voice recognition engine for a civil aviation passenger plane according to claim 5, wherein in step (4), voiceprint detection extracts the speaker's timbre features through a multilayer long short-term memory gated network; cosine similarity scores between the verification sample and all speaker centroids are then computed; finally, a loss based on the similarity scores is computed with softmax or contrast, the parameters are updated by backpropagation, the target voiceprint is compared one-to-one for similarity against the pilot voiceprint library, and the voiceprints of different pilots are classified.
9. The application method according to claim 5, wherein in step (4), speaker discrimination and classification are also performed on the speech audio in the voiceprint recognition subsystem.
10. The application method of the air-ground communication voice recognition engine for a civil aviation passenger plane according to claim 5, wherein in step (4), the audio quality is further analyzed through logical processing and judgment based on speech keywords or speech recognition, and recognition errors are corrected in combination with the preceding text content and the air traffic control procedure.
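The scoring described in claim 8 (cosine similarity of each sample to every speaker centroid, then a softmax or contrast loss) closely resembles the generalized end-to-end (GE2E) speaker verification loss. The pure-Python sketch below follows that formulation under stated assumptions: the scale `w` and bias `b` are fixed placeholders (learned in practice), a sample's own centroid excludes the sample itself as in GE2E (so each speaker needs at least two utterances), and embeddings are assumed to be non-zero vectors produced by the LSTM network. `ge2e_loss` and `identify` are hypothetical names; the patent does not publish code.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def centroid(vectors: List[Vector]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def ge2e_loss(batch: List[List[Vector]], w: float = 10.0, b: float = -5.0,
              kind: str = "softmax") -> float:
    """batch[j][i] is the embedding of utterance i from speaker j."""
    centroids = [centroid(spk) for spk in batch]
    total = 0.0
    for j, spk in enumerate(batch):
        for i, e in enumerate(spk):
            # Own-speaker centroid excludes the current sample.
            own = centroid([v for m, v in enumerate(spk) if m != i])
            scores = [w * cosine(e, own if k == j else c) + b
                      for k, c in enumerate(centroids)]
            if kind == "softmax":
                total += -scores[j] + math.log(sum(math.exp(s) for s in scores))
            elif kind == "contrast":
                sig = [1 / (1 + math.exp(-s)) for s in scores]
                total += 1 - sig[j] + max(sig[k] for k in range(len(sig)) if k != j)
            else:
                raise ValueError(f"unknown loss kind: {kind}")
    return total


def identify(embedding: Vector, library: Dict[str, Vector],
             threshold: float = 0.7) -> Optional[str]:
    """Compare a target voiceprint with each library entry; best match or None."""
    best = max(library, key=lambda name: cosine(embedding, library[name]))
    return best if cosine(embedding, library[best]) >= threshold else None


good = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]  # speakers well separated
bad = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]   # speakers mixed up
print(ge2e_loss(good) < ge2e_loss(bad))  # True
print(identify([0.95, 0.05], {"pilot_A": [1.0, 0.0], "pilot_B": [0.0, 1.0]}))  # pilot_A
```

In training, this loss would be minimized by backpropagation through the embedding network; at inference only `identify`-style comparisons against the pilot voiceprint library are needed.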
CN202111558210.9A 2021-12-20 2021-12-20 Air-ground communication voice recognition engine for civil aviation passenger plane and application method thereof Pending CN113948091A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111558210.9A CN113948091A (en) 2021-12-20 2021-12-20 Air-ground communication voice recognition engine for civil aviation passenger plane and application method thereof

Publications (1)

Publication Number Publication Date
CN113948091A true CN113948091A (en) 2022-01-18

Family

ID=79339262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111558210.9A Pending CN113948091A (en) 2021-12-20 2021-12-20 Air-ground communication voice recognition engine for civil aviation passenger plane and application method thereof

Country Status (1)

Country Link
CN (1) CN113948091A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104575504A (en) * 2014-12-24 2015-04-29 上海师范大学 Method for personalized television voice wake-up by voiceprint and voice identification
CN111081260A (en) * 2019-12-31 2020-04-28 苏州思必驰信息科技有限公司 Method and system for identifying voiceprint of awakening word
CN111210829A (en) * 2020-02-19 2020-05-29 腾讯科技(深圳)有限公司 Speech recognition method, apparatus, system, device and computer readable storage medium
CN111341325A (en) * 2020-02-13 2020-06-26 平安科技(深圳)有限公司 Voiceprint recognition method and device, storage medium and electronic device
US20210043190A1 (en) * 2018-10-25 2021-02-11 Tencent Technology (Shenzhen) Company Limited Speech recognition method and apparatus, and method and apparatus for training speech recognition model
CN113066499A (en) * 2021-03-12 2021-07-02 四川大学 Method and device for identifying identity of land-air conversation speaker
CN113112877A (en) * 2021-03-16 2021-07-13 广州市中南民航空管通信网络科技有限公司 Runway intrusion early warning method, terminal and device
CN113393836A (en) * 2021-06-08 2021-09-14 成都傅立叶电子科技有限公司 Airborne station voice recognition control method and system
CN113409787A (en) * 2021-07-08 2021-09-17 上海民航华东空管工程技术有限公司 Civil aviation control voice recognition system based on artificial intelligence technology

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115440191A (en) * 2022-11-09 2022-12-06 四川大学 Airplane cockpit safety auxiliary method based on deep learning and electronic equipment
CN115440191B (en) * 2022-11-09 2023-01-24 四川大学 Airplane cockpit safety auxiliary method based on deep learning and electronic equipment

Similar Documents

Publication Publication Date Title
CN111667830B (en) Airport control decision support system and method based on controller instruction semantic recognition
Cordero et al. Automated speech recognition in ATC environment
Delpech et al. A real-life, French-accented corpus of air traffic control communications
Prado et al. Designing the Radiotelephony Plain English Corpus (RTPEC): A specialized spoken English language corpus towards a description of aeronautical communications in non-routine situations
CN110428830B (en) Regular expression-based empty pipe instruction intention identification method
CN112133290A (en) Speech recognition method based on transfer learning and aiming at civil aviation air-land communication field
Hua et al. Extraction and analysis of risk factors from Chinese railway accident reports
CN113160798B (en) Chinese civil aviation air traffic control voice recognition method and system
CN113157916A (en) Civil aviation emergency extraction method based on deep learning
Cordero et al. Automated speech recognition in controller communications applied to workload measurement
CN115240651A (en) Land-air communication speaker role identification method and device based on feature fusion
Kopald et al. Applying automatic speech recognition technology to air traffic management
CN112397054A (en) Power dispatching voice recognition method
CN110232121B (en) Semantic network-based control instruction classification method
Kleinert et al. Automated Interpretation of Air Traffic Control Communication: The Journey from Spoken Words to a Deeper Understanding of the Meaning
CN113948091A (en) Air-ground communication voice recognition engine for civil aviation passenger plane and application method thereof
CN117115581A (en) Intelligent misoperation early warning method and system based on multi-mode deep learning
CN111627257A (en) Control instruction safety rehearsal and verification method based on aircraft motion trend prejudgment
CN113327607B (en) Cabin voice command handshake detection system and device
CN116092342A (en) Automatic response and quality assessment method and system for controller simulation training
CN115223558A (en) Method, system and computer storage medium for managing air traffic control voice
CN113821053A (en) Flight assisting method and system based on voice recognition and relation extraction technology
Raut et al. Automatic speech recognition and its applications
CN115440191B (en) Airplane cockpit safety auxiliary method based on deep learning and electronic equipment
Zuluaga-Gomez et al. Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination