CN116030830A - Prompt broadcasting system for aircraft crews and method thereof - Google Patents


Info

Publication number: CN116030830A
Application number: CN202310308003.0A
Authority: CN (China)
Prior art keywords: aircraft, feature, matrix, aircraft vibration, vibration
Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to its accuracy)
Other languages: Chinese (zh)
Inventor: 贾超
Current Assignee: Binzhou University (listed assignees may be inaccurate)
Original Assignee: Binzhou University
Events: application filed by Binzhou University; priority to CN202310308003.0A; publication of CN116030830A; legal status withdrawn

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 90/00 — Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the field of broadcasting, and specifically discloses a prompt broadcasting system for aircraft crews and a method thereof. A deep-learning-based neural network model fuses the vibration frequency-domain correlation features and the audio waveform semantic features of the broadcasting process, and the audio data to be played are further adaptively corrected based on the vibration condition of the aircraft so as to pre-compensate the audio signal before it is transmitted. In this way, the propagation offset of the audio signal caused by aircraft fluctuation is avoided, ensuring that the audio signal reaches the ears of the passengers with fidelity.

Description

Prompt broadcasting system for aircraft crews and method thereof
Technical Field
The invention relates to the broadcasting field, in particular to a prompt broadcasting system for an aircraft crew member and a method thereof.
Background
The flight attendant cue broadcasting system is an important auxiliary support system during aircraft navigation: through it, flight attendants transmit information to passengers. However, while the crew broadcasts, aircraft fluctuation shifts the propagation of the sound signal, so some passengers may miss emphasized content, and the broadcast may even become muddled, ambiguous, or misaligned.
Accordingly, an optimized aircraft attendant cue-play system is desired.
Disclosure of Invention
The present invention has been made to solve the above-mentioned technical problems. The embodiment of the invention provides a prompt broadcasting system for aircraft crews and a method thereof. A deep-learning-based neural network model fuses the vibration frequency-domain correlation features and the audio waveform semantic features of the broadcasting process, and the audio data to be played are further adaptively corrected based on the vibration condition of the aircraft so as to pre-compensate the audio signal before it is transmitted. In this way, the propagation offset of the audio signal caused by aircraft fluctuation is avoided, ensuring that the audio signal reaches the ears of the passengers with fidelity.
According to one aspect of the present invention, there is provided an aircraft attendant cue-casting system comprising: the system comprises a data acquisition module to be played, a data processing module and a data processing module, wherein the data acquisition module to be played is used for acquiring audio data to be played in a preset time period provided by an aircraft attendant and an aircraft vibration signal in the preset time period; the vibration characteristic extraction module is used for carrying out frequency domain characteristic analysis on the aircraft vibration signals so as to obtain a plurality of aircraft vibration frequency domain statistical characteristics; the multi-mode coding module is used for inputting the aircraft vibration signals and the plurality of aircraft vibration frequency domain statistical characteristics into a multi-mode joint encoder comprising an image encoder and a sequence encoder so as to obtain an aircraft vibration characteristic matrix; the audio waveform feature extraction module is used for enabling the audio data to be played in the preset time period to pass through a convolutional neural network model serving as a feature extractor so as to obtain an audio waveform image feature matrix; the feature fusion module is used for fusing the aircraft vibration feature matrix and the audio waveform image feature matrix to obtain a fused feature matrix; the countermeasure generation module is used for enabling the fusion feature matrix to pass through a generator based on a countermeasure generation network to obtain corrected audio data to be played; and the broadcasting module is used for broadcasting the corrected audio data to be played.
In the foregoing prompt broadcast system for an aircraft attendant, the image encoder is a convolutional neural network model serving as a filter, and the sequence encoder is a multi-scale neighborhood feature extraction module, where the multi-scale neighborhood feature extraction module includes: the device comprises a first convolution layer, a second convolution layer parallel to the first convolution layer and a multi-scale feature fusion layer connected with the first convolution layer and the second convolution layer, wherein the first convolution layer uses a one-dimensional convolution kernel with a first length, and the second convolution layer uses a one-dimensional convolution kernel with a second length.
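The multi-scale neighborhood feature extraction module above can be sketched as two parallel one-dimensional convolutions with different kernel lengths whose outputs are fused. This is a minimal illustrative sketch, not the patented implementation: the kernel values, the lengths 3 and 5, and fusion by concatenation are all assumptions.

```python
# Sketch of the sequence encoder: two parallel 1-D convolution layers with
# kernels of different lengths, fused by a simple concatenation layer.
# Kernel values and lengths are illustrative assumptions.

def conv1d(seq, kernel):
    """Valid 1-D convolution (cross-correlation) of a sequence with a kernel."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def multi_scale_features(seq):
    small = conv1d(seq, [0.25, 0.5, 0.25])            # first conv layer, length 3
    large = conv1d(seq, [0.1, 0.2, 0.4, 0.2, 0.1])    # second conv layer, length 5
    return small + large                              # multi-scale fusion (concat)

stats = [1.0, 2.0, 4.0, 2.0, 1.0, 0.5, 0.25]          # toy frequency-domain statistics
features = multi_scale_features(stats)
```

The two kernel lengths let the encoder capture neighborhood patterns at two scales of the discrete frequency-domain statistics before fusion.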
In the above-mentioned prompt broadcast system for aircraft crews, the multi-mode coding module includes: a frequency domain feature extraction unit for inputting the plurality of aircraft vibration frequency domain statistical features into the sequence encoder to obtain an aircraft vibration frequency domain statistical feature vector; a vibration waveform feature extraction unit for inputting the aircraft vibration signal into the image encoder to obtain an aircraft vibration waveform feature vector; and a joint optimization unit for calculating the vector product between the transpose of the aircraft vibration waveform feature vector and the aircraft vibration frequency domain statistical feature vector to obtain the aircraft vibration feature matrix.
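The joint optimization unit above amounts to an outer product of the two single-modality vectors, which yields a matrix pairing every waveform feature with every frequency-domain statistic. A hedged sketch, with illustrative vector contents and dimensions:

```python
# Sketch of the joint optimization unit: the vector product of the vibration
# waveform feature vector (transposed) with the frequency-domain statistical
# feature vector is an outer product, producing the aircraft vibration
# feature matrix. The values below are toy data.

def outer_product(u, v):
    """Matrix whose (i, j) entry is u[i] * v[j]."""
    return [[ui * vj for vj in v] for ui in u]

waveform_vec = [0.5, -1.0, 2.0]     # hypothetical output of the image encoder
freq_stat_vec = [1.0, 3.0]          # hypothetical output of the sequence encoder
vibration_matrix = outer_product(waveform_vec, freq_stat_vec)   # 3 x 2 matrix
```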
In the above-mentioned prompt broadcast system for aircraft crews, the audio waveform feature extraction module is configured such that each layer of the convolutional neural network model serving as the feature extractor processes its input data in forward pass as follows: convolving the input data to obtain a convolution feature map; pooling the convolution feature map along the channel dimension to obtain a pooled feature map; and applying a nonlinear activation to the pooled feature map to obtain an activated feature map. The output of the last layer of the convolutional neural network serving as the feature extractor is the audio waveform image feature matrix, and the input of its first layer is the audio data to be played in the predetermined time period.
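The per-layer pipeline above (convolution, pooling along the channel dimension, nonlinear activation) can be sketched in miniature. This is an illustrative sketch under stated assumptions, not the patent's architecture: one spatial dimension instead of two, mean-pooling over channels, and ReLU as the activation.

```python
# One feature-extractor layer as described: convolve, pool along the channel
# dimension, then activate. Kernels, shapes, and the mean-pooling choice are
# illustrative assumptions.

def conv1d(seq, kernel):
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def layer_forward(channels, kernels):
    # 1. convolution: one kernel per channel
    conv_maps = [conv1d(ch, kern) for ch, kern in zip(channels, kernels)]
    # 2. pooling along the channel dimension (mean over channels per position)
    n = len(conv_maps)
    pooled = [sum(col) / n for col in zip(*conv_maps)]
    # 3. nonlinear activation (ReLU)
    return [max(0.0, x) for x in pooled]

audio = [[0.0, 1.0, -1.0, 2.0, 0.5],    # toy two-channel audio snippet
         [1.0, 0.0, 1.0, -2.0, 1.0]]
out = layer_forward(audio, [[1.0, -1.0], [0.5, 0.5]])
```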
In the foregoing alert broadcast system for an aircraft attendant, the feature fusion module includes: a feature matrix unfolding unit for unfolding the aircraft vibration feature matrix and the audio waveform image feature matrix into an aircraft vibration feature vector and an audio waveform image feature vector; an affine mapping factor calculation unit for calculating correlation-probability density distribution affine mapping factors between the aircraft vibration feature vector and the audio waveform image feature vector to obtain a first correlation-probability density distribution affine mapping factor and a second correlation-probability density distribution affine mapping factor; and a fusion unit for calculating the position-wise weighted sum of the aircraft vibration feature matrix and the audio waveform image feature matrix, using the first and second correlation-probability density distribution affine mapping factors as weights, to obtain the fused feature matrix.
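The fusion unit's position-wise weighted sum can be sketched as follows. The factor values here are placeholders, not the output of the optimization formula described below.

```python
# Sketch of the fusion unit: a position-wise weighted sum of the two feature
# matrices, with the two affine mapping factors as scalar weights. Matrix
# contents and the weights 0.6 / 0.4 are toy values.

def weighted_fusion(mat_a, mat_b, w1, w2):
    return [[w1 * a + w2 * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mat_a, mat_b)]

vibration = [[1.0, 2.0], [3.0, 4.0]]    # aircraft vibration feature matrix
waveform = [[0.5, 0.5], [0.5, 0.5]]     # audio waveform image feature matrix
fused = weighted_fusion(vibration, waveform, 0.6, 0.4)
```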
In the above-described aircraft attendant cue broadcasting system, the affine mapping factor calculating unit is configured to: calculating a correlation-probability density distribution affine mapping factor of the aircraft vibration feature vector and the audio waveform image feature vector in the following optimization formula to obtain a first correlation-probability density distribution affine mapping factor and a second correlation-probability density distribution affine mapping factor; wherein, the optimization formula is:
\[ w_1 = \frac{1}{L}\sum_{i=1}^{L}\left[\exp\left(M \otimes \Sigma^{-1}\right) \otimes \left(V_a - \mu\right)\right]_i \]
\[ w_2 = \frac{1}{L}\sum_{i=1}^{L}\left[\exp\left(M^{\top} \otimes \Sigma^{-1}\right) \otimes \left(V_b - \mu\right)\right]_i \]
wherein $V_a$ represents the aircraft vibration feature vector, $V_b$ represents the audio waveform image feature vector, $L$ is the length of the two feature vectors, $M$ is the correlation matrix obtained by position-by-position correlation between the aircraft vibration feature vector and the audio waveform image feature vector, $\mu$ and $\Sigma$ are the mean vector and covariance matrix of the Gaussian density map formed by the aircraft vibration feature vector and the audio waveform image feature vector, $\otimes$ represents matrix multiplication, $\exp(\cdot)$ represents the exponential operation of a matrix, that is, calculating the natural exponential function value raised to the power of the feature value at each position in the matrix, $w_1$ represents the first correlation-probability density distribution affine mapping factor, and $w_2$ represents the second correlation-probability density distribution affine mapping factor.
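Because the original formula images are not reproduced in this text, the following sketch follows one plausible reading of the symbol definitions: $M$ as the outer-product correlation matrix of the two vectors, $\mu$ and a diagonal $\Sigma$ from their two-sample Gaussian density map, and each factor as the average of $\exp(M)\,\Sigma^{-1}\,(V-\mu)$. Every modelling choice here is an assumption, not the patent's definitive computation.

```python
# Hedged sketch of the affine mapping factor computation under one plausible
# reading of the (unreproduced) optimization formula. All quantities are toy
# values; the diagonal covariance and the averaging are assumptions.
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(mat, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]

def affine_mapping_factors(va, vb):
    n = len(va)
    m = [[x * y for y in vb] for x in va]               # correlation matrix M
    mu = [(x + y) / 2 for x, y in zip(va, vb)]          # mean of the density map
    # diagonal covariance of the two-sample Gaussian density map, inverted
    sigma_inv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        var = ((va[i] - mu[i]) ** 2 + (vb[i] - mu[i]) ** 2) / 2 + 1e-6
        sigma_inv[i][i] = 1.0 / var
    exp_m = [[math.exp(v) for v in row] for row in m]                 # exp(M)
    exp_mt = [[math.exp(m[j][i]) for j in range(n)] for i in range(n)]  # exp(M^T)
    w1 = sum(matvec(matmul(exp_m, sigma_inv), [x - u for x, u in zip(va, mu)])) / n
    w2 = sum(matvec(matmul(exp_mt, sigma_inv), [y - u for y, u in zip(vb, mu)])) / n
    return w1, w2

w1, w2 = affine_mapping_factors([0.2, -0.1, 0.4], [0.1, 0.3, -0.2])
```

Note that when the two vectors coincide, their density map has zero spread around the mean, so both factors vanish.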
In the above-described aircraft attendant cue-broadcasting system, the countermeasure-generating network includes a generator and a discriminator.
According to another aspect of the present invention, there is provided an alert broadcasting method for an aircraft attendant, including: acquiring audio data to be played for a predetermined period of time provided by an aircraft attendant, and an aircraft vibration signal for the predetermined period of time; performing frequency domain feature analysis on the aircraft vibration signals to obtain a plurality of aircraft vibration frequency domain statistical features; inputting the aircraft vibration signals and the plurality of aircraft vibration frequency domain statistical features into a multi-mode joint encoder comprising an image encoder and a sequence encoder to obtain an aircraft vibration feature matrix; the audio data to be played in the preset time period passes through a convolutional neural network model serving as a feature extractor to obtain an audio waveform image feature matrix; fusing the aircraft vibration feature matrix and the audio waveform image feature matrix to obtain a fused feature matrix; the fusion feature matrix passes through a generator based on a countermeasure generation network to obtain corrected audio data to be played; and playing the corrected audio data to be played.
According to still another aspect of the present invention, there is provided an electronic apparatus including: a processor; and a memory having stored therein computer program instructions that, when executed by the processor, cause the processor to perform the aircraft attendant prompt broadcast method as described above.
According to a further aspect of the present invention there is provided a computer readable medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform an aircraft attendant cue broadcasting method as described above.
Compared with the prior art, the prompt broadcasting system for aircraft crews and the method thereof provided by the invention fuse the vibration frequency-domain correlation features and the audio waveform semantic features of the broadcasting process with a deep-learning-based neural network model, and further adaptively correct the audio data to be played based on the vibration condition of the aircraft so as to pre-compensate the audio signal before it is transmitted. In this way, the propagation offset of the audio signal caused by aircraft fluctuation is avoided, ensuring that the sound signal reaches the ears of the passengers with fidelity.
Drawings
The above and other objects, features, and advantages of the present invention will become more apparent from the following detailed description of embodiments of the present invention with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of embodiments of the invention and are incorporated in and constitute a part of this specification; they illustrate the invention and, together with the embodiments, serve to explain the invention without limiting it. In the drawings, like reference numerals generally refer to like parts or steps.
Fig. 1 is a schematic view of a scene of an aircraft attendant cue-play system according to an embodiment of the present invention.
Fig. 2 is a block diagram of an aircraft attendant cue-casting system according to an embodiment of the present invention.
Fig. 3 is a system architecture diagram of an aircraft attendant cue-play system according to an embodiment of the present invention.
Fig. 4 is a block diagram of a multi-modal encoding module in an aircraft attendant hint broadcast system according to an embodiment of the present invention.
Fig. 5 is a block diagram of a feature fusion module in an aircraft attendant hint broadcasting system according to an embodiment of the present invention.
Fig. 6 is a flowchart of an aircraft attendant cue-casting method according to an embodiment of the present invention.
Fig. 7 is a block diagram of an electronic device according to an embodiment of the invention.
Detailed Description
Hereinafter, exemplary embodiments according to the present invention will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some embodiments of the present invention and not all embodiments of the present invention, and it should be understood that the present invention is not limited by the example embodiments described herein.
Summary of the application: as described in the background above, during crew broadcasting, aircraft fluctuation shifts the propagation of the sound signal, so some passengers may miss emphasized content, and the broadcast may even become muddled, ambiguous, or misaligned. That is, when the audio data provided by the aircraft attendant is played, aircraft vibration may deviate its propagation such that passengers cannot hear the emphasis and may even experience ambiguity and misalignment.
In order to solve this technical problem, in the technical scheme of the invention, before the audio data to be played is played, it is adaptively corrected based on the vibration condition of the aircraft so as to pre-compensate the audio signal before it propagates. In this way, the propagation offset of the audio signal caused by aircraft fluctuation is avoided, ensuring that the sound signal reaches the ears of the passengers with fidelity.
Specifically, audio data to be played for a predetermined period of time provided by an aircraft attendant is first acquired, and an aircraft vibration signal for the predetermined period of time. In order to extract the vibration characteristics of the aircraft in the preset time period more accurately, in the technical scheme of the invention, firstly, frequency domain characteristic analysis is carried out on the aircraft vibration signals to obtain a plurality of aircraft vibration frequency domain statistical characteristics, and then, the aircraft vibration signals and the plurality of aircraft vibration frequency domain statistical characteristics are input into a multi-mode joint encoder comprising an image encoder and a sequence encoder to obtain an aircraft vibration characteristic matrix. Here, it should be noted that the statistical features of the vibration frequency domains of the plurality of aircraft are discrete data, and the vibration signals of the aircraft are two-dimensional waveform diagrams, which belong to data of different modes, so in the technical scheme of the invention, the multi-mode joint encoder comprising the image encoder and the sequence encoder is used for extracting single-mode features of the vibration signals of the aircraft and the statistical features of the vibration frequency domains of the plurality of aircraft, and then carrying out mode feature fusion to obtain the vibration feature matrix of the aircraft.
And then, the audio data to be played in the preset time period passes through a convolutional neural network model serving as a feature extractor to obtain an audio waveform image feature matrix. That is, in the technical solution of the present invention, the audio data to be played is regarded as a two-dimensional waveform chart, and a convolutional neural network model with excellent performance in the field of image feature extraction is used as a feature extractor to capture high-dimensional image hidden features in the audio data to be played, that is, the audio waveform image feature matrix. After the aircraft vibration feature matrix and the audio waveform image feature matrix are obtained, the aircraft vibration feature matrix and the audio waveform image feature matrix are fused in a high-dimensional feature space to obtain a fused feature matrix containing aircraft vibration features and audio waveform features.
And then, the fusion feature matrix is passed through a generator based on a countermeasure generation network to obtain the corrected audio data to be played. That is, the corrected audio data to be played corresponding to the fusion feature matrix is fitted based on the idea of adversarial generation. As will be appreciated by those of ordinary skill in the art, the countermeasure generation network includes a discriminator and a generator. During training, the generator generates corrected audio data to be played, and the discriminator measures the difference between the generated data and the real corrected audio data to be played to obtain a discriminator loss function value; the neural network parameters of the generator are then updated with that value as the loss, through a back-propagation algorithm of gradient descent, so that the corrected audio data generated by the generator approaches the real corrected audio data. In this way, the quality of the corrected audio data to be played is improved.
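The generator-update loop described above can be sketched with a toy scalar generator and a frozen discriminator. This is a hedged illustration of the adversarial training idea only: the one-parameter models, the frozen discriminator, and the finite-difference gradient (standing in for back-propagation) are all simplifying assumptions.

```python
# Toy sketch of one side of adversarial training: the generator parameter is
# updated by gradient descent on the (non-saturating) generator loss, with the
# discriminator held fixed. All models and numbers are illustrative.
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def g(z, theta):                  # generator: maps a code z to an output
    return theta * z

def d(v, w):                      # discriminator: scores realism of v
    return sigmoid(w * v)

def generator_loss(theta, w, zs):
    # non-saturating generator loss: -mean log d(g(z))
    return -sum(math.log(d(g(z, theta), w) + 1e-9) for z in zs) / len(zs)

random.seed(0)
zs = [random.uniform(0.5, 1.5) for _ in range(32)]
theta, w, lr, eps = 0.1, 1.0, 0.5, 1e-4
loss_before = generator_loss(theta, w, zs)
for _ in range(100):
    # finite-difference gradient stands in for back-propagation
    grad = (generator_loss(theta + eps, w, zs)
            - generator_loss(theta - eps, w, zs)) / (2 * eps)
    theta -= lr * grad            # gradient-descent update of the generator
loss_after = generator_loss(theta, w, zs)
```

In a full adversarial loop the discriminator parameter would be updated in alternation with the generator; only the generator step is shown here.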
After the corrected audio data to be played is obtained, it is played. That is, before playing, the audio data to be played is adaptively corrected based on the vibration condition of the aircraft to pre-compensate the audio signal before it propagates; in this way, the propagation offset of the audio signal caused by aircraft fluctuation is avoided, ensuring that the sound signal reaches the ears of the passengers with fidelity.
Here, when the aircraft vibration feature matrix and the audio waveform image feature matrix are fused, for example by position-wise addition, to obtain the fused feature matrix, it should be noted that the two matrices express vibration frequency-domain correlation features and audio waveform semantic features, respectively. If the correlation between their overall feature distributions and the consistency of their probability density distributions can be improved, the fusion effect, and thus the feature expression effect of the fused feature matrix, will improve, which in turn improves the quality of the corrected audio data to be played generated from the fused feature matrix.
The applicant of the present invention therefore first expands the aircraft vibration feature matrix and the audio waveform image feature matrix into an aircraft vibration feature vector, denoted $V_a$, and an audio waveform image feature vector, denoted $V_b$, both of length $L$, and calculates the correlation-probability density distribution affine mapping factors between the aircraft vibration feature vector $V_a$ and the audio waveform image feature vector $V_b$ as:
\[ w_1 = \frac{1}{L}\sum_{i=1}^{L}\left[\exp\left(M \otimes \Sigma^{-1}\right) \otimes \left(V_a - \mu\right)\right]_i \]
\[ w_2 = \frac{1}{L}\sum_{i=1}^{L}\left[\exp\left(M^{\top} \otimes \Sigma^{-1}\right) \otimes \left(V_b - \mu\right)\right]_i \]
wherein $M$ is the correlation matrix obtained by position-by-position correlation between the aircraft vibration feature vector $V_a$ and the audio waveform image feature vector $V_b$, and $\mu$ and $\Sigma$ are the mean vector and covariance matrix of the Gaussian density map formed by the aircraft vibration feature vector $V_a$ and the audio waveform image feature vector $V_b$.
That is, by constructing the correlated feature space of the aircraft vibration feature vector $V_a$ and the audio waveform image feature vector $V_b$ together with the probability density space represented by the Gaussian probability density, $V_a$ and $V_b$ can be mapped into affine homography subspaces within the correlated feature space and the probability density space, respectively, so as to extract representations of the features that comply with affine homography within the correlation domain and the probability density domain. By using the distribution affine mapping factor values $w_1$ and $w_2$ to weight the aircraft vibration feature matrix and the audio waveform image feature matrix respectively, the correlation of their feature representations and the consistency of their probability density distributions can be improved, thereby improving the feature expression effect of the fused feature matrix obtained by fusion and, in turn, the quality of the corrected audio data to be played generated from the fused feature matrix.
Based on this, the invention proposes a prompt broadcasting system for aircraft crews, comprising: the system comprises a data acquisition module to be played, a data processing module and a data processing module, wherein the data acquisition module to be played is used for acquiring audio data to be played in a preset time period provided by an aircraft attendant and an aircraft vibration signal in the preset time period; the vibration characteristic extraction module is used for carrying out frequency domain characteristic analysis on the aircraft vibration signals so as to obtain a plurality of aircraft vibration frequency domain statistical characteristics; the multi-mode coding module is used for inputting the aircraft vibration signals and the plurality of aircraft vibration frequency domain statistical characteristics into a multi-mode joint encoder comprising an image encoder and a sequence encoder so as to obtain an aircraft vibration characteristic matrix; the audio waveform feature extraction module is used for enabling the audio data to be played in the preset time period to pass through a convolutional neural network model serving as a feature extractor so as to obtain an audio waveform image feature matrix; the feature fusion module is used for fusing the aircraft vibration feature matrix and the audio waveform image feature matrix to obtain a fused feature matrix; the countermeasure generation module is used for enabling the fusion feature matrix to pass through a generator based on a countermeasure generation network to obtain corrected audio data to be played; and the broadcasting module is used for broadcasting the corrected audio data to be played.
Fig. 1 is a schematic view of a scene of an aircraft attendant cue-play system according to an embodiment of the present invention. As shown in fig. 1, in this application scenario, the audio data to be played for a predetermined period of time provided by the aircraft attendant is acquired by an audio sensor (e.g., V1 as illustrated in fig. 1), and the aircraft vibration signal for the predetermined period of time is acquired by a vibration signal sensor (e.g., V2 as illustrated in fig. 1). The information is then input to a server (e.g., S in fig. 1) deployed with a prompt broadcast algorithm for the aircraft attendant, where the server can process the input information with the algorithm to generate the corrected audio data to be played.
Having described the basic principles of the present invention, various non-limiting embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
Exemplary System: fig. 2 is a block diagram of an aircraft attendant cue-casting system according to an embodiment of the present invention. As shown in fig. 2, an aircraft attendant prompt broadcasting system 300 according to an embodiment of the present invention includes: a data acquisition module 310 to be played; a vibration feature extraction module 320; a multi-mode encoding module 330; an audio waveform feature extraction module 340; a feature fusion module 350; a countermeasure generation module 360; and a broadcast module 370.
The data to be played collection module 310 is configured to obtain audio data to be played in a predetermined period of time provided by an aircraft attendant, and an aircraft vibration signal in the predetermined period of time; the vibration feature extraction module 320 is configured to perform frequency domain feature analysis on the aircraft vibration signal to obtain a plurality of aircraft vibration frequency domain statistical features; the multi-mode encoding module 330 is configured to input the aircraft vibration signal and the plurality of aircraft vibration frequency domain statistics into a multi-mode joint encoder including an image encoder and a sequence encoder to obtain an aircraft vibration feature matrix; the audio waveform feature extraction module 340 is configured to pass the audio data to be played in the predetermined period of time through a convolutional neural network model serving as a feature extractor to obtain an audio waveform image feature matrix; the feature fusion module 350 is configured to fuse the aircraft vibration feature matrix and the audio waveform image feature matrix to obtain a fused feature matrix; the countermeasure generation module 360 is configured to pass the fusion feature matrix through a generator based on a countermeasure generation network to obtain corrected audio data to be played; and the broadcasting module 370 is configured to play the corrected audio data to be played.
Fig. 3 is a system architecture diagram of an aircraft attendant cue-play system according to an embodiment of the present invention. As shown in fig. 3, in the network architecture, first, audio data to be played for a predetermined period of time provided by an aircraft attendant and an aircraft vibration signal for the predetermined period of time are acquired through the data to be played acquisition module 310; then, the vibration feature extraction module 320 performs frequency domain feature analysis on the aircraft vibration signal acquired by the data acquisition module 310 to be played to obtain a plurality of aircraft vibration frequency domain statistical features; the multi-mode encoding module 330 inputs the aircraft vibration signals acquired by the data acquisition module 310 to be played and the plurality of aircraft vibration frequency domain statistical features obtained by the vibration feature extraction module 320 into a multi-mode joint encoder comprising an image encoder and a sequence encoder to obtain an aircraft vibration feature matrix; then, the audio waveform feature extraction module 340 passes the audio data to be played in the predetermined period acquired by the data to be played acquisition module 310 through a convolutional neural network model as a feature extractor to obtain an audio waveform image feature matrix; the feature fusion module 350 fuses the aircraft vibration feature matrix obtained by the multi-mode encoding module 330 and the audio waveform image feature matrix obtained by the audio waveform feature extraction module 340 to obtain a fused feature matrix; the countermeasure generation module 360 passes the fused feature matrix obtained by the feature fusion module 350 through a generator based on a countermeasure generation network to obtain corrected audio data to be played; further, the broadcasting module 370 plays the corrected audio data to be played.
Specifically, during operation of the alert broadcast system 300 for an aircraft attendant, the data to be played collection module 310 is configured to obtain audio data to be played for a predetermined period of time provided by the aircraft attendant, and an aircraft vibration signal for the predetermined period of time. It should be understood that in the actual playing process, the actually played audio data may deviate due to unstable vibration of the aircraft, so that the hearing of the passengers is adversely affected. More specifically, first, audio data to be played for a predetermined period of time provided by an aircraft attendant is acquired by an audio sensor, and an aircraft vibration signal for the predetermined period of time is acquired by a vibration signal sensor.
Specifically, during operation of the alert broadcast system 300 for an aircraft attendant, the vibration feature extraction module 320 is configured to perform frequency domain feature analysis on the aircraft vibration signal to obtain a plurality of aircraft vibration frequency domain statistical features. In the technical scheme of the invention, in order to extract the vibration characteristics of the aircraft in the preset time period more accurately, the frequency domain characteristic analysis is carried out on the aircraft vibration signals so as to obtain a plurality of aircraft vibration frequency domain statistical characteristics. In one specific example of the invention, the conversion of the time domain to the frequency domain of the aircraft vibration signal may be achieved by fourier transformation.
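As a hedged illustration of this step, the following sketch Fourier-transforms a vibration signal and derives a few frequency-domain statistics. The patent does not enumerate which statistics are computed, so the spectral centroid, mean spectral amplitude, and spectral RMS below are assumptions:

```python
import numpy as np

def vibration_frequency_stats(signal, fs=1000.0):
    """Fourier-transform the vibration signal and derive a few
    frequency-domain statistics. The specific statistics are
    illustrative; the patent only states that several are computed."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = spectrum ** 2
    centroid = float(np.sum(freqs * power) / np.sum(power))  # spectral centroid
    mean_amp = float(spectrum.mean())                        # mean spectral amplitude
    rms = float(np.sqrt(np.mean(power)))                     # RMS of the spectrum
    return np.array([centroid, mean_amp, rms])

t = np.arange(0, 1.0, 1.0 / 1000.0)
sig = np.sin(2 * np.pi * 50 * t)   # 50 Hz stand-in vibration signal
stats = vibration_frequency_stats(sig)
```

For the pure 50 Hz test tone, the spectral centroid lands at 50 Hz, confirming the time-to-frequency conversion.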
Specifically, during operation of the aircraft attendant prompt broadcast system 300, the multi-modal encoding module 330 is configured to input the aircraft vibration signal and the plurality of aircraft vibration frequency domain statistical features into a multi-modal joint encoder comprising an image encoder and a sequence encoder to obtain an aircraft vibration feature matrix. Here, it should be noted that the plurality of aircraft vibration frequency domain statistical features are discrete data, whereas the aircraft vibration signal is a two-dimensional waveform diagram; the two belong to data of different modalities. Therefore, in the technical scheme of the invention, the multi-modal joint encoder comprising the image encoder and the sequence encoder is used to first extract single-modal features of the aircraft vibration signal and of the plurality of aircraft vibration frequency domain statistical features, and then perform modal feature fusion to obtain the aircraft vibration feature matrix. In a specific example of the present invention, the image encoder is a convolutional neural network model serving as a filter, and the sequence encoder is a multi-scale neighborhood feature extraction module, wherein the multi-scale neighborhood feature extraction module comprises: a first convolution layer, a second convolution layer parallel to the first convolution layer, and a multi-scale feature fusion layer connected with the first convolution layer and the second convolution layer, wherein the first convolution layer uses a one-dimensional convolution kernel with a first length, and the second convolution layer uses a one-dimensional convolution kernel with a second length.
Specifically, in one example, first, the plurality of aircraft vibration frequency domain statistical features are input to the sequence encoder to obtain an aircraft vibration frequency domain statistical feature vector; more specifically, inputting the aircraft vibration frequency domain statistical feature into a first convolution layer of the multi-scale neighborhood feature extraction module to obtain a first neighborhood scale aircraft vibration frequency domain statistical feature vector, wherein the first convolution layer has a first one-dimensional convolution kernel with a first length; inputting the aircraft vibration frequency domain statistical features into a second convolution layer of the multi-scale neighborhood feature extraction module to obtain a second neighborhood scale aircraft vibration frequency domain statistical feature vector, wherein the second convolution layer is provided with a second one-dimensional convolution kernel with a second length, and the first length is different from the second length; and cascading the first neighborhood scale aircraft vibration frequency domain statistical feature vector with the second neighborhood scale aircraft vibration frequency domain statistical feature vector to obtain the aircraft vibration frequency domain statistical feature vector. 
Inputting the plurality of aircraft vibration frequency domain statistical features into the first convolution layer of the multi-scale neighborhood feature extraction module to obtain a first neighborhood scale aircraft vibration frequency domain statistical feature vector includes: performing one-dimensional convolution encoding on the aircraft vibration frequency domain statistical features using the first convolution layer of the multi-scale neighborhood feature extraction module according to the following one-dimensional convolution formula to obtain the first neighborhood scale aircraft vibration frequency domain statistical feature vector; wherein the one-dimensional convolution formula is:

Cov1(X) = Σ_a F1(a) · G(x − a)

wherein a is the width of the first convolution kernel in the x direction, F1(a) is the first convolution kernel parameter vector, G(x − a) is the local vector matrix operated on with the convolution kernel function, w1 is the size of the first convolution kernel, X represents the aircraft vibration frequency domain statistical features, and Cov1(X) represents one-dimensional convolution encoding of the aircraft vibration frequency domain statistical features. Inputting the aircraft vibration frequency domain statistical features into the second convolution layer of the multi-scale neighborhood feature extraction module to obtain a second neighborhood scale aircraft vibration frequency domain statistical feature vector includes: performing one-dimensional convolution encoding on the aircraft vibration frequency domain statistical features using the second convolution layer of the multi-scale neighborhood feature extraction module according to the following one-dimensional convolution formula to obtain the second neighborhood scale aircraft vibration frequency domain statistical feature vector; wherein the one-dimensional convolution formula is:

Cov2(X) = Σ_a F2(a) · G(x − a)

wherein a is the width of the second convolution kernel in the x direction, F2(a) is the second convolution kernel parameter vector, G(x − a) is the local vector matrix operated on with the convolution kernel function, w2 is the size of the second convolution kernel, X represents the aircraft vibration frequency domain statistical features, and Cov2(X) represents one-dimensional convolution encoding of the aircraft vibration frequency domain statistical features.

Secondly, the aircraft vibration signal is input into the image encoder to obtain an aircraft vibration waveform feature vector. More specifically, each layer of the convolutional neural network model serving as the filter performs, in the forward pass of the layer, the following operations on the input data: performing convolution processing on the input data to obtain a convolution feature map; pooling the convolution feature map based on a feature matrix to obtain a pooled feature map; and performing nonlinear activation on the pooled feature map to obtain an activation feature map. The output of the last layer of the convolutional neural network serving as the filter is the aircraft vibration waveform feature vector, and the input of the first layer of the convolutional neural network serving as the filter is the aircraft vibration signal.
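The two parallel one-dimensional convolutions and the concatenation performed by the multi-scale feature fusion layer can be sketched as follows; the kernel lengths, kernel values, and input length are illustrative assumptions, not values from the patent:

```python
import numpy as np

def multi_scale_neighborhood_features(x, k1=3, k2=5, seed=0):
    """Sketch of the multi-scale neighborhood feature extraction module:
    two parallel 1-D convolutions with kernels of different lengths,
    followed by concatenation (the multi-scale feature fusion layer)."""
    rng = np.random.default_rng(seed)
    f1 = rng.standard_normal(k1)  # first convolution kernel (first length)
    f2 = rng.standard_normal(k2)  # second convolution kernel (second length)
    # Cov(X) = sum_a F(a) * G(x - a): 'valid' 1-D convolution
    v1 = np.convolve(x, f1, mode="valid")
    v2 = np.convolve(x, f2, mode="valid")
    return np.concatenate([v1, v2])  # cascade the two neighborhood scales

stats = np.arange(16, dtype=float)   # stand-in frequency-domain statistics
vec = multi_scale_neighborhood_features(stats)
```

With 16 input statistics and valid convolutions of lengths 3 and 5, the cascaded vector has 14 + 12 = 26 entries.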
Fig. 4 is a block diagram of a multi-modal encoding module in an aircraft attendant prompt broadcast system according to an embodiment of the present invention. As shown in fig. 4, the multi-modal encoding module 330 includes: a frequency domain feature extraction unit 331, configured to input the plurality of aircraft vibration frequency domain statistical features into the sequence encoder to obtain an aircraft vibration frequency domain statistical feature vector; a vibration waveform feature extraction unit 332, configured to input the aircraft vibration signal into the image encoder to obtain an aircraft vibration waveform feature vector; and a joint optimization unit 333, configured to calculate a vector product between the transposed vector of the aircraft vibration waveform feature vector and the aircraft vibration frequency domain statistical feature vector to obtain the aircraft vibration feature matrix.
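The vector product computed by the joint optimization unit 333 is, in matrix terms, an outer product of the two vectors. A minimal sketch with hypothetical feature dimensions (the patent does not specify vector lengths):

```python
import numpy as np

# Hypothetical dimensions; the patent does not specify feature sizes.
vib_waveform_vec = np.arange(4, dtype=float)   # from the image encoder
vib_freq_stat_vec = np.arange(3, dtype=float)  # from the sequence encoder

# Vector product of the transposed waveform vector with the statistics
# vector: an outer product yielding the vibration feature matrix.
vibration_feature_matrix = np.outer(vib_waveform_vec, vib_freq_stat_vec)
```

Each matrix entry pairs one waveform feature with one frequency-domain statistic, which is how the two modalities are jointly encoded.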
Specifically, during the operation of the alert broadcasting system 300 for an aircraft attendant, the audio waveform feature extraction module 340 is configured to pass the audio data to be played in the predetermined period of time through a convolutional neural network model serving as a feature extractor to obtain an audio waveform image feature matrix. That is, the audio data to be played is regarded as a two-dimensional waveform diagram, and a convolutional neural network with excellent performance in the field of image feature extraction is further used for extracting high-dimensional image hidden features in the audio data to be played in the preset time period to obtain an audio waveform image feature matrix. In a specific example, the convolutional neural network as the feature extractor includes a plurality of neural network layers cascaded with each other, wherein each neural network layer includes a convolutional layer, a pooling layer, and an activation layer. In the encoding process of the convolutional neural network serving as the feature extractor, each layer of the convolutional neural network serving as the feature extractor performs convolution processing based on convolution kernel on input data by using the convolutional layer in the forward transfer process of the layer, performs pooling processing along the channel dimension on the convolutional feature map output by the convolutional layer by using the pooling layer, and performs activation processing on the pooled feature map output by the pooling layer by using the activation layer. More specifically, the output of the last layer of the convolutional neural network as the feature extractor is the audio waveform image feature matrix, and the input of the first layer of the convolutional neural network as the feature extractor is the audio data to be played for the predetermined period of time.
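One layer of the described feature extractor (kernel-based convolution, pooling along the channel dimension, nonlinear activation) can be sketched as follows; the 3x3 kernels, input size, and mean pooling are illustrative assumptions:

```python
import numpy as np

def conv_pool_activate(img, kernels):
    """One layer of the CNN feature extractor sketched from the text:
    kernel-based convolution, pooling along the channel dimension,
    and ReLU activation. Kernel values are illustrative."""
    h, w = img.shape[0] - 2, img.shape[1] - 2
    feature_maps = np.empty((len(kernels), h, w))
    for c, k in enumerate(kernels):        # 3x3 'valid' convolutions
        for i in range(h):
            for j in range(w):
                feature_maps[c, i, j] = np.sum(img[i:i + 3, j:j + 3] * k)
    pooled = feature_maps.mean(axis=0)     # pooling along the channel dimension
    return np.maximum(pooled, 0.0)         # nonlinear activation (ReLU)

waveform_img = np.ones((6, 6))             # stand-in audio waveform image
kernels = [np.full((3, 3), 1.0), np.full((3, 3), -1.0)]
out = conv_pool_activate(waveform_img, kernels)
```

Cascading several such layers, with the last layer's output flattened, would yield the audio waveform image feature matrix described above.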
Specifically, during the operation of the prompt broadcast system 300 for an aircraft attendant, the feature fusion module 350 is configured to fuse the aircraft vibration feature matrix and the audio waveform image feature matrix to obtain a fused feature matrix. That is, after the aircraft vibration feature matrix and the audio waveform image feature matrix are obtained, the two are fused in a high-dimensional feature space to obtain a fused feature matrix containing both aircraft vibration features and audio waveform features. Here, when the aircraft vibration feature matrix and the audio waveform image feature matrix are fused, for example by point-wise addition, since the two matrices express vibration frequency domain association features and audio waveform semantic features respectively, improving the correlation between their overall feature distributions and the consistency of their probability density distributions improves the fusion effect, and thereby the feature expression effect of the fused feature matrix and the quality of the corrected audio data to be played that it generates. The applicant therefore first expands the aircraft vibration feature matrix and the audio waveform image feature matrix into an aircraft vibration feature vector, denoted V1, and an audio waveform image feature vector, denoted V2, and calculates the association-probability density distribution affine mapping factors of V1 and V2 (the closed-form expressions are rendered only as formula images in the original publication), wherein V1 represents the aircraft vibration feature vector, V2 represents the audio waveform image feature vector, A is the correlation matrix obtained by position-wise association between V1 and V2, μ and Σ are the mean vector and covariance matrix of the Gaussian density map formed by V1 and V2, ⊗ represents matrix multiplication, exp(·) represents the exponential operation of a matrix, that is, computing the natural exponential function value raised to the power of the feature value at each position in the matrix, α1 represents the first association-probability density distribution affine mapping factor, and α2 represents the second association-probability density distribution affine mapping factor. That is, by constructing the associated feature space of V1 and V2 and the probability density space represented by the Gaussian probability density, V1 and V2 can be mapped into affine homography subspaces within the associated feature space and the probability density space, respectively, so as to extract feature representations conforming to affine homography within the associated feature domain and the probability density domain. By weighting the aircraft vibration feature matrix and the audio waveform image feature matrix with the affine mapping factor values α1 and α2, respectively, the relevance of their feature representations and the consistency of their probability density distributions can be improved, thereby improving the feature expression effect of the fused feature matrix obtained by fusion, and in turn the quality of the corrected audio data to be played that it generates.
Fig. 5 is a block diagram of a feature fusion module in an aircraft attendant hint broadcasting system according to an embodiment of the present invention. As shown in fig. 5, the feature fusion module 350 includes: a feature matrix expansion unit 351 configured to expand the aircraft vibration feature matrix and the audio waveform image feature matrix into an aircraft vibration feature vector and an audio waveform image feature vector; an affine mapping factor calculating unit 352 for calculating a correlation-probability density distribution affine mapping factor between the aircraft vibration feature vector and the audio waveform image feature vector to obtain a first correlation-probability density distribution affine mapping factor and a second correlation-probability density distribution affine mapping factor; a fusion unit 353 for calculating a position weighted sum between the aircraft vibration feature matrix and the audio waveform image feature matrix with the first correlation-probability density distribution affine mapping factor and the second correlation-probability density distribution affine mapping factor as weights to obtain the fusion feature matrix.
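The position-wise weighted fusion performed by the fusion unit 353 can be sketched as follows. The true association-probability density distribution affine mapping factors are given only as formula images in the original publication, so softmax-normalized scalar scores stand in for them here; this is an assumption, not the patented formula:

```python
import numpy as np

def fuse(m1, m2):
    """Position-wise weighted sum of the two feature matrices. Softmax-
    normalized scalar scores stand in for the affine mapping factors,
    whose exact closed form appears only as images in the source."""
    v1, v2 = m1.ravel(), m2.ravel()       # expand matrices into vectors
    s1, s2 = v1 @ v1, v2 @ v2             # stand-in scalar scores
    m = max(s1, s2)
    e1, e2 = np.exp(s1 - m), np.exp(s2 - m)
    w1, w2 = e1 / (e1 + e2), e2 / (e1 + e2)
    return w1 * m1 + w2 * m2              # position-wise weighted sum

vib = np.ones((2, 2))      # stand-in aircraft vibration feature matrix
audio = np.zeros((2, 2))   # stand-in audio waveform image feature matrix
fused = fuse(vib, audio)
```

The fused matrix keeps the shape of its inputs, with each position a convex combination of the two feature matrices.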
Specifically, during operation of the aircraft attendant prompt broadcast system 300, the countermeasure generation module 360 is configured to pass the fused feature matrix through a generator based on a countermeasure generation network to obtain corrected audio data to be played. That is, the corrected audio data to be played corresponding to the fused feature matrix is fitted based on the countermeasure generation idea. As will be appreciated by those of ordinary skill in the art, the countermeasure generation network includes a discriminator and a generator. During training, the generator generates corrected audio data to be played, and the discriminator measures the difference between the generated corrected audio data and the actual corrected audio data to obtain a discriminator loss function value. The neural network parameters of the generator are then updated, with this value as the loss function, through a back propagation algorithm based on gradient descent, so that the corrected audio data generated by the generator approximates the actual corrected audio data; in this way, the quality of the corrected audio data to be played is improved.
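A minimal sketch of the adversarial training loop described above, with a linear generator and a logistic discriminator standing in for the unspecified network architectures; all dimensions, data, and hyperparameters are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (dimensions are illustrative, not from the patent):
fused = rng.standard_normal((8, 16))     # batch of fused feature vectors
real = rng.standard_normal((8, 32))      # "actual" corrected audio targets

G = rng.standard_normal((16, 32)) * 0.1  # generator: linear map sketch
D = rng.standard_normal((32, 1)) * 0.1   # discriminator: logistic sketch

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(50):
    fake = fused @ G
    # Discriminator step: minimize -[log D(real) + log(1 - D(fake))]
    p_real, p_fake = sigmoid(real @ D), sigmoid(fake @ D)
    grad_D = real.T @ (p_real - 1.0) + fake.T @ p_fake
    D -= 0.01 * grad_D / len(real)
    # Generator step (non-saturating): minimize -log D(G(fused))
    p_fake = sigmoid(fused @ G @ D)
    grad_G = fused.T @ ((p_fake - 1.0) @ D.T)
    G -= 0.01 * grad_G / len(fused)
```

The two alternating gradient steps mirror the described training: the discriminator loss drives both updates, pushing the generator's output toward the real corrected audio.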
Specifically, during operation of the aircraft attendant prompt broadcasting system 300, the broadcasting module 370 is configured to play the corrected audio data to be played. That is, in the technical scheme of the invention, before the audio data to be played is played, the audio data to be played is adaptively corrected based on the vibration condition of the aircraft so as to pre-compensate the audio signal before the audio signal is transmitted, and in this way, the transmission offset of the audio signal caused by the fluctuation of the aircraft is avoided so as to ensure that the sound signal can be transmitted to the ears of the passengers in a fidelity manner.
In summary, the aircraft attendant prompt broadcast system 300 according to the embodiment of the present invention has been illustrated. Using a deep-learning-based neural network model, it fuses the vibration frequency domain association features and the audio waveform semantic features during broadcasting, and adaptively corrects the audio data to be played based on the vibration condition of the aircraft, thereby pre-compensating the audio signal before it is propagated. In this way, propagation offset of the audio signal caused by aircraft fluctuation is avoided, ensuring that the audio signal is transmitted to the ears of the passengers with fidelity.
As described above, the flight attendant cue broadcasting system according to the embodiment of the present invention can be implemented in various terminal devices. In one example, the aircraft attendant reminder broadcast system 300 according to embodiments of the present invention may be integrated into the terminal device as a software module and/or hardware module. For example, the aircraft attendant reminder broadcast system 300 may be a software module in the operating system of the terminal device or may be an application developed for the terminal device; of course, the aircraft attendant prompt broadcast system 300 could equally be one of the plurality of hardware modules of the terminal device.
Alternatively, in another example, the aircraft attendant prompt broadcast system 300 and the terminal device may be separate devices, and the aircraft attendant prompt broadcast system 300 may be connected to the terminal device via a wired and/or wireless network and communicate interactive information in an agreed data format.
An exemplary method is: fig. 6 is a flowchart of an aircraft attendant cue-casting method according to an embodiment of the present invention. As shown in fig. 6, the method for broadcasting a prompt for an aircraft attendant according to an embodiment of the present invention includes the steps of: s110, acquiring audio data to be played in a preset time period provided by an aircraft attendant, and an aircraft vibration signal in the preset time period; s120, performing frequency domain feature analysis on the aircraft vibration signals to obtain a plurality of aircraft vibration frequency domain statistical features; s130, inputting the aircraft vibration signals and the plurality of aircraft vibration frequency domain statistical characteristics into a multi-mode joint encoder comprising an image encoder and a sequence encoder to obtain an aircraft vibration characteristic matrix; s140, passing the audio data to be played in the preset time period through a convolutional neural network model serving as a feature extractor to obtain an audio waveform image feature matrix; s150, fusing the aircraft vibration feature matrix and the audio waveform image feature matrix to obtain a fused feature matrix; s160, the fusion feature matrix passes through a generator based on a countermeasure generation network to obtain corrected audio data to be played; and S170, playing the corrected audio data to be played.
In one example, in the above-mentioned prompt broadcasting method for an aircraft attendant, the step S130 includes: inputting the plurality of aircraft vibration frequency domain statistical features into the sequence encoder to obtain an aircraft vibration frequency domain statistical feature vector; inputting the aircraft vibration signal into the image encoder to obtain an aircraft vibration waveform feature vector; and calculating a vector product between the transposed vector of the aircraft vibration waveform feature vector and the aircraft vibration frequency domain statistical feature vector to obtain the aircraft vibration feature matrix. The image encoder is a convolutional neural network model serving as a filter, and the sequence encoder is a multi-scale neighborhood feature extraction module, wherein the multi-scale neighborhood feature extraction module comprises: a first convolution layer, a second convolution layer parallel to the first convolution layer, and a multi-scale feature fusion layer connected with the first convolution layer and the second convolution layer, wherein the first convolution layer uses a one-dimensional convolution kernel with a first length, and the second convolution layer uses a one-dimensional convolution kernel with a second length.
In one example, in the above-mentioned alert broadcasting method for an aircraft attendant, the step S140 includes: each layer of the convolutional neural network model using the feature extractor performs, in forward transfer of the layer, input data: carrying out convolution processing on input data to obtain a convolution characteristic diagram; pooling the convolution feature map along a channel dimension to obtain a pooled feature map; performing nonlinear activation on the pooled feature map to obtain an activated feature map; the output of the last layer of the convolutional neural network serving as the feature extractor is the audio waveform image feature matrix, and the input of the first layer of the convolutional neural network serving as the feature extractor is the audio data to be played in the preset time period.
In one example, in the above-mentioned prompt broadcasting method for an aircraft attendant, the step S150 includes: expanding the aircraft vibration feature matrix and the audio waveform image feature matrix into an aircraft vibration feature vector and an audio waveform image feature vector; calculating association-probability density distribution affine mapping factors between the aircraft vibration feature vector and the audio waveform image feature vector to obtain a first association-probability density distribution affine mapping factor and a second association-probability density distribution affine mapping factor; and calculating a position-wise weighted sum of the aircraft vibration feature matrix and the audio waveform image feature matrix, with the first and second association-probability density distribution affine mapping factors as weights, to obtain the fused feature matrix. The affine mapping factors are calculated with an optimization formula whose closed form is rendered only as formula images in the original publication, wherein V1 represents the aircraft vibration feature vector, V2 represents the audio waveform image feature vector, A is the correlation matrix obtained by position-wise association between V1 and V2, μ and Σ are the mean vector and covariance matrix of the Gaussian density map formed by V1 and V2, ⊗ represents matrix multiplication, exp(·) represents the exponential operation of a matrix, that is, computing the natural exponential function value raised to the power of the feature value at each position in the matrix, α1 represents the first association-probability density distribution affine mapping factor, and α2 represents the second association-probability density distribution affine mapping factor.
In one example, in the above-mentioned prompt broadcasting method for an aircraft attendant, in the step S160, the countermeasure generation network includes a generator and a discriminator.
In summary, the prompt broadcasting method for an aircraft attendant according to the embodiment of the present invention has been explained. Using a deep-learning-based neural network model, it fuses the vibration frequency domain association features and the audio waveform semantic features during broadcasting, and adaptively corrects the audio data to be played based on the vibration condition of the aircraft, thereby pre-compensating the audio signal before it is propagated. In this way, propagation offset of the audio signal caused by aircraft fluctuation is avoided, ensuring that the audio signal is transmitted to the ears of the passengers with fidelity.
Exemplary electronic device: next, an electronic device according to an embodiment of the present invention is described with reference to fig. 7.
Fig. 7 illustrates a block diagram of an electronic device according to an embodiment of the invention.
As shown in fig. 7, the electronic device 10 includes one or more processors 11 and a memory 12.
The processor 11 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.
Memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory (cache). The non-volatile memory may include, for example, Read-Only Memory (ROM), hard disks, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and may be executed by the processor 11 to implement the functions of the aircraft attendant prompt broadcast system of the various embodiments of the present invention described above and/or other desired functions. Various content, such as aircraft vibration frequency domain statistical features, may also be stored in the computer-readable storage medium.
In one example, the electronic device 10 may further include: an input device 13 and an output device 14, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).
The input means 13 may comprise, for example, a keyboard, a mouse, etc.
The output device 14 may output various information to the outside, including audio data to be played after correction, and the like. The output means 14 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.
Of course, only some of the components of the electronic device 10 that are relevant to the present invention are shown in fig. 7 for simplicity, components such as buses, input/output interfaces, etc. are omitted. In addition, the electronic device 10 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer readable storage Medium: in addition to the methods and apparatus described above, embodiments of the invention may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform steps in the functions of the aircraft attendant prompt broadcast method according to various embodiments of the invention described in the "exemplary systems" section of this specification.
The computer program product may write program code for performing operations of embodiments of the present invention in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present invention may also be a computer-readable storage medium, having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform steps in the functions of the aircraft crew alerting broadcasting method according to various embodiments of the present invention described in the above-mentioned "exemplary systems" section of the present specification.
The computer readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present invention have been described above in connection with specific embodiments, however, it should be noted that the advantages, benefits, effects, etc. mentioned in the present invention are merely examples and not intended to be limiting, and these advantages, benefits, effects, etc. are not to be considered as essential to the various embodiments of the present invention. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, as the invention is not necessarily limited to practice with the above described specific details.
The block diagrams of the devices, apparatuses, and systems referred to in the present invention are only illustrative examples and are not intended to require or imply that connections, arrangements, or configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, these devices, apparatuses, and systems may be connected, arranged, or configured in any manner. Words such as "including," "comprising," and "having" are open-ended and mean "including but not limited to," and are used interchangeably therewith. The term "or" as used herein refers to, and is used interchangeably with, the term "and/or," unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."
It is also noted that in the apparatuses, devices, and methods of the present invention, the components or steps may be decomposed and/or recombined. Such decompositions and/or recombinations should be considered equivalent aspects of the present invention.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the invention. Thus, the present invention is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the invention to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims (10)

1. An aircraft attendant cue broadcasting system, comprising: a to-be-played data acquisition module, configured to acquire audio data to be played within a predetermined time period provided by an aircraft attendant, and an aircraft vibration signal within the predetermined time period; a vibration feature extraction module, configured to perform frequency domain feature analysis on the aircraft vibration signal to obtain a plurality of aircraft vibration frequency domain statistical features; a multi-modal encoding module, configured to input the aircraft vibration signal and the plurality of aircraft vibration frequency domain statistical features into a multi-modal joint encoder comprising an image encoder and a sequence encoder to obtain an aircraft vibration feature matrix; an audio waveform feature extraction module, configured to pass the audio data to be played within the predetermined time period through a convolutional neural network model serving as a feature extractor to obtain an audio waveform image feature matrix; a feature fusion module, configured to fuse the aircraft vibration feature matrix and the audio waveform image feature matrix to obtain a fused feature matrix; a countermeasure generation module, configured to pass the fused feature matrix through a generator based on a countermeasure generation network to obtain corrected audio data to be played; and a broadcasting module, configured to broadcast the corrected audio data to be played.
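The claim leaves the "frequency domain statistical features" unspecified; as a minimal sketch, the statistics below (mean spectral magnitude, spectral variance, spectral centroid, dominant frequency) and the sampling rate are assumptions, not the patent's defined feature set:

```python
import numpy as np

def vibration_frequency_stats(signal, fs=1000.0):
    """Illustrative frequency-domain statistics for a vibration signal.
    The specific statistics chosen here are assumptions."""
    spectrum = np.abs(np.fft.rfft(signal))            # magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)  # bin frequencies (Hz)
    power = spectrum ** 2
    centroid = np.sum(freqs * power) / np.sum(power)  # spectral centroid
    return np.array([
        spectrum.mean(),         # mean spectral magnitude
        spectrum.var(),          # spectral variance
        centroid,                # spectral centroid (Hz)
        freqs[np.argmax(power)]  # dominant frequency (Hz)
    ])

# Example: a 50 Hz sinusoid sampled at 1 kHz
t = np.arange(0, 1.0, 1e-3)
feats = vibration_frequency_stats(np.sin(2 * np.pi * 50 * t), fs=1000.0)
```

For the pure sinusoid above, the dominant-frequency statistic recovers the 50 Hz tone, since the FFT bin resolution at this length and rate is 1 Hz.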
2. The aircraft attendant cue broadcasting system of claim 1, wherein the image encoder is a convolutional neural network model serving as a filter, and the sequence encoder is a multi-scale neighborhood feature extraction module, wherein the multi-scale neighborhood feature extraction module comprises: a first convolution layer, a second convolution layer parallel to the first convolution layer, and a multi-scale feature fusion layer connected with the first convolution layer and the second convolution layer, wherein the first convolution layer uses a one-dimensional convolution kernel having a first length and the second convolution layer uses a one-dimensional convolution kernel having a second length.
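The parallel two-branch structure of claim 2 can be sketched as follows. The kernel lengths 3 and 5, the averaging kernels, and concatenation as the fusion operation are all illustrative assumptions; the claim fixes neither the lengths nor any learned weights:

```python
import numpy as np

def multi_scale_features(seq, k1=3, k2=5):
    """Sketch of multi-scale neighborhood feature extraction: two
    parallel one-dimensional convolutions with different kernel
    lengths, fused here by concatenation (an assumption)."""
    kern1 = np.ones(k1) / k1                   # first conv layer, length k1
    kern2 = np.ones(k2) / k2                   # parallel conv layer, length k2
    branch1 = np.convolve(seq, kern1, mode='same')
    branch2 = np.convolve(seq, kern2, mode='same')
    return np.concatenate([branch1, branch2])  # multi-scale fusion layer

feats = multi_scale_features(np.arange(10, dtype=float))
```

Each branch preserves the sequence length (`mode='same'`), so different neighborhood scales stay position-aligned before fusion.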
3. The aircraft attendant cue broadcasting system of claim 2, wherein the multi-modal encoding module comprises: a frequency domain feature extraction unit, configured to input the plurality of aircraft vibration frequency domain statistical features into the sequence encoder to obtain an aircraft vibration frequency domain statistical feature vector; a vibration waveform feature extraction unit, configured to input the aircraft vibration signal into the image encoder to obtain an aircraft vibration waveform feature vector; and a joint optimization unit, configured to calculate a vector product between the transposed vector of the aircraft vibration waveform feature vector and the aircraft vibration frequency domain statistical feature vector to obtain the aircraft vibration feature matrix.
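For two one-dimensional feature vectors, the claimed vector product between a transposed vector and a second vector is an outer product, which yields a matrix. The vector dimensions (4 and 6) below are arbitrary placeholders:

```python
import numpy as np

# Outer product of the vibration waveform feature vector and the
# frequency-domain statistical feature vector gives the joint matrix;
# the dimensions here are illustrative only.
waveform_vec = np.random.default_rng(0).standard_normal(4)
freq_stat_vec = np.random.default_rng(1).standard_normal(6)

vibration_matrix = np.outer(waveform_vec, freq_stat_vec)  # shape (4, 6)
```

Every entry (i, j) of the result pairs waveform feature i with frequency-domain feature j, so the matrix encodes all cross-modal interactions of the two vectors.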
4. The aircraft attendant cue broadcasting system of claim 3, wherein the audio waveform feature extraction module is configured to: perform, at each layer of the convolutional neural network model serving as the feature extractor, in a forward pass of the layer, the following operations on input data: performing convolution processing on the input data to obtain a convolution feature map; pooling the convolution feature map along a channel dimension to obtain a pooled feature map; and performing nonlinear activation on the pooled feature map to obtain an activation feature map; wherein the output of the last layer of the convolutional neural network serving as the feature extractor is the audio waveform image feature matrix, and the input of the first layer of the convolutional neural network serving as the feature extractor is the audio data to be played within the predetermined time period.
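The per-layer ordering of claim 4 (convolution, channel-dimension pooling, nonlinear activation) can be sketched with a single fixed kernel. The 3x3 averaging kernel, mean pooling, and ReLU are assumptions; a real layer would use learned multi-channel kernels:

```python
import numpy as np

def cnn_layer(x, kernel):
    """One forward pass of the claimed layer ordering: convolution,
    pooling along the channel dimension, then nonlinear activation."""
    c, h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros((c, h - kh + 1, w - kw + 1))
    for ch in range(c):                       # convolution, per channel
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[ch, i, j] = np.sum(x[ch, i:i+kh, j:j+kw] * kernel)
    pooled = out.mean(axis=0)                 # pool along channel dimension
    return np.maximum(pooled, 0.0)            # ReLU nonlinear activation

fmap = cnn_layer(np.random.default_rng(0).standard_normal((3, 8, 8)),
                 np.ones((3, 3)) / 9.0)
```

Note that pooling here collapses the channel axis rather than the spatial axes, matching the claim's "pooling along a channel dimension" rather than conventional spatial pooling.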
5. The aircraft attendant cue broadcasting system of claim 4, wherein the feature fusion module comprises: a feature matrix unfolding unit, configured to unfold the aircraft vibration feature matrix and the audio waveform image feature matrix into an aircraft vibration feature vector and an audio waveform image feature vector; an affine mapping factor calculation unit, configured to calculate correlation-probability density distribution affine mapping factors between the aircraft vibration feature vector and the audio waveform image feature vector to obtain a first correlation-probability density distribution affine mapping factor and a second correlation-probability density distribution affine mapping factor; and a fusion unit, configured to calculate a position-wise weighted sum of the aircraft vibration feature matrix and the audio waveform image feature matrix, using the first correlation-probability density distribution affine mapping factor and the second correlation-probability density distribution affine mapping factor as weights, to obtain the fused feature matrix.
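The fusion step of claim 5 reduces to a position-wise weighted sum once the two factors are known. The scalar weights 0.6 and 0.4 below are placeholders; the claimed factors come from the correlation-probability density distribution computation of claim 6:

```python
import numpy as np

# Position-wise weighted sum of the two feature matrices, with
# placeholder scalar weights standing in for the claimed factors.
M_vib = np.ones((4, 6))          # aircraft vibration feature matrix
M_audio = 2.0 * np.ones((4, 6))  # audio waveform image feature matrix
w1, w2 = 0.6, 0.4                # placeholder affine mapping factors

fused = w1 * M_vib + w2 * M_audio  # fused feature matrix
```

With these constants every fused entry equals 0.6 * 1 + 0.4 * 2 = 1.4, illustrating that the fusion is a convex-style blend of the two modalities when the weights sum to one.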
6. The aircraft attendant cue broadcasting system of claim 5, wherein the affine mapping factor calculation unit is configured to: calculate the correlation-probability density distribution affine mapping factors of the aircraft vibration feature vector and the audio waveform image feature vector according to an optimization formula to obtain a first correlation-probability density distribution affine mapping factor and a second correlation-probability density distribution affine mapping factor; wherein the optimization formula is presented as images in the original publication and is not reproduced here; it is defined over the aircraft vibration feature vector and the audio waveform image feature vector; the correlation matrix obtained by position-wise correlation between the aircraft vibration feature vector and the audio waveform image feature vector; the mean vector and covariance matrix of the Gaussian density map formed by the aircraft vibration feature vector and the audio waveform image feature vector; a matrix multiplication; and an exponential operation of a matrix, computed by raising the natural exponential function to the feature value at each position of the matrix; and it yields the first correlation-probability density distribution affine mapping factor and the second correlation-probability density distribution affine mapping factor.
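Since the published formula survives only as image placeholders, the following is a purely hypothetical reconstruction from the symbol descriptions alone: it forms the position-wise correlation matrix as an outer product, scores each vector under a Gaussian fitted to the stacked pair (mean and spread standing in for the claimed mean vector and covariance matrix), and normalizes the two scores into factors. None of this is the patent's actual formula:

```python
import numpy as np

def affine_mapping_factors(v1, v2):
    """Hypothetical sketch of the two correlation-probability density
    distribution affine mapping factors; the real formula is only
    available as images, so every step here is an assumption."""
    A = np.outer(v1, v2)                      # position-wise correlation matrix
    stacked = np.stack([v1, v2])
    mu = stacked.mean()                       # stand-in for the mean vector
    sigma = stacked.std() + 1e-8              # stand-in for the covariance
    s1 = np.exp(-((v1 - mu) ** 2) / (2 * sigma ** 2)).mean()
    s2 = np.exp(-((v2 - mu) ** 2) / (2 * sigma ** 2)).mean()
    w1, w2 = s1 / (s1 + s2), s2 / (s1 + s2)   # normalized factors
    return w1, w2, A

w1, w2, A = affine_mapping_factors(np.array([1.0, 2.0, 3.0]),
                                   np.array([2.0, 2.0, 2.0]))
```

Whatever the true formula, the two factors serve as the weights of the position-wise weighted sum in claim 5, so normalizing them to sum to one is a natural (but assumed) design choice.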
7. The aircraft attendant cue broadcasting system of claim 6, wherein the countermeasure generation network comprises a generator and a discriminator.
8. An aircraft attendant cue broadcasting method, comprising: acquiring audio data to be played within a predetermined time period provided by an aircraft attendant, and an aircraft vibration signal within the predetermined time period; performing frequency domain feature analysis on the aircraft vibration signal to obtain a plurality of aircraft vibration frequency domain statistical features; inputting the aircraft vibration signal and the plurality of aircraft vibration frequency domain statistical features into a multi-modal joint encoder comprising an image encoder and a sequence encoder to obtain an aircraft vibration feature matrix; passing the audio data to be played within the predetermined time period through a convolutional neural network model serving as a feature extractor to obtain an audio waveform image feature matrix; fusing the aircraft vibration feature matrix and the audio waveform image feature matrix to obtain a fused feature matrix; passing the fused feature matrix through a generator based on a countermeasure generation network to obtain corrected audio data to be played; and playing the corrected audio data to be played.
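The method steps of claim 8 can be sketched end to end with stand-in components. Every helper below is a stub: the real system uses a multi-modal joint encoder, a CNN feature extractor, and a countermeasure generation network, none of which are specified in enough detail here to implement faithfully, and the equal fusion weights are placeholders:

```python
import numpy as np

def frequency_domain_features(vib):          # step 2: frequency-domain stats
    spec = np.abs(np.fft.rfft(vib))
    return np.array([spec.mean(), spec.var(), spec.max()])

def joint_encode(vib, freq_feats):           # stand-in for the joint encoder
    return np.outer(vib[:4], freq_feats)

def audio_feature_matrix(audio):             # stand-in for the CNN extractor
    return audio[:12].reshape(4, 3)

def generator(fused):                        # stand-in for the GAN generator
    return fused.flatten()

vib = np.sin(np.linspace(0, 10, 64))                  # aircraft vibration signal
audio = np.random.default_rng(0).standard_normal(64)  # audio data to be played

freq_feats = frequency_domain_features(vib)
vib_matrix = joint_encode(vib, freq_feats)            # aircraft vibration matrix
audio_matrix = audio_feature_matrix(audio)            # audio waveform matrix
fused = 0.5 * vib_matrix + 0.5 * audio_matrix         # feature fusion
corrected_audio = generator(fused)                    # corrected audio to play
```

The sketch only demonstrates how the data flows between the claimed steps; it performs no learning and no correction of the audio.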
9. The method of claim 8, wherein fusing the aircraft vibration feature matrix and the audio waveform image feature matrix to obtain a fused feature matrix comprises: unfolding the aircraft vibration feature matrix and the audio waveform image feature matrix into an aircraft vibration feature vector and an audio waveform image feature vector; calculating correlation-probability density distribution affine mapping factors between the aircraft vibration feature vector and the audio waveform image feature vector to obtain a first correlation-probability density distribution affine mapping factor and a second correlation-probability density distribution affine mapping factor; and calculating a position-wise weighted sum of the aircraft vibration feature matrix and the audio waveform image feature matrix, using the first correlation-probability density distribution affine mapping factor and the second correlation-probability density distribution affine mapping factor as weights, to obtain the fused feature matrix.
10. The aircraft attendant cue broadcasting method of claim 9, wherein the countermeasure generation network comprises a generator and a discriminator.
CN202310308003.0A 2023-03-28 2023-03-28 Prompt broadcasting system for aircraft crews and method thereof Withdrawn CN116030830A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310308003.0A CN116030830A (en) 2023-03-28 2023-03-28 Prompt broadcasting system for aircraft crews and method thereof


Publications (1)

Publication Number Publication Date
CN116030830A true CN116030830A (en) 2023-04-28

Family

ID=86077903

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310308003.0A Withdrawn CN116030830A (en) 2023-03-28 2023-03-28 Prompt broadcasting system for aircraft crews and method thereof

Country Status (1)

Country Link
CN (1) CN116030830A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117116289A (en) * 2023-10-24 2023-11-24 Jilin University Medical intercom management system for ward and method thereof
CN117116289B (en) * 2023-10-24 2023-12-26 Jilin University Medical intercom management system for ward and method thereof

Similar Documents

Publication Publication Date Title
JP6355800B1 (en) Learning device, generating device, learning method, generating method, learning program, and generating program
KR102195627B1 (en) Apparatus and method for generating translation model, apparatus and method for automatic translation
CN108510982B (en) Audio event detection method and device and computer readable storage medium
CN109981787B (en) Method and device for displaying information
CN111309883A (en) Man-machine conversation method based on artificial intelligence, model training method and device
CN110795912B (en) Method, device, equipment and storage medium for encoding text based on neural network
Laraba et al. Dance performance evaluation using hidden Markov models
CN110072140B (en) Video information prompting method, device, equipment and storage medium
CN116030830A (en) Prompt broadcasting system for aircraft crews and method thereof
US20200074180A1 (en) Accurate correction of errors in text data based on learning via a neural network
CN112364144B (en) Interaction method, device, equipment and computer readable medium
CN114330236A (en) Character generation method and device, electronic equipment and storage medium
US20150235643A1 (en) Interactive server and method for controlling the server
CN115967833A (en) Video generation method, device, equipment and storage medium
CN114913590B (en) Data emotion recognition method, device and equipment and readable storage medium
KR20200095947A (en) Electronic device and Method for controlling the electronic device thereof
CN114154520B (en) Training method of machine translation model, machine translation method, device and equipment
CN110570877B (en) Sign language video generation method, electronic device and computer readable storage medium
CN117033600A (en) Generative role engine for cognitive entity synthesis
CN113643706B (en) Speech recognition method, device, electronic equipment and storage medium
CN114187173A (en) Model training method, image processing method and device, electronic device and medium
KR102379730B1 (en) Learning method of conversation agent system and apparatus
CN115428013A (en) Information processing apparatus and program
CN112562733A (en) Media data processing method and device, storage medium and computer equipment
KR20160062588A (en) Method and Apparatus for user adaptive recognition of voice command using network

Legal Events

Date Code Title Description
PB01 Publication
WW01 Invention patent application withdrawn after publication
Application publication date: 20230428