CN110364165A - Flight dynamic information voice inquiry method - Google Patents
- Publication number
- CN110364165A (application CN201910648358.8A)
- Authority
- CN
- China
- Prior art keywords
- model
- dnn
- result
- flight
- hmm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/63—Querying
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/54—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
The flight dynamic information voice query method of the present invention performs speech recognition based on an improved deep neural network-hidden Markov model (IM-DNN-HMM) and converts the recognition result into text to generate query conditions, so that flight dynamic information can be queried efficiently and accurately in complex environments and under input with different accents. The method comprises the following steps: Step 1, the user logs in to the client and records voice information; Step 2, the recorded voice is sent to the server; Step 3, the voice is pre-processed to obtain its fbank features, yielding an initial recognition result A and a substituted recognition result B; Step 4, word-segmentation analysis is performed on results A and B to obtain an optimal result C, which is pushed to the user and returned; features are extracted from the returned result C to generate flight query conditions, and the user's speech model is optimized; Step 5, a query is executed with the flight query conditions to obtain dynamic flight information.
Description
Technical field
The present invention relates to a novel voice query method for flight dynamic information, i.e., a method that performs text-based queries on the speech recognition results of an IM-DNN-HMM neural network model. It belongs to the field of data processing for civil aviation transportation.
Background art
With the rapid development of the domestic civil aviation industry, airports keep growing in scale and flight volume, so the requirements for airport operating efficiency and service quality are also rising. Convenient and efficient querying of flight dynamic information not only helps airports plan and allocate existing resources reasonably, but also improves passengers' expectations and service experience.
Information query methods based on speech recognition have been disclosed, for example in the published application CN201410262964.3, entitled "Intelligent flight schedule query system and method based on speech recognition". That system comprises a mobile-phone client and a server: the client includes a speech recognition and analysis module and a user interaction module, and the server includes a flight schedule data service module. Text is obtained by recognizing the voice input, and the flight query is then performed on that text. However, the document does not adequately explain how text is generated from the speech recognition result, how the accuracy and efficiency of speech recognition are improved, or how text indexing results matching the query conditions are formed.
Commonly used acoustic models for speech recognition include the Gaussian mixture model-hidden Markov model (Gaussian Mixture Model-Hidden Markov Model, GMM-HMM), the deep neural network (Deep Neural Network, DNN), and hybrids of these models. Hybrid models have become the dominant acoustic models for speech recognition, and their parameters can be adjusted in both the feature space and the model space. For noise reduction, for example, a hybrid model may use a DNN at the front end for feature mapping to suppress noise and a deep neural network-hidden Markov model (DNN-HMM) at the back end as the acoustic model, combining the front-end feature-mapping DNN with the parameter tuning of the back-end acoustic-model DNN to improve the precision of the recognition result.
In practice, however, even with such hybrid models the recognition result still cannot directly generate text query conditions, and in most cases passengers still rely mainly on manually typed text queries. For users with impaired mobility, the existing so-called voice query methods are clearly insufficient. Moreover, because airports are noisy and accents vary, current speech recognition is inefficient, often even slower than typing the query manually.

In view of this, the present application is proposed.
Summary of the invention
To solve the above problems of the prior art, the flight dynamic information voice query method of the present invention performs speech recognition based on an improved deep neural network-hidden Markov model (IM-DNN-HMM) and converts the recognition result into text to generate query conditions, thereby achieving efficient and accurate flight dynamic information queries in complex environments and under input with different accents.

The flight dynamic information voice query method performs speech recognition based on the IM-DNN-HMM model and comprises the following steps:

Step 1, the user logs in to the client and records voice information;

Step 2, the recorded voice is sent to the server;

Step 3, the voice is pre-processed to obtain its fbank features, yielding an initial recognition result A and a substituted recognition result B;

Step 4, word-segmentation analysis is performed on results A and B to obtain an optimal result C; after C is pushed to the user and returned, features are extracted from the returned C to generate flight query conditions, and the user's speech model is optimized;

Step 5, a query is executed with the flight query conditions to obtain dynamic flight information.
As described above, the flight dynamic information voice query method of this application can learn a customer's voice and analyze the input speech to obtain text containing query conditions, so that the flight query is performed on those conditions, improving the efficiency and convenience of querying. In addition, through continuous learning of the user's speech, the method performs deep optimization for the user's pronunciation characteristics and thus provides a customized service.

Further, in step 3 above, the existing user speech model is used if one exists; otherwise the universal speech model is used instead.

In step 4 above, result C is pushed to the user, who corrects or confirms it and returns it; the returned result C is labeled with query features and used to build a new user speech model or update the existing one.
The IM-DNN-HMM model is an improvement on the DNN-HMM model, mixing a DBM model with a DBN model. Unlike the DNN-HMM model, in the IM-DNN-HMM model the input layer together with h1 and h2 forms a fully connected undirected-graph DBM model, while h2, h3, and h4 form a fully connected directed-graph DBN model.
In summary, the flight dynamic information voice query method has the following advantages:

1. It provides a complete flight landing-time prediction scheme covering data reception, processing, model training, model evaluation, and model encapsulation, with broad application prospects;

2. It requires few categories of data, so prediction cost is low. Only historical ADS-B data and historical flight landing times are needed, and prediction requires only real-time ADS-B data, which is of great significance for the data-sensitive civil aviation industry;

3. Prediction accuracy is high, greatly assisting airport ground support work. Half an hour in advance, the prediction error is within 4.5 minutes, and accuracy improves further as the flight approaches landing. A half-hour lead time is sufficient for airport ground support work.
Brief description of the drawings

Fig. 1 is a flow diagram of the flight dynamic information voice query method described herein;

Fig. 2 is a flow diagram of flight query condition generation;

Fig. 3 is an architecture diagram of an information query system using this application;

Fig. 4 compares the structures of the DNN-HMM and IM-DNN-HMM models;

Fig. 5 is a frame diagram of speech recognition using the IM-DNN-HMM model.
Specific embodiments

The present invention is further described below with reference to the drawings and embodiments.

Embodiment 1. As shown in Fig. 1 to Fig. 3, an information query system using the flight dynamic information voice query method described herein consists of a client and a server.

The client comprises a voice input module, a voice sending module, and a user interaction module.

The server comprises a speech receiving module, a speech recognition module, a semantic recognition module, a query module, and a data warehouse module.
In the client:

The voice input module calls the microphone of the user's mobile phone to record the user's voice. The voice sending module sends the recorded voice to the speech receiving module of the server. The user interaction module displays the speech recognition result and any missing flight query condition items, obtains the corresponding flight query conditions entered by the user, and receives and displays the flight query results.
In the server:

The speech receiving module receives the data sent by the client's voice sending module.

The speech recognition module recognizes the user's voice and converts it into text (named T0). Based on the IM-DNN-HMM model, it performs deep learning on the user's voice input, continuously improving recognition efficiency. It looks up the user's speech model in the data warehouse and can also correct the recognition result using the fuzzy-sound database of the data warehouse.

The semantic recognition module, based on the pkuseg word-segmentation package, analyzes the text T0 output by the speech recognition module and corrects it to form new text, named T1. T1 is sent to the user interaction module of the client: if it contains no errors, the user returns it as-is; if it is wrong, the user corrects the recognition result through the user interaction module and returns it. The returned text is named T2 and is sent back to the semantic recognition module, which extracts the flight query conditions from T2.

The query module calls the data warehouse module with the flight query conditions extracted by the semantic recognition module to obtain dynamic flight information. The flight query conditions are complete when they include a flight number and a flight date, or a departure city, an arrival city, and a flight date. When the conditions are incomplete, the query module obtains the missing ones as follows: when the flight date is missing, the current date is used as the flight date; when the departure city is missing, the phone's GPS is called to obtain the current city as the departure city; when the flight number or arrival city is missing, or the departure and arrival cities cannot be distinguished, the client's user interaction module is called to obtain the corresponding information from the user.

The data warehouse module includes a flight schedule database, an airport information database, a registered user information database, and a voice corpus database.
The flight schedule database contains real-time dynamic flight data at home and abroad; the airport information database contains relevant information about domestic and foreign airports.

The voice corpus database includes a common fuzzy-sound library and the universal speech model; the universal speech model is an IM-DNN-HMM model trained on standard Mandarin.

The registered user information database includes users' registration information and user speech models; a user speech model is an IM-DNN-HMM model trained on the user's own speech on the basis of the universal model.
The flight dynamic information voice query method performs speech recognition based on an improved deep neural network-hidden Markov model (IM-DNN-HMM model for short).

As shown in Fig. 4, part (a) is the DNN-HMM model and part (b) is the IM-DNN-HMM model.

The IM-DNN-HMM model is an improvement on the DNN-HMM model. The DNN-HMM model is a deep neural network composed of a DBN model, whose hidden layers form a directed graph model built from RBMs. The IM-DNN-HMM model, in contrast, mixes a DBM model with a DBN model.

Specifically, the DBM model is an undirected graph model composed of two RBM layers; the sampled value of each node is computed jointly from the nodes of the two layers it connects to. The training time of the DBM model depends on its number of layers and the number of nodes per layer.

The DBN model is a directed graph model composed of four RBM layers; during pre-training, the upper layer is the output and the lower layer is the input. After all layers are trained, supervised fine-tuning proceeds downward from the top layer.

In Fig. 4, both the DNN-HMM model and the IM-DNN-HMM model have one input layer, four hidden layers, and one output layer. h1, h2, h3, and h4 are the four hidden layers, and W1, W2, W3, W4, and W5 are the inter-layer connection weights. In both models, nodes within a layer are not connected, while nodes in adjacent layers are fully connected.

The difference is that in the DNN-HMM model the input layer forms, together with h1, h2, h3, and h4, a fully connected directed-graph DBN model, whereas in the IM-DNN-HMM model the input layer, h1, and h2 form a fully connected undirected-graph DBM model, and h2, h3, and h4 form a fully connected directed-graph DBN model.

The IM-DNN-HMM model takes a fixed-length vector as input. h1 and h2 are trained first; h2 serves as the output layer of the DBM model and at the same time as the input of h3 and h4, whose output is the feature representation of the current input.

Compared with the DNN-HMM model, the IM-DNN-HMM model uses the DBM model to process the input speech signal. In a DBM, the state of each hidden node is computed jointly from the nodes in the layers above and below it, so compared with the DNN-HMM model it can better reduce the dimensionality of the input speech signal and capture the features of different speech. Meanwhile, the DBN structure in the higher layers avoids the overfitting that easily occurs at the start of training and yields better recognition accuracy.
The IM-DNN-HMM model is trained as follows:

1. Voice pre-processing

Voice pre-processing converts the user's recorded speech into the input variables of the IM-DNN-HMM model.

1) Because the recorded voice is a time-varying non-stationary signal, it is converted into short-time stationary signals by framing and windowing; the processed signal is α1. Parameter settings: the frame length is 15.23 ms, and the frame shift is 55% of the frame length.

2) The voice segments of α1 are extracted with zero-crossing endpoint detection and non-speech segments are removed, yielding α2.

3) Pre-emphasis is applied to α2 to boost its high-frequency components, yielding α3.
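The framing-and-windowing step can be sketched as below, using the stated frame length (15.23 ms) and frame shift (55% of the frame length). The sample rate, the Hamming window choice, and the function name are illustrative assumptions, not specified in this document.

```python
import numpy as np

def frame_signal(signal, sample_rate=16000, frame_ms=15.23, shift_ratio=0.55):
    """Split a 1-D signal into overlapping windowed frames.

    Frame length and shift follow the parameters in the text above;
    sample rate and window are illustrative assumptions.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))    # 244 samples at 16 kHz
    frame_shift = int(round(frame_len * shift_ratio))          # 55% of frame length
    window = np.hamming(frame_len)
    n_frames = 1 + max(0, (len(signal) - frame_len) // frame_shift)
    return np.stack([
        signal[i * frame_shift: i * frame_shift + frame_len] * window
        for i in range(n_frames)
    ])
```

At 16 kHz a one-second signal yields frames of 244 samples shifted by 134 samples.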
4) The Fbank features of the pre-emphasized signal are extracted. Fbank feature extraction applies a discrete Fourier transform to each frame of the signal to obtain its frequency-domain representation; the frequency-domain representation f is then converted to the Mel (cepstral-domain) frequency by the standard Mel-scale formula.

49 equal-band triangular bandpass filters are placed over the Mel spectral range, and the Mel spectrum is fed into these 49 filters. The logarithmic energy output by each of the 49 triangular bandpass filters, together with the energy of each frame of the speech signal, constitutes a 50-dimensional Fbank feature.

Each frame's 50-dimensional Fbank feature is concatenated with those of the N frames before and after it, and the Fbank features of these 2N+1 frames form the input of the IM-DNN-HMM model.
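The 50-dimensional Fbank construction (49 triangular Mel filters plus frame energy) and the 2N+1-frame splicing can be sketched as follows. The Mel-scale formulas are the standard ones; the FFT size, sample rate, and function names are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def fbank_features(frames, sample_rate=16000, n_filters=49, n_fft=512):
    """50-dim Fbank per frame: 49 log Mel-filterbank energies + log frame energy."""
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2                  # power spectrum
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):                                      # triangular filters
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fb[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i, k] = (r - k) / max(r - c, 1)
    log_fbank = np.log(spec @ fb.T + 1e-10)                         # (frames, 49)
    log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)        # (frames,)
    return np.hstack([log_fbank, log_energy[:, None]])              # (frames, 50)

def splice(feats, n=5):
    """Stack each frame with its N left/right neighbours: 2N+1 frames per input."""
    padded = np.vstack([np.repeat(feats[:1], n, 0), feats, np.repeat(feats[-1:], n, 0)])
    return np.hstack([padded[i:i + len(feats)] for i in range(2 * n + 1)])
```

With N = 5, each model input is a 550-dimensional spliced feature vector.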
2. IM-DNN-HMM model training

The voices of broadcasting-station employees of Qingdao Airport (16 people in total) are used as standard Mandarin speech. Each person records 10,000 utterances and annotates the phonemes (marking each pronunciation). For each person, a fixed 5,000 of the recordings are chosen as the training data of the DNN model (16 x 5,000 in total), and each person's remaining 5,000 recordings serve as the test data. The training data is named set A and the test data set B; each set contains the recorded voices and their corresponding phoneme annotations.

First, the DBN in the IM-DNN-HMM model generates the initial weights of the DNN model through unsupervised training. Then, through supervised training, using the connections between the layers, the error between the desired output and the actual output is propagated top-down, layer by layer, to continually adjust the DNN model parameters and obtain the states of the units in each hidden layer. The states of the input units are then deduced in reverse from the states of the hidden-layer units, completing the parameter update and training of a single DBN layer; the computed output states serve as the input data of the next DBN layer, and so on until the training of the DNN model is complete.
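The greedy layer-wise pre-training described above can be sketched with a minimal RBM trained by one-step contrastive divergence (CD-1), each layer's hidden activations feeding the next. This is a generic illustration of the technique, not the patent's implementation; the layer sizes, learning rate, and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """One RBM layer trained with CD-1."""
    def __init__(self, n_vis, n_hid):
        self.W = rng.normal(0, 0.01, (n_vis, n_hid))
        self.bv = np.zeros(n_vis)
        self.bh = np.zeros(n_hid)

    def hidden(self, v):
        return sigmoid(v @ self.W + self.bh)

    def cd1_step(self, v0, lr=0.1):
        h0 = self.hidden(v0)
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h_sample @ self.W.T + self.bv)      # reconstruction
        h1 = self.hidden(v1)
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.bv += lr * (v0 - v1).mean(0)
        self.bh += lr * (h0 - h1).mean(0)

def pretrain_stack(data, layer_sizes, epochs=5):
    """Greedy layer-wise pre-training: each trained RBM's hidden
    activations become the next RBM's input, yielding initial DNN weights."""
    rbms, x = [], data
    for n_hid in layer_sizes:
        rbm = RBM(x.shape[1], n_hid)
        for _ in range(epochs):
            rbm.cd1_step(x)
        rbms.append(rbm)
        x = rbm.hidden(x)
    return rbms
```

The resulting weight matrices would then be fine-tuned with supervised backpropagation, as the text describes.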
Each phoneme of set B is mapped to a state of the HMM structure, and the evolution of the phoneme sequence over time constitutes the HMM state-transition process. With set B as the desired output, the output of the DNN model is obtained by the DNN training method; for each voice, the posterior probability of each phonetic feature is computed and mapped to an HMM state.

Suppose that for some speech signal the forward probability of being in state sj at time t is αt(sj) and the backward probability is βt(sj). The state occupation probability γt(sj) and the state transition probabilities at each moment are computed, and the posterior probability of each phonetic feature is obtained and mapped to an HMM state. The HMM state corresponding to the softmax output of the DNN is found, and the phoneme under that HMM state is the output result.
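The final step, mapping DNN softmax outputs to phonemes via HMM states, can be illustrated with a frame-wise argmax decoder. A real recognizer would run Viterbi search over the HMM transition probabilities; this sketch only shows the state-to-phoneme mapping, and the state table and names are assumptions.

```python
import numpy as np

def decode_phonemes(softmax_out, state_to_phoneme):
    """Pick the HMM state with the highest DNN posterior for each frame,
    map it to its phoneme, and collapse consecutive repeats."""
    states = softmax_out.argmax(axis=1)
    phones, prev = [], None
    for s in states:
        p = state_to_phoneme[s]
        if p != prev:
            phones.append(p)
        prev = p
    return phones
```

For example, three HMM states belonging to two phonemes decode a four-frame posterior matrix into a two-phoneme sequence.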
Fig. 5 illustrates the framework of speech recognition using the IM-DNN-HMM model.

As shown in Fig. 1 and Fig. 2, the flow of the flight dynamic information voice query method described herein, with speech recognition based on the IM-DNN-HMM model, is as follows.

Suppose the current time is 2019-07-05 14:26:26 and the user, at Qingdao Liuting Airport, speaks the query "Is today's Shandong Airlines SC4605 from Qingdao to Shanghai delayed?".
Step 1, the user opens the mobile client and logs in (each user has a unique ID), calls up the voice input module through the user interaction module, and records the voice;

Step 2, the voice sending module sends the recorded voice to the speech receiving module of the server;

Step 3, the speech recognition module pre-processes the voice and obtains its fbank features. It looks for the user's speech model in the data warehouse module; if one exists, it is retrieved, otherwise the universal speech model is used. The module then recognizes the fbank features with the retrieved speech model.

Suppose the user's Mandarin is non-standard and the recognition result (named result A) is "Is today's Shandong Airlines S west ten 605 from Qingdao to Shanghai delayed?". The speech recognition module substitutes over result A using the fuzzy-sound library to obtain result B: result A is traversed, and any word of result A contained in the fuzzy-sound library is replaced. The words "west" (xi) and "ten" (shi) in result A are replaced by "C" and "4", converting result A into result B, i.e. "Is today's Shandong Airlines SC4605 from Qingdao to Shanghai delayed?".

For ease of description, the set of results A and B is called T0.
Some English letters have their pronunciations converted as follows: when a sound listed under "user pronunciation (annotation)" is recognized, it is automatically converted to the corresponding "converted sound (IPA annotation)". (The English-letter conversion table appears as a figure in the original and is not reproduced here.)
Similarly, some Chinese words and digits have their fuzzy sounds converted as follows (Chinese pinyin standard):

| Number | User pronunciation | Converted sound |
| --- | --- | --- |
| 4 | shi | si |
| 10 | shi, si | si |
| 3 | san, sa | san |
| 2 | er, ler | er |
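The substitution that turns result A into result B can be sketched as a simple lookup over the segmented tokens. The table below is a tiny hypothetical excerpt; the real fuzzy-sound library in the data warehouse is much larger.

```python
# Hypothetical fuzzy-sound table: maps words commonly misrecognized from
# accented Mandarin to the intended flight-code characters.
FUZZY_TABLE = {"west": "C", "ten": "4", "shi": "si"}

def substitute(tokens, table=FUZZY_TABLE):
    """Traverse recognition result A token by token; any token found in
    the fuzzy-sound library is replaced, yielding result B."""
    return [table.get(t, t) for t in tokens]
```

Applied to the example, "west" and "ten" become "C" and "4" while unknown tokens pass through unchanged.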
Step 4, the semantic recognition module segments results A and B and seeks their optimal result C; C is then pushed to the user interaction module, and the user corrects it if necessary and returns it to the semantic recognition module. The semantic module extracts the time, place names, organization names, and special nouns from the returned result to form the flight query conditions, which are pushed to the query module.

Specifically, following Chinese grammatical information, the module identifies the 36 Chinese parts of speech (including nouns, time words, place words, locality words, numerals, measure words, etc.), splits the sentence into words, tags each word's part of speech, and identifies information such as places and times.

Result A is segmented as ['today', 'from', 'Qingdao', 'to', 'Shanghai', 'of', 'Shandong Airlines', 'S west ten 605', 'whether', 'delayed'], with part-of-speech tags ['time word', 'preposition', 'place name', 'preposition', 'place name', 'auxiliary word', 'organization name', 'unknown', 'verb', 'verb'].

Result B is segmented as ['today', 'from', 'Qingdao', 'to', 'Shanghai', 'of', 'Shandong Airlines', 'SC4605', 'whether', 'delayed'], with part-of-speech tags ['time word', 'preposition', 'place name', 'preposition', 'place name', 'auxiliary word', 'organization name', 'special noun', 'verb', 'verb'].

The resulting result C is ['today', 'from', 'Qingdao', 'to', 'Shanghai', 'of', 'Shandong Airlines', 'SC4605', 'whether', 'delayed'].
Result C is obtained as follows. Results A and B are treated as arrays of length 10, numbered 0 to 9 from left to right; words with the same number in A and B are corresponding words. Result C is likewise an array of length 10, numbered 0 to 9, and the word at each position is determined from the words at the corresponding positions of A and B by specific rules.

If the corresponding words in A and B are identical, for example the word 'today' at position 0 of result A and the word 'today' at position 0 of result B, then that word is placed at that position of result C (here, 'today').

If the corresponding words in A and B differ, the one with a definite part of speech is assigned to result C. For example, the word 'S west ten 605' at position 7 of result A is tagged 'unknown', while 'SC4605' at the corresponding position of result B is a 'special noun', so the word at position 7 of result C is 'SC4605'.
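These merging rules can be sketched as below. The tag strings and the tie-breaking choice (keeping result A when both tags are definite) are assumptions, since the source only specifies the two rules illustrated above.

```python
def merge_results(words_a, tags_a, words_b, tags_b):
    """Position-by-position merge of results A and B into result C."""
    merged = []
    for wa, ta, wb, tb in zip(words_a, tags_a, words_b, tags_b):
        if wa == wb:
            merged.append(wa)      # identical corresponding words pass through
        elif ta == "unknown" and tb != "unknown":
            merged.append(wb)      # B has the definite part of speech
        elif tb == "unknown" and ta != "unknown":
            merged.append(wa)      # A has the definite part of speech
        else:
            merged.append(wa)      # tie-break: keep A (an assumption)
    return merged
```

On the document's example, position 7 of C becomes 'SC4605' because result A's token there is tagged 'unknown'.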
As a result C is named as T1;
T1 is pushed to user interactive module by semantics recognition module, and user is allowed to modify or confirm result;
Result (being named as T2) after user's modification or confirmation, returns to semantics recognition module;
Because T1 identification is correct, T2 be still [' today ', ' from ', ' Qingdao ', ' to ', ' Shanghai ', ' ', ' Shandong boat
Empty ', ' SC4605', ' whether ', ' delay '];
The semantic recognition module pushes T2 to the speech recognition module; the speech recognition module takes T2 as the phoneme annotation of this input, rebuilds the user speech model, and stores the model in the data warehouse module;
While pushing to the speech recognition module, the semantic recognition module extracts the time, place names, organization names and special nouns from T2, i.e. "today", "Qingdao", "Shanghai", "Shandong Airlines" and "SC4605", and pushes them to the query module.
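The extraction step can be sketched as a filter over the tagged words. This is a hypothetical illustration (the tag names and function are invented for clarity, not taken from the patent):

```python
# Hypothetical sketch: keep only the word classes the query module
# consumes (time, place name, organization name, special noun).
QUERY_POS = {"time", "place name", "organization name", "special noun"}

def extract_query_terms(words, pos_tags):
    """Return the words of result T2 whose POS tag is query-relevant."""
    return [w for w, p in zip(words, pos_tags) if p in QUERY_POS]
```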
Step 5: the query module calls the data warehouse module according to the flight query conditions extracted by the semantic analysis module to obtain dynamic flight information;
The flight query conditions are complete when they include a flight number and a flight date, or a departure city, an arrival city and a flight date.
When the flight query conditions are incomplete, the missing conditions are obtained through the query module as follows:
When the flight date is missing, the current date is used as the flight date;
When the departure city is missing, the mobile phone's GPS is called to obtain the city where the user is currently located as the departure city;
When the flight number or arrival city is missing, or the departure city cannot be distinguished from the arrival city, the client user interaction module is called to obtain the corresponding information entered by the user.
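The completeness rule and the fallback lookups can be sketched as follows. All names here are hypothetical; `gps_city` and `ask_user` are stand-ins for the phone GPS call and the client user interaction module, neither of which is specified as code in the patent.

```python
from datetime import date

def is_complete(cond):
    """Complete = (flight number + date) or (departure + arrival city + date)."""
    return "flight_date" in cond and (
        "flight_no" in cond
        or ("departure_city" in cond and "arrival_city" in cond))

def fill_conditions(cond, gps_city, ask_user):
    """Fill in missing query conditions following the rules above."""
    if "flight_date" not in cond:
        cond["flight_date"] = date.today().isoformat()   # default: today
    if is_complete(cond):
        return cond
    if "departure_city" not in cond:
        cond["departure_city"] = gps_city()              # city from phone GPS
    if "arrival_city" not in cond:
        cond["arrival_city"] = ask_user("arrival city")  # prompt the user
    return cond
```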
It should be understood that those of ordinary skill in the art can make modifications or variations in light of the above description, and all such modifications and variations shall fall within the protection scope of the appended claims of the present invention.
Claims (5)
1. A flight dynamic information voice query method, characterized in that speech recognition is performed based on an IM-DNN-HMM model, comprising the following process steps:
Step 1: a user logs in to the client and records voice information;
Step 2: the recorded voice is sent to the server;
Step 3: the voice is pre-processed to obtain the fbank features of the voice information, and an initial recognition result A and an alternative recognition result B are obtained;
Step 4: word segmentation analysis is performed on results A and B to obtain an optimal result C; features are extracted from the result C returned after being pushed to the user, the user speech model is optimized, and flight query conditions are generated;
Step 5: a query is performed using the flight query conditions to obtain dynamic flight information.
2. The flight dynamic information voice query method according to claim 1, characterized in that, in step 3, if no existing user speech model exists, a universal speech model is used in its place.
3. The flight dynamic information voice query method according to claim 2, characterized in that, in step 4, result C is pushed to the user, modified or confirmed by the user, and returned; the returned result C is labeled with query features, and a new user speech model is constructed or the existing user speech model is updated.
4. The flight dynamic information voice query method according to claim 1, 2 or 3, characterized in that the IM-DNN-HMM model is an improvement of the DNN-HMM model, mixing a DBM model and a DBN model;
Unlike the DNN-HMM model, in the IM-DNN-HMM model the layers from the input layer through h1 and h2 form a fully connected undirected DBM model, while the layers h2, h3 and h4 form a fully connected directed DBN model.
5. The flight dynamic information voice query method according to claim 4, characterized by comprising the following IM-DNN-HMM model training process:
1) Voice pre-processing
That is, the process of turning the user's recorded speech into the input variables of the IM-DNN-HMM model, with the fbank features of 2N+1 frames serving as the input of the IM-DNN-HMM model;
2) IM-DNN-HMM model training
A standard Mandarin speech corpus is created and annotated with phonemes; multiple utterances are selected as the training data and the test data of the DNN model, the training data being set A and the test data being set B;
The DBN in the IM-DNN-HMM model first generates the initial weights of the DNN model through unsupervised training;
Then, through training, the state of each hidden-layer unit is obtained;
The states of the input units are then deduced in reverse from the states of the hidden-layer units, completing the parameter update and training of a single DBN layer; the computed output states serve as the input data of the next DBN layer, and so on until the training of the DNN model is complete;
Each phoneme of set B is mapped to a state of the HMM structure; with set B as the desired output, the posterior probability of each phonetic feature is computed and mapped to an HMM state;
The HMM state corresponding to the output of the DNN's softmax function is found, and the phoneme under that HMM state is the output result.
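The final decoding step of claim 5 can be sketched as follows: the HMM state with the highest softmax posterior is selected, and the phoneme mapped to that state is emitted. This is a hypothetical illustration; the state-to-phoneme table and function names are invented, not taken from the patent.

```python
import math

def softmax(logits):
    """Convert raw DNN outputs into posterior probabilities."""
    m = max(logits)                       # shift for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_phoneme(logits, state_to_phoneme):
    """Pick the most probable HMM state and return its phoneme."""
    posteriors = softmax(logits)
    best_state = max(range(len(posteriors)), key=posteriors.__getitem__)
    return state_to_phoneme[best_state]
```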
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910648358.8A CN110364165A (en) | 2019-07-18 | 2019-07-18 | Flight dynamic information voice inquiry method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110364165A true CN110364165A (en) | 2019-10-22 |
Family
ID=68220264
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910648358.8A Pending CN110364165A (en) | 2019-07-18 | 2019-07-18 | Flight dynamic information voice inquiry method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110364165A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1514387A (en) * | 2002-12-31 | 2004-07-21 | 中国科学院计算技术研究所 | Sound distinguishing method in speech sound inquiry |
CN101334999A (en) * | 2008-07-10 | 2008-12-31 | 上海言海网络信息技术有限公司 | Chinese speech recognizing system and method thereof |
CN102074231A (en) * | 2010-12-30 | 2011-05-25 | 万音达有限公司 | Voice recognition method and system |
CN108777142A (en) * | 2018-06-05 | 2018-11-09 | 上海木木机器人技术有限公司 | A kind of interactive voice recognition methods and interactive voice robot based on airport environment |
Non-Patent Citations (1)
Title |
---|
Li Yunhong et al.: "An improved DNN-HMM speech recognition method", Applied Acoustics (《应用声学》) * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112466298A (en) * | 2020-11-24 | 2021-03-09 | 网易(杭州)网络有限公司 | Voice detection method and device, electronic equipment and storage medium |
CN112466298B (en) * | 2020-11-24 | 2023-08-11 | 杭州网易智企科技有限公司 | Voice detection method, device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10410627B2 (en) | Automatic language model update | |
CN111739508B (en) | End-to-end speech synthesis method and system based on DNN-HMM bimodal alignment network | |
CN101447185B (en) | Audio frequency rapid classification method based on content | |
CN109766355A (en) | A kind of data query method and system for supporting natural language | |
CN108304372A (en) | Entity extraction method and apparatus, computer equipment and storage medium | |
CN109637537B (en) | Method for automatically acquiring annotated data to optimize user-defined awakening model | |
CN108287858A (en) | The semantic extracting method and device of natural language | |
CN107665706A (en) | Rapid Speech exchange method and system | |
CN110335609A (en) | A kind of air-ground communicating data analysis method and system based on speech recognition | |
CN112581964B (en) | Multi-domain oriented intelligent voice interaction method | |
CN109243460A (en) | A method of automatically generating news or interrogation record based on the local dialect | |
CN112133290A (en) | Speech recognition method based on transfer learning and aiming at civil aviation air-land communication field | |
CN111489746A (en) | Power grid dispatching voice recognition language model construction method based on BERT | |
Šmídl et al. | Semi-supervised training of DNN-based acoustic model for ATC speech recognition | |
Kocour et al. | Boosting of Contextual Information in ASR for Air-Traffic Call-Sign Recognition. | |
CN110364165A (en) | Flight dynamic information voice inquiry method | |
CN109325243A (en) | Mongolian word cutting method and its word cutting system of the character level based on series model | |
Zhang et al. | Research on spectrum sensing system based on composite neural network | |
CN110809796B (en) | Speech recognition system and method with decoupled wake phrases | |
CN107368473B (en) | Method for realizing voice interaction | |
Jiang et al. | A Robust Conformer-Based Speech Recognition Model for Mandarin Air Traffic Control. | |
Zhang et al. | Speech Quality Evaluation of Air Traffic Control Based on Phoneme Level | |
CN114512131A (en) | Airborne air traffic control instruction intelligent voice recognition method and system | |
Sertsi et al. | Hybrid input-type recurrent neural network language modeling for end-to-end speech recognition | |
CN118013390B (en) | Intelligent workbench control method and system based on big data analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20191022 |