CN110364165A - Flight dynamic information voice inquiry method - Google Patents
- Publication number
- CN110364165A (application CN201910648358.8A)
- Authority
- CN
- China
- Prior art keywords
- model
- dnn
- result
- flight
- hmm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/63—Querying
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/54—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
The flight dynamic information voice query method of the present invention performs speech recognition based on an improved deep neural network-hidden Markov model (IM-DNN-HMM) and converts the recognition result into text to generate query conditions, so that flight dynamic information can be queried efficiently and accurately in complex environments and under input with different accents. The method comprises the following steps: Step 1, the user logs in to the client and records voice information; Step 2, the recorded voice is sent to the server; Step 3, the voice is pre-processed to obtain its fbank features, yielding an initial recognition result A and a substituted recognition result B; Step 4, word-segmentation analysis is performed on results A and B to obtain an optimal result C, which is pushed to the user and returned; features are extracted from the returned result C to generate flight query conditions, and the user's speech model is optimized; Step 5, a query is executed with the flight query conditions to obtain dynamic flight information.
Description
Technical field
The present invention relates to a novel voice query method for flight dynamic information, i.e., a method that performs text-based queries on the speech recognition results of an IM-DNN-HMM neural network model. It belongs to the field of data processing for civil aviation transportation.
Background art
With the rapid development of the domestic civil aviation industry, airports keep growing in scale and flight volume, so the requirements for airport operating efficiency and service quality are also rising. Convenient and efficient querying of flight dynamic information not only helps airports plan and allocate existing resources reasonably, but also improves passengers' expectations and service experience.
Information query methods based on speech recognition have been disclosed, for example in the published application CN201410262964.3, entitled "Intelligent flight schedule query system and method based on speech recognition". That system comprises a mobile-phone client and a server: the client includes a speech recognition and analysis module and a user interaction module, and the server includes a flight schedule data service module. Text is obtained by recognizing the voice input, and the flight query is then performed on that text. However, the document does not adequately explain how text is generated from the speech recognition result, how the accuracy and efficiency of speech recognition are improved, or how text indexing results matching the query conditions are formed.
Commonly used acoustic models for speech recognition include the Gaussian mixture model-hidden Markov model (Gaussian Mixture Model-Hidden Markov Model, GMM-HMM), the deep neural network (Deep Neural Network, DNN), and hybrids of these models. Hybrid models have become the dominant acoustic models for speech recognition, and their parameters can be adjusted in both the feature space and the model space. For noise reduction, for example, a hybrid model may use a DNN at the front end for feature mapping to suppress noise and a deep neural network-hidden Markov model (DNN-HMM) at the back end as the acoustic model, combining the front-end feature-mapping DNN with the parameter tuning of the back-end acoustic-model DNN to improve the precision of the recognition result.
In practice, however, even with such hybrid models the recognition result still cannot directly generate text query conditions, and in most cases passengers still rely mainly on manually typed text queries. For users with impaired mobility, the existing so-called voice query methods are clearly insufficient. Moreover, because airports are noisy and accents vary, current speech recognition is inefficient, often even slower than typing the query manually.

In view of this, the present application is proposed.
Summary of the invention
To solve the above problems of the prior art, the flight dynamic information voice query method of the present invention performs speech recognition based on an improved deep neural network-hidden Markov model (IM-DNN-HMM) and converts the recognition result into text to generate query conditions, thereby achieving efficient and accurate flight dynamic information queries in complex environments and under input with different accents.

The flight dynamic information voice query method performs speech recognition based on the IM-DNN-HMM model and comprises the following steps:

Step 1, the user logs in to the client and records voice information;

Step 2, the recorded voice is sent to the server;

Step 3, the voice is pre-processed to obtain its fbank features, yielding an initial recognition result A and a substituted recognition result B;

Step 4, word-segmentation analysis is performed on results A and B to obtain an optimal result C; after C is pushed to the user and returned, features are extracted from the returned C to generate flight query conditions, and the user's speech model is optimized;

Step 5, a query is executed with the flight query conditions to obtain dynamic flight information.
As described above, the flight dynamic information voice query method of this application can learn a customer's voice and analyze the input speech to obtain text containing query conditions, so that the flight query is performed on those conditions, improving the efficiency and convenience of querying. In addition, through continuous learning of the user's speech, the method performs deep optimization for the user's pronunciation characteristics and thus provides a customized service.

Further, in step 3 above, the existing user speech model is used if one exists; otherwise the universal speech model is used instead.

In step 4 above, result C is pushed to the user, who corrects or confirms it and returns it; the returned result C is labeled with query features and used to build a new user speech model or update the existing one.
The IM-DNN-HMM model is an improvement on the DNN-HMM model, mixing a DBM model with a DBN model. Unlike the DNN-HMM model, in the IM-DNN-HMM model the input layer together with h1 and h2 forms a fully connected undirected-graph DBM model, while h2, h3, and h4 form a fully connected directed-graph DBN model.
In summary, the flight dynamic information voice query method has the following advantages:

1. It provides a complete flight landing-time prediction scheme covering data reception, processing, model training, model evaluation, and model encapsulation, with broad application prospects;

2. It requires few categories of data, so prediction cost is low. Only historical ADS-B data and historical flight landing times are needed, and prediction requires only real-time ADS-B data, which is of great significance for the data-sensitive civil aviation industry;

3. Prediction accuracy is high, greatly assisting airport ground support work. Half an hour in advance, the prediction error is within 4.5 minutes, and accuracy improves further as the flight approaches landing. A half-hour lead time is sufficient for airport ground support work.
Brief description of the drawings

Fig. 1 is a flow diagram of the flight dynamic information voice query method described herein;

Fig. 2 is a flow diagram of flight query condition generation;

Fig. 3 is an architecture diagram of an information query system using this application;

Fig. 4 compares the structures of the DNN-HMM and IM-DNN-HMM models;

Fig. 5 is a frame diagram of speech recognition using the IM-DNN-HMM model.
Specific embodiments

The present invention is further described below with reference to the drawings and embodiments.

Embodiment 1. As shown in Fig. 1 to Fig. 3, an information query system using the flight dynamic information voice query method described herein consists of a client and a server.

The client comprises a voice input module, a voice sending module, and a user interaction module.

The server comprises a speech receiving module, a speech recognition module, a semantic recognition module, a query module, and a data warehouse module.
In the client:

The voice input module calls the microphone of the user's mobile phone to record the user's voice. The voice sending module sends the recorded voice to the speech receiving module of the server. The user interaction module displays the speech recognition result and any missing flight query condition items, obtains the corresponding flight query conditions entered by the user, and receives and displays the flight query results.
In the server:

The speech receiving module receives the data sent by the client's voice sending module.

The speech recognition module recognizes the user's voice and converts it into text (named T0). Based on the IM-DNN-HMM model, it performs deep learning on the user's voice input, continuously improving recognition efficiency. It looks up the user's speech model in the data warehouse and can also correct the recognition result using the fuzzy-sound database of the data warehouse.

The semantic recognition module, based on the pkuseg word-segmentation package, analyzes the text T0 output by the speech recognition module and corrects it to form new text, named T1. T1 is sent to the user interaction module of the client: if it contains no errors, the user returns it as-is; if it is wrong, the user corrects the recognition result through the user interaction module and returns it. The returned text is named T2 and is sent back to the semantic recognition module, which extracts the flight query conditions from T2.

The query module calls the data warehouse module with the flight query conditions extracted by the semantic recognition module to obtain dynamic flight information. The flight query conditions are complete when they include a flight number and a flight date, or a departure city, an arrival city, and a flight date. When the conditions are incomplete, the query module obtains the missing ones as follows: when the flight date is missing, the current date is used as the flight date; when the departure city is missing, the phone's GPS is called to obtain the current city as the departure city; when the flight number or arrival city is missing, or the departure and arrival cities cannot be distinguished, the client's user interaction module is called to obtain the corresponding information from the user.

The data warehouse module includes a flight schedule database, an airport information database, a registered user information database, and a voice corpus database.
The flight schedule database contains real-time dynamic flight data at home and abroad; the airport information database contains relevant information about domestic and foreign airports.

The voice corpus database includes a common fuzzy-sound library and the universal speech model; the universal speech model is an IM-DNN-HMM model trained on standard Mandarin.

The registered user information database includes users' registration information and user speech models; a user speech model is an IM-DNN-HMM model trained on the user's own speech on the basis of the universal model.
The flight dynamic information voice query method performs speech recognition based on an improved deep neural network-hidden Markov model (IM-DNN-HMM model for short).

As shown in Fig. 4, part (a) is the DNN-HMM model and part (b) is the IM-DNN-HMM model.

The IM-DNN-HMM model is an improvement on the DNN-HMM model. The DNN-HMM model is a deep neural network composed of a DBN model, whose hidden layers form a directed graph model built from RBMs. The IM-DNN-HMM model, in contrast, mixes a DBM model with a DBN model.

Specifically, the DBM model is an undirected graph model composed of two RBM layers; the sampled value of each node is computed jointly from the nodes of the two layers it connects to. The training time of the DBM model depends on its number of layers and the number of nodes per layer.

The DBN model is a directed graph model composed of four RBM layers; during pre-training, the upper layer is the output and the lower layer is the input. After all layers are trained, supervised fine-tuning proceeds downward from the top layer.

In Fig. 4, both the DNN-HMM model and the IM-DNN-HMM model have one input layer, four hidden layers, and one output layer. h1, h2, h3, and h4 are the four hidden layers, and W1, W2, W3, W4, and W5 are the inter-layer connection weights. In both models, nodes within a layer are not connected, while nodes in adjacent layers are fully connected.

The difference is that in the DNN-HMM model the input layer forms, together with h1, h2, h3, and h4, a fully connected directed-graph DBN model, whereas in the IM-DNN-HMM model the input layer, h1, and h2 form a fully connected undirected-graph DBM model, and h2, h3, and h4 form a fully connected directed-graph DBN model.

The IM-DNN-HMM model takes a fixed-length vector as input. h1 and h2 are trained first; h2 serves as the output layer of the DBM model and at the same time as the input of h3 and h4, whose output is the feature representation of the current input.

Compared with the DNN-HMM model, the IM-DNN-HMM model uses the DBM model to process the input speech signal. In a DBM, the state of each hidden node is computed jointly from the nodes in the layers above and below it, so compared with the DNN-HMM model it can better reduce the dimensionality of the input speech signal and capture the features of different speech. Meanwhile, the DBN structure in the higher layers avoids the overfitting that easily occurs at the start of training and yields better recognition accuracy.
The IM-DNN-HMM model is trained as follows:

1. Voice pre-processing

Voice pre-processing converts the user's recorded speech into the input variables of the IM-DNN-HMM model.

1) Because the recorded voice is a time-varying non-stationary signal, it is converted into short-time stationary signals by framing and windowing; the processed signal is α1. Parameter settings: the frame length is 15.23 ms, and the frame shift is 55% of the frame length.

2) The voice segments of α1 are extracted with zero-crossing endpoint detection and non-speech segments are removed, yielding α2.

3) Pre-emphasis is applied to α2 to boost its high-frequency components, yielding α3.
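The framing-and-windowing step can be sketched as below, using the stated frame length (15.23 ms) and frame shift (55% of the frame length). The sample rate, the Hamming window choice, and the function name are illustrative assumptions, not specified in this document.

```python
import numpy as np

def frame_signal(signal, sample_rate=16000, frame_ms=15.23, shift_ratio=0.55):
    """Split a 1-D signal into overlapping windowed frames.

    Frame length and shift follow the parameters in the text above;
    sample rate and window are illustrative assumptions.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))    # 244 samples at 16 kHz
    frame_shift = int(round(frame_len * shift_ratio))          # 55% of frame length
    window = np.hamming(frame_len)
    n_frames = 1 + max(0, (len(signal) - frame_len) // frame_shift)
    return np.stack([
        signal[i * frame_shift: i * frame_shift + frame_len] * window
        for i in range(n_frames)
    ])
```

At 16 kHz a one-second signal yields frames of 244 samples shifted by 134 samples.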
4) The Fbank features of the pre-emphasized signal are extracted. Fbank feature extraction applies a discrete Fourier transform to each frame of the signal to obtain its frequency-domain representation; the frequency-domain representation f is then converted to the Mel (cepstral-domain) frequency by the standard Mel-scale formula.

49 equal-band triangular bandpass filters are placed over the Mel spectral range, and the Mel spectrum is fed into these 49 filters. The logarithmic energy output by each of the 49 triangular bandpass filters, together with the energy of each frame of the speech signal, constitutes a 50-dimensional Fbank feature.

Each frame's 50-dimensional Fbank feature is concatenated with those of the N frames before and after it, and the Fbank features of these 2N+1 frames form the input of the IM-DNN-HMM model.
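The 50-dimensional Fbank construction (49 triangular Mel filters plus frame energy) and the 2N+1-frame splicing can be sketched as follows. The Mel-scale formulas are the standard ones; the FFT size, sample rate, and function names are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def fbank_features(frames, sample_rate=16000, n_filters=49, n_fft=512):
    """50-dim Fbank per frame: 49 log Mel-filterbank energies + log frame energy."""
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2                  # power spectrum
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):                                      # triangular filters
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fb[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i, k] = (r - k) / max(r - c, 1)
    log_fbank = np.log(spec @ fb.T + 1e-10)                         # (frames, 49)
    log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)        # (frames,)
    return np.hstack([log_fbank, log_energy[:, None]])              # (frames, 50)

def splice(feats, n=5):
    """Stack each frame with its N left/right neighbours: 2N+1 frames per input."""
    padded = np.vstack([np.repeat(feats[:1], n, 0), feats, np.repeat(feats[-1:], n, 0)])
    return np.hstack([padded[i:i + len(feats)] for i in range(2 * n + 1)])
```

With N = 5, each model input is a 550-dimensional spliced feature vector.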
2. IM-DNN-HMM model training

The voices of broadcasting-station employees of Qingdao Airport (16 people in total) are used as standard Mandarin speech. Each person records 10,000 utterances and annotates the phonemes (marking each pronunciation). For each person, a fixed 5,000 of the recordings are chosen as the training data of the DNN model (16 x 5,000 in total), and each person's remaining 5,000 recordings serve as the test data. The training data is named set A and the test data set B; each set contains the recorded voices and their corresponding phoneme annotations.

First, the DBN in the IM-DNN-HMM model generates the initial weights of the DNN model through unsupervised training. Then, through supervised training, using the connections between the layers, the error between the desired output and the actual output is propagated top-down, layer by layer, to continually adjust the DNN model parameters and obtain the states of the units in each hidden layer. The states of the input units are then deduced in reverse from the states of the hidden-layer units, completing the parameter update and training of a single DBN layer; the computed output states serve as the input data of the next DBN layer, and so on until the training of the DNN model is complete.
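The greedy layer-wise pre-training described above can be sketched with a minimal RBM trained by one-step contrastive divergence (CD-1), each layer's hidden activations feeding the next. This is a generic illustration of the technique, not the patent's implementation; the layer sizes, learning rate, and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """One RBM layer trained with CD-1."""
    def __init__(self, n_vis, n_hid):
        self.W = rng.normal(0, 0.01, (n_vis, n_hid))
        self.bv = np.zeros(n_vis)
        self.bh = np.zeros(n_hid)

    def hidden(self, v):
        return sigmoid(v @ self.W + self.bh)

    def cd1_step(self, v0, lr=0.1):
        h0 = self.hidden(v0)
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h_sample @ self.W.T + self.bv)      # reconstruction
        h1 = self.hidden(v1)
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.bv += lr * (v0 - v1).mean(0)
        self.bh += lr * (h0 - h1).mean(0)

def pretrain_stack(data, layer_sizes, epochs=5):
    """Greedy layer-wise pre-training: each trained RBM's hidden
    activations become the next RBM's input, yielding initial DNN weights."""
    rbms, x = [], data
    for n_hid in layer_sizes:
        rbm = RBM(x.shape[1], n_hid)
        for _ in range(epochs):
            rbm.cd1_step(x)
        rbms.append(rbm)
        x = rbm.hidden(x)
    return rbms
```

The resulting weight matrices would then be fine-tuned with supervised backpropagation, as the text describes.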
Each phoneme of set B is mapped to a state of the HMM structure, and the evolution of the phoneme sequence over time constitutes the HMM state-transition process. With set B as the desired output, the output of the DNN model is obtained by the DNN training method; for each voice, the posterior probability of each phonetic feature is computed and mapped to an HMM state.

Suppose that for some speech signal the forward probability of being in state sj at time t is αt(sj) and the backward probability is βt(sj). The state occupation probability γt(sj) and the state transition probabilities at each moment are computed, and the posterior probability of each phonetic feature is obtained and mapped to an HMM state. The HMM state corresponding to the softmax output of the DNN is found, and the phoneme under that HMM state is the output result.
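The final step, mapping DNN softmax outputs to phonemes via HMM states, can be illustrated with a frame-wise argmax decoder. A real recognizer would run Viterbi search over the HMM transition probabilities; this sketch only shows the state-to-phoneme mapping, and the state table and names are assumptions.

```python
import numpy as np

def decode_phonemes(softmax_out, state_to_phoneme):
    """Pick the HMM state with the highest DNN posterior for each frame,
    map it to its phoneme, and collapse consecutive repeats."""
    states = softmax_out.argmax(axis=1)
    phones, prev = [], None
    for s in states:
        p = state_to_phoneme[s]
        if p != prev:
            phones.append(p)
        prev = p
    return phones
```

For example, three HMM states belonging to two phonemes decode a four-frame posterior matrix into a two-phoneme sequence.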
Fig. 5 illustrates the framework of speech recognition using the IM-DNN-HMM model.

As shown in Fig. 1 and Fig. 2, the flow of the flight dynamic information voice query method described herein, with speech recognition based on the IM-DNN-HMM model, is as follows.

Suppose the current time is 2019-07-05 14:26:26 and the user, at Qingdao Liuting Airport, speaks the query "Is today's Shandong Airlines SC4605 from Qingdao to Shanghai delayed?".
Step 1, the user opens the mobile client and logs in (each user has a unique ID), calls up the voice input module through the user interaction module, and records the voice;

Step 2, the voice sending module sends the recorded voice to the speech receiving module of the server;

Step 3, the speech recognition module pre-processes the voice and obtains its fbank features. It looks for the user's speech model in the data warehouse module; if one exists, it is retrieved, otherwise the universal speech model is used. The module then recognizes the fbank features with the retrieved speech model.

Suppose the user's Mandarin is non-standard and the recognition result (named result A) is "Is today's Shandong Airlines S west ten 605 from Qingdao to Shanghai delayed?". The speech recognition module substitutes over result A using the fuzzy-sound library to obtain result B: result A is traversed, and any word of result A contained in the fuzzy-sound library is replaced. The words "west" (xi) and "ten" (shi) in result A are replaced by "C" and "4", converting result A into result B, i.e. "Is today's Shandong Airlines SC4605 from Qingdao to Shanghai delayed?".

For ease of description, the set of results A and B is called T0.
Some English letters have their pronunciations converted as follows: when a sound listed under "user pronunciation (annotation)" is recognized, it is automatically converted to the corresponding "converted sound (IPA annotation)". (The English-letter conversion table appears as a figure in the original and is not reproduced here.)
Similarly, some Chinese words and digits have their fuzzy sounds converted as follows (Chinese pinyin standard):

| Number | User pronunciation | Converted sound |
| --- | --- | --- |
| 4 | shi | si |
| 10 | shi, si | si |
| 3 | san, sa | san |
| 2 | er, ler | er |
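The substitution that turns result A into result B can be sketched as a simple lookup over the segmented tokens. The table below is a tiny hypothetical excerpt; the real fuzzy-sound library in the data warehouse is much larger.

```python
# Hypothetical fuzzy-sound table: maps words commonly misrecognized from
# accented Mandarin to the intended flight-code characters.
FUZZY_TABLE = {"west": "C", "ten": "4", "shi": "si"}

def substitute(tokens, table=FUZZY_TABLE):
    """Traverse recognition result A token by token; any token found in
    the fuzzy-sound library is replaced, yielding result B."""
    return [table.get(t, t) for t in tokens]
```

Applied to the example, "west" and "ten" become "C" and "4" while unknown tokens pass through unchanged.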
Step 4, the semantic recognition module segments results A and B and seeks their optimal result C; C is then pushed to the user interaction module, and the user corrects it if necessary and returns it to the semantic recognition module. The semantic module extracts the time, place names, organization names, and special nouns from the returned result to form the flight query conditions, which are pushed to the query module.

Specifically, following Chinese grammatical information, the module identifies the 36 Chinese parts of speech (including nouns, time words, place words, locality words, numerals, measure words, etc.), splits the sentence into words, tags each word's part of speech, and identifies information such as places and times.

Result A is segmented as ['today', 'from', 'Qingdao', 'to', 'Shanghai', 'of', 'Shandong Airlines', 'S west ten 605', 'whether', 'delayed'], with part-of-speech tags ['time word', 'preposition', 'place name', 'preposition', 'place name', 'auxiliary word', 'organization name', 'unknown', 'verb', 'verb'].

Result B is segmented as ['today', 'from', 'Qingdao', 'to', 'Shanghai', 'of', 'Shandong Airlines', 'SC4605', 'whether', 'delayed'], with part-of-speech tags ['time word', 'preposition', 'place name', 'preposition', 'place name', 'auxiliary word', 'organization name', 'special noun', 'verb', 'verb'].

The resulting result C is ['today', 'from', 'Qingdao', 'to', 'Shanghai', 'of', 'Shandong Airlines', 'SC4605', 'whether', 'delayed'].
Result C is obtained as follows. Results A and B are treated as arrays of length 10, numbered 0 to 9 from left to right; words with the same number in A and B are corresponding words. Result C is likewise an array of length 10, numbered 0 to 9, and the word at each position is determined from the words at the corresponding positions of A and B by specific rules.

If the corresponding words in A and B are identical, for example the word 'today' at position 0 of result A and the word 'today' at position 0 of result B, then that word is placed at that position of result C (here, 'today').

If the corresponding words in A and B differ, the one with a definite part of speech is assigned to result C. For example, the word 'S west ten 605' at position 7 of result A is tagged 'unknown', while 'SC4605' at the corresponding position of result B is a 'special noun', so the word at position 7 of result C is 'SC4605'.
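These merging rules can be sketched as below. The tag strings and the tie-breaking choice (keeping result A when both tags are definite) are assumptions, since the source only specifies the two rules illustrated above.

```python
def merge_results(words_a, tags_a, words_b, tags_b):
    """Position-by-position merge of results A and B into result C."""
    merged = []
    for wa, ta, wb, tb in zip(words_a, tags_a, words_b, tags_b):
        if wa == wb:
            merged.append(wa)      # identical corresponding words pass through
        elif ta == "unknown" and tb != "unknown":
            merged.append(wb)      # B has the definite part of speech
        elif tb == "unknown" and ta != "unknown":
            merged.append(wa)      # A has the definite part of speech
        else:
            merged.append(wa)      # tie-break: keep A (an assumption)
    return merged
```

On the document's example, position 7 of C becomes 'SC4605' because result A's token there is tagged 'unknown'.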
As a result C is named as T1;
T1 is pushed to user interactive module by semantics recognition module, and user is allowed to modify or confirm result;
Result (being named as T2) after user's modification or confirmation, returns to semantics recognition module;
Because T1 identification is correct, T2 be still [' today ', ' from ', ' Qingdao ', ' to ', ' Shanghai ', ' ', ' Shandong boat
Empty ', ' SC4605', ' whether ', ' delay '];
The semantic recognition module pushes T2 to the speech recognition module; the speech recognition module takes T2 as the phoneme annotation of this input, rebuilds the user speech model, and stores the model in the data warehouse module;
While pushing to the speech recognition module, the semantic recognition module extracts the time, place names, organization names and special nouns from T2, i.e. "today", "Qingdao", "Shanghai", "Shandong Airlines" and "SC4605", and pushes them to the query module.
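The extraction step can be sketched as a filter over the tagged words. This is a hypothetical illustration (the tag names and function are invented for clarity, not taken from the patent):

```python
# Hypothetical sketch: keep only the word classes the query module
# consumes (time, place name, organization name, special noun).
QUERY_POS = {"time", "place name", "organization name", "special noun"}

def extract_query_terms(words, pos_tags):
    """Return the words of result T2 whose POS tag is query-relevant."""
    return [w for w, p in zip(words, pos_tags) if p in QUERY_POS]
```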
Step 5: the query module calls the data warehouse module according to the flight query conditions extracted by the semantic analysis module to obtain dynamic flight information;
The flight query conditions are complete when they include a flight number and a flight date, or a departure city, an arrival city and a flight date.
When the flight query conditions are incomplete, the missing conditions are obtained through the query module as follows:
When the flight date is missing, the current date is used as the flight date;
When the departure city is missing, the mobile phone's GPS is called to obtain the city where the user is currently located as the departure city;
When the flight number or arrival city is missing, or the departure city cannot be distinguished from the arrival city, the client user interaction module is called to obtain the corresponding information entered by the user.
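The completeness rule and the fallback lookups can be sketched as follows. All names here are hypothetical; `gps_city` and `ask_user` are stand-ins for the phone GPS call and the client user interaction module, neither of which is specified as code in the patent.

```python
from datetime import date

def is_complete(cond):
    """Complete = (flight number + date) or (departure + arrival city + date)."""
    return "flight_date" in cond and (
        "flight_no" in cond
        or ("departure_city" in cond and "arrival_city" in cond))

def fill_conditions(cond, gps_city, ask_user):
    """Fill in missing query conditions following the rules above."""
    if "flight_date" not in cond:
        cond["flight_date"] = date.today().isoformat()   # default: today
    if is_complete(cond):
        return cond
    if "departure_city" not in cond:
        cond["departure_city"] = gps_city()              # city from phone GPS
    if "arrival_city" not in cond:
        cond["arrival_city"] = ask_user("arrival city")  # prompt the user
    return cond
```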
It should be understood that those of ordinary skill in the art can make modifications or variations in light of the above description, and all such modifications and variations shall fall within the protection scope of the appended claims of the present invention.
Claims (5)
1. A flight dynamic information voice query method, characterized in that speech recognition is performed based on an IM-DNN-HMM model, comprising the following process steps:
Step 1: a user logs in to the client and records voice information;
Step 2: the recorded voice is sent to the server;
Step 3: the voice is pre-processed to obtain the fbank features of the voice information, and an initial recognition result A and an alternative recognition result B are obtained;
Step 4: word segmentation analysis is performed on results A and B to obtain an optimal result C; features are extracted from the result C returned after being pushed to the user, the user speech model is optimized, and flight query conditions are generated;
Step 5: a query is performed using the flight query conditions to obtain dynamic flight information.
2. The flight dynamic information voice query method according to claim 1, characterized in that, in step 3, if no existing user speech model exists, a universal speech model is used in its place.
3. The flight dynamic information voice query method according to claim 2, characterized in that, in step 4, result C is pushed to the user, modified or confirmed by the user, and returned; the returned result C is labeled with query features, and a new user speech model is constructed or the existing user speech model is updated.
4. The flight dynamic information voice query method according to claim 1, 2 or 3, characterized in that the IM-DNN-HMM model is an improvement of the DNN-HMM model, mixing a DBM model and a DBN model;
Unlike the DNN-HMM model, in the IM-DNN-HMM model the layers from the input layer through h1 and h2 form a fully connected undirected DBM model, while the layers h2, h3 and h4 form a fully connected directed DBN model.
5. The flight dynamic information voice query method according to claim 4, characterized by comprising the following IM-DNN-HMM model training process:
1) Voice pre-processing
That is, the process of turning the user's recorded speech into the input variables of the IM-DNN-HMM model, with the fbank features of 2N+1 frames serving as the input of the IM-DNN-HMM model;
2) IM-DNN-HMM model training
A standard Mandarin speech corpus is created and annotated with phonemes; multiple utterances are selected as the training data and the test data of the DNN model, the training data being set A and the test data being set B;
The DBN in the IM-DNN-HMM model first generates the initial weights of the DNN model through unsupervised training;
Then, through training, the state of each hidden-layer unit is obtained;
The states of the input units are then deduced in reverse from the states of the hidden-layer units, completing the parameter update and training of a single DBN layer; the computed output states serve as the input data of the next DBN layer, and so on until the training of the DNN model is complete;
Each phoneme of set B is mapped to a state of the HMM structure; with set B as the desired output, the posterior probability of each phonetic feature is computed and mapped to an HMM state;
The HMM state corresponding to the output of the DNN's softmax function is found, and the phoneme under that HMM state is the output result.
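The final decoding step of claim 5 can be sketched as follows: the HMM state with the highest softmax posterior is selected, and the phoneme mapped to that state is emitted. This is a hypothetical illustration; the state-to-phoneme table and function names are invented, not taken from the patent.

```python
import math

def softmax(logits):
    """Convert raw DNN outputs into posterior probabilities."""
    m = max(logits)                       # shift for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_phoneme(logits, state_to_phoneme):
    """Pick the most probable HMM state and return its phoneme."""
    posteriors = softmax(logits)
    best_state = max(range(len(posteriors)), key=posteriors.__getitem__)
    return state_to_phoneme[best_state]
```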
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910648358.8A CN110364165A (en) | 2019-07-18 | 2019-07-18 | Flight dynamic information voice inquiry method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110364165A true CN110364165A (en) | 2019-10-22 |
Family
ID=68220264
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910648358.8A Pending CN110364165A (en) | 2019-07-18 | 2019-07-18 | Flight dynamic information voice inquiry method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110364165A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1514387A (en) * | 2002-12-31 | 2004-07-21 | 中国科学院计算技术研究所 | Sound distinguishing method in speech sound inquiry |
CN101334999A (en) * | 2008-07-10 | 2008-12-31 | 上海言海网络信息技术有限公司 | Chinese speech recognizing system and method thereof |
CN102074231A (en) * | 2010-12-30 | 2011-05-25 | 万音达有限公司 | Voice recognition method and system |
CN108777142A (en) * | 2018-06-05 | 2018-11-09 | 上海木木机器人技术有限公司 | A kind of interactive voice recognition methods and interactive voice robot based on airport environment |
Non-Patent Citations (1)
Title |
---|
Li Yunhong et al.: "An improved DNN-HMM speech recognition method", Applied Acoustics (《应用声学》) * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112466298A (en) * | 2020-11-24 | 2021-03-09 | 网易(杭州)网络有限公司 | Voice detection method and device, electronic equipment and storage medium |
CN112466298B (en) * | 2020-11-24 | 2023-08-11 | 杭州网易智企科技有限公司 | Voice detection method, device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10410627B2 (en) | Automatic language model update | |
CN111739508B (en) | End-to-end speech synthesis method and system based on DNN-HMM bimodal alignment network | |
CN101447185B (en) | Audio frequency rapid classification method based on content | |
CN109766355A (en) | A kind of data query method and system for supporting natural language | |
CN108304372A (en) | Entity extraction method and apparatus, computer equipment and storage medium | |
CN109637537B (en) | Method for automatically acquiring annotated data to optimize user-defined awakening model | |
CN108287858A (en) | The semantic extracting method and device of natural language | |
CN107665706A (en) | Rapid Speech exchange method and system | |
CN110335609A (en) | A kind of air-ground communicating data analysis method and system based on speech recognition | |
CN112581964B (en) | Multi-domain oriented intelligent voice interaction method | |
CN109243460A (en) | A method of automatically generating news or interrogation record based on the local dialect | |
CN112133290A (en) | Speech recognition method based on transfer learning and aiming at civil aviation air-land communication field | |
CN111489746A (en) | Power grid dispatching voice recognition language model construction method based on BERT | |
Šmídl et al. | Semi-supervised training of DNN-based acoustic model for ATC speech recognition | |
Kocour et al. | Boosting of Contextual Information in ASR for Air-Traffic Call-Sign Recognition. | |
CN110364165A (en) | Flight dynamic information voice inquiry method | |
CN109325243A (en) | Mongolian word cutting method and its word cutting system of the character level based on series model | |
Zhang et al. | Research on spectrum sensing system based on composite neural network | |
CN110809796B (en) | Speech recognition system and method with decoupled wake phrases | |
CN107368473B (en) | Method for realizing voice interaction | |
Jiang et al. | A Robust Conformer-Based Speech Recognition Model for Mandarin Air Traffic Control. | |
Zhang et al. | Speech Quality Evaluation of Air Traffic Control Based on Phoneme Level | |
CN114512131A (en) | Airborne air traffic control instruction intelligent voice recognition method and system | |
Sertsi et al. | Hybrid input-type recurrent neural network language modeling for end-to-end speech recognition | |
CN118013390B (en) | Intelligent workbench control method and system based on big data analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20191022 |