CN110069771A - Control order information processing method based on semantic chunking - Google Patents

Info

Publication number
CN110069771A
Authority
CN
China
Prior art keywords
word
attribute
control order
sequence
english
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910180560.2A
Other languages
Chinese (zh)
Other versions
CN110069771B (en
Inventor
王煊
徐秋程
丁辉
王冠
严勇杰
陈平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 28 Research Institute filed Critical CETC 28 Research Institute
Priority to CN201910180560.2A priority Critical patent/CN110069771B/en
Publication of CN110069771A publication Critical patent/CN110069771A/en
Application granted granted Critical
Publication of CN110069771B publication Critical patent/CN110069771B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Abstract

The invention discloses a control order information processing method based on semantic chunking. Its purposes are: (1) to construct computer-readable, structured control orders, providing a basis for the automatic processing of control orders; and (2) to facilitate information extraction and semantic analysis of control orders and improve the precision of the results. By identifying and processing control term phrases, the method supports the following auxiliary functions: effectively extracting the information carried by the basic control terms that occur in a control order; accurately extracting information such as the type and state of the target aircraft; and providing data for information aggregation based on control orders. For control orders that contain special control terms, numbers, and similar elements, the method extracts them using a frame search method and correspondingly designed rules. The invention improves the ability to extract information from control orders and also improves the precision of the semantic analysis results.

Description

Control order information processing method based on semantic chunking
Technical field
The invention belongs to the technical field of air traffic control automation systems, and in particular relates to a control order information processing method based on semantic chunking.
Background art
With the rapid development of China's civil aviation over the past 30 years, the demand for air traffic control has kept growing, and potential safety hazards have become increasingly prominent. According to statistics, human factors account for more than 75% of past flight safety accidents, of which accidents caused by controller error account for 25%. At present, the mainstream approach to resolving conflicts caused by controller error is to strengthen surface surveillance equipment, preventing conflicts with devices such as surface surveillance radar and multilateration system sensors. Meanwhile, more advanced solutions based on artificial intelligence have also been proposed, for example using speech recognition to convert control speech into text and then applying natural language processing to perform semantic analysis.
Control orders contain some special control terms. These terms do not follow natural-language grammar rules, and they can form phrases (chunks) with adjacent words. In general natural language processing, entities or chunks are extracted with named entity recognition. However, named entity recognition only identifies person names, place names, organization names, and the like; it cannot identify and extract phrases (chunks) centered on control terms, nor can it recognize as entities the combinations of English letters and numbers that occur in control orders.
Semantic chunking theory was first proposed by Steve Abney and is a shallow parsing method. An English chunk is defined as follows: a sentence consists of chunks, each chunk consists of syntactically related words, and chunks are non-overlapping, non-nested, and disjoint. Using this method to extract special control term phrases is feasible for the following reasons: (1) air traffic control is a closed domain and the number of special control terms is limited, so a limited number of chunking rules can be designed; (2) control orders follow the air-ground radiotelephony rules, and the use of control terms follows certain patterns that can be used directly to formulate rules; (3) special control term phrases play a relatively independent role in a typical control order, usually expressing only external environment information and seldom relating to other words in the control order, which fits the definition of a sentence chunk.
Summary of the invention
Object of the invention: from the perspective of semantic chunking, the present invention analyzes the composition of the main phrase information that occurs in actual control orders, and designs corresponding chunking rules in combination with the air-ground radiotelephony rules to extract phrases. The invention takes special control terms, numbers, and English letter sequences as entry points and identifies and extracts them according to their compositional characteristics.
Technical solution: the present invention provides a control order information processing method based on semantic chunking, comprising the following steps:
Step 1, perform Chinese word segmentation on the control order to obtain a word sequence;
Step 2, tag each word in the word sequence with its part of speech, which serves as the feature of the target word;
Step 3, process control orders that contain special control terms;
Step 4, process the other components of the control order;
Step 5, analyze the control order according to the processing results of step 3 and step 4, completing the semantic understanding of control orders in the air traffic control system; the result is used to judge whether the information in the control order is consistent with the plan information in the system.
In step 1, a segmentation algorithm (for example, a method based on a dictionary and a hidden Markov model) is used to perform Chinese word segmentation on the control order and obtain a word sequence. During segmentation, a control vocabulary is compiled by adding commonly used basic control terms to the dictionary (collected from control orders, for example: surface wind, dew point, visibility, tower). The control vocabulary is added to the segmentation algorithm to assist word segmentation of the control order.
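The effect of adding control terms to the segmentation dictionary can be illustrated with a minimal sketch. It uses plain forward maximum matching rather than the dictionary-plus-HMM segmenter mentioned above, and the dictionary entries, the Chinese renderings of the terms, and the function name `segment` are assumptions made for illustration only.

```python
# Hypothetical control-term dictionary; entries are illustrative, not the patent's list.
CONTROL_TERMS = {"地面风", "露点", "能见度", "塔台"}
# Hypothetical general-purpose dictionary entries.
GENERAL_WORDS = {"米秒", "上升", "滑行"}


def segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isascii() and ch.isalnum():
            # Keep a run of ASCII letters/digits (and decimal points) as one token.
            j = i
            while j < len(text) and text[j].isascii() and (text[j].isalnum() or text[j] == "."):
                j += 1
            tokens.append(text[i:j])
            i = j
            continue
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


with_terms = segment("地面风3米秒", CONTROL_TERMS | GENERAL_WORDS)
without_terms = segment("地面风3米秒", GENERAL_WORDS)
print(with_terms)     # the control term stays in one piece
print(without_terms)  # the control term is split into single characters
```

In this sketch a run of letters and digits is kept as a single token, which is a simplification; a real segmenter may split letters from digits.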
In step 2, numbers in the control order are tagged m; English letter sequences are tagged nx or eng; special control terms are tagged Sp.
In step 2, the part-of-speech tags for special control terms are set as follows:
Sp0 indicates that the special control term does not form a phrase with the words before or after it;
Sp1 indicates that the special control term forms a phrase with the word before it;
Sp2 indicates that the special control term forms a phrase with the word after it;
Sp3 indicates that the special control term can form a phrase with either the word before it or the word after it.
Step 3 includes:
Step 3-1, when a special control term occurs in the control order, generate a chunk frame with that control term as the head word, according to the search rule corresponding to its special tag (the special tag is assigned manually to the control term in advance and has an associated search rule; the search rules are given in step 3-2). The chunk frame contains two slots:
the first slot is the special control term and the second slot is the search content; alternatively, the first slot is a chunk attribute (for example, time) and the second slot is the content corresponding to that attribute in the control order;
Step 3-2, design corresponding search rules (Table 1) according to the usage of the special control terms:
A special control term tagged Sp0 is identified directly without any search and forms a phrase by itself;
For a special control term tagged Sp1, search to the left until the corresponding boundary is found according to the search rules defined in Table 1;
For a special control term tagged Sp2, search to the right until the corresponding boundary is found according to the search rules defined in Table 1;
Table 1
For a special control term tagged Sp3, first search to the left until the corresponding boundary is found according to the search rules in the rule base; if no content is found, search to the right until the corresponding boundary is found according to the search rules in the rule base;
Step 3-3, fill the frame given in step 3-1 with the special control term and the content obtained by the search. For example, in "surface wind 3 meters per second", "surface wind" is the special control term and "3 meters per second" is the search content; the frame here has the form: the first slot is the special control term and the second slot is the search content.
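The search directions for Sp0 to Sp3 and the two-slot frame can be sketched as follows. Because Table 1 is not reproduced in this text, the boundary rule used here (collect adjacent tokens tagged as numbers, letter sequences, or quantifiers, and stop at the first other tag) is an assumption, as are the tag name `q` for quantifiers, the English glosses used as tokens, and the function names.

```python
# Assumed boundary tags: numerals (m), letter sequences (nx, eng), quantifiers (q).
BOUNDARY_TAGS = {"m", "nx", "eng", "q"}


def collect(tagged, start, step):
    """Walk from `start` in direction `step` while tokens carry boundary tags."""
    found = []
    i = start
    while 0 <= i < len(tagged) and tagged[i][1] in BOUNDARY_TAGS:
        found.append(tagged[i][0])
        i += step
    # Tokens gathered while walking left are restored to reading order.
    return found if step > 0 else found[::-1]


def fill_frame(tagged, idx):
    """Build a two-slot frame for the special control term at position idx."""
    word, tag = tagged[idx]
    if tag == "Sp0":
        content = []                                   # no search
    elif tag == "Sp1":
        content = collect(tagged, idx - 1, -1)         # search left
    elif tag == "Sp2":
        content = collect(tagged, idx + 1, +1)         # search right
    elif tag == "Sp3":
        # Search left first; only if nothing is found, search right.
        content = collect(tagged, idx - 1, -1) or collect(tagged, idx + 1, +1)
    else:
        raise ValueError(f"{word!r} is not tagged as a special control term")
    return {"term": word, "content": " ".join(content)}


# English glosses stand in for the Chinese tokens of a tagged control order.
tagged = [("gust", "Sp2"), ("12", "m"), ("m/s", "q"), ("along", "p"),
          ("taxiway", "Sp3"), ("d5p4a5", "eng"), ("taxi", "v")]
gust_frame = fill_frame(tagged, 0)
taxiway_frame = fill_frame(tagged, 4)
print(gust_frame)
print(taxiway_frame)
```

Keeping the boundary tags in one set means a rule base could vary them per control term without changing the search code.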
Step 4 includes:
Step 4-1, judge whether a number or English letter sequence occurs in the control order; if none occurs, end the process; if one occurs, continue to step 4-2;
Step 4-2, analyze the numbers or English letter sequences that occur in the control order; their structure falls into three cases:
Case 1: the number or English letter combination is preceded or followed by a word or character from which its attribute can be judged;
Case 2: the number or English letter combination has a special internal structure from which its attribute can be judged in combination with the air-ground radiotelephony rules;
Case 3: no external word clearly indicates the attribute of the number or English letter sequence, and it has no internal structure that distinguishes the attribute;
Step 4-3, for case 1 and case 2, design rules to judge the attribute;
Step 4-4, for case 3, judge the attribute using a method based on the hidden Markov model;
Step 4-5, if the attribute has been determined by step 4-3 or step 4-4, fill the slots of the chunk frame; the chunk frame has the form: the first slot is the attribute and the second slot is the number or English letter sequence.
Step 4-3 includes designing the following rules to judge the attribute:
If the number or English letter combination contains a decimal point, the attribute is civil aviation control frequency;
If the number or English letter combination is a combination of letters and digits, the attribute is flight number;
If the number or English letter combination is followed by the unit meter, the attribute is height;
If the number or English letter combination is followed by the unit foot, the attribute is height;
If the number or English letter combination is followed by the unit meters per second, the attribute is speed;
If the number or English letter combination is followed by the unit kilometer, the attribute is distance;
If the number or English letter combination is followed by the unit degree, the attribute is turning;
If the number or English letter combination is followed by the unit point (a clock-time unit), the attribute is time point;
If the number or English letter combination is followed by the unit minute, the attribute is time span.
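The rules above map naturally onto a small lookup plus two pattern checks. The sketch below is one possible reading: the unit keys are English glosses of the Chinese units, the function name `judge_attribute` is hypothetical, and the precedence (a following unit is checked before the internal structure) is an assumption, since the text does not state an order.

```python
import re

# English glosses of the units listed in the rules; the keys are illustrative.
UNIT_ATTRIBUTES = {
    "meter": "height",
    "foot": "height",
    "meter per second": "speed",
    "kilometer": "distance",
    "degree": "turning",
    "point": "time point",
    "minute": "time span",
}


def judge_attribute(token, next_token=None):
    """Return the attribute of a number/letter token, or None when no rule applies."""
    # Case 1: an adjacent unit word indicates the attribute (assumed to take precedence).
    if next_token in UNIT_ATTRIBUTES:
        return UNIT_ATTRIBUTES[next_token]
    # Case 2: internal structure, a decimal point inside the number.
    if re.fullmatch(r"\d+\.\d+", token):
        return "civil aviation control frequency"
    # Case 2: internal structure, letters followed by digits.
    if re.fullmatch(r"[A-Za-z]+\d+", token):
        return "flight number"
    # Case 3: left to the HMM-based method.
    return None


print(judge_attribute("123.6"))
print(judge_attribute("CSN6723"))
print(judge_attribute("8400", "meter"))
print(judge_attribute("87"))
```

A `None` result marks the tokens that fall through to case 3.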
Step 4-4 includes:
The hidden Markov model is formally defined as follows:
$Q$ is the set of all possible states: for words other than the target number or English letter sequence, the state is the part of speech; for the target number or English letter sequence, the state is its attribute. $V$ is the set of all possible observations, that is, the words output by the states. $I$ is the state sequence and $O$ is the observation sequence, where:
$Q = \{q_1, q_2, \ldots, q_N\}$, $V = \{v_1, v_2, \ldots, v_M\}$,
$I = \{i_1, i_2, \ldots, i_K\}$, $O = \{o_1, o_2, \ldots, o_M\}$,
where $N$ is the number of possible states and $q_N$ is the $N$-th possible state; $M$ is the number of possible observations, $v_M$ is the $M$-th possible observation, and $o_M$ is the $M$-th actual observation; $K$ is the actual number of states and $i_K$ is the $K$-th actual state value. $A$ is the state transition probability matrix, $A = [a_{ij}]_{N \times N}$; $B$ is the observation probability matrix, $B = [b_j(k)]_{N \times M}$; $\pi$ is the initial state probability vector, $\pi = (\pi_i)$, where:
$a_{ij} = P(i_{t+1} = q_j \mid i_t = q_i)$, $i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, N$; $a_{ij}$ is the probability that state $q_i$ at the current time is followed by state $q_j$ at the next time;
$b_j(k) = P(o_t = v_k \mid i_t = q_j)$, $k = 1, 2, \ldots, M$; $b_j(k)$ is the probability that state $q_j$ at the current time generates the output $v_k$ at the current time;
$\pi_i = P(i_1 = q_i)$; $\pi_i$ is the probability of each state at the initial time;
For the problem described in case 3, in order to determine the attribute of a number or English letter sequence that occurs alone in a control order, take the n words before and after the target number or English letter sequence to form a sequence; the problem then becomes a sequence labeling problem. The observation sequence is known, and the parts of speech of the words other than the target are also known, so solving for the attribute of the target number or English letter sequence becomes the probability calculation problem of the hidden Markov model (HMM): the state with the highest probability is the attribute of the target. Determining the parameters of the HMM becomes the learning problem;
The probability calculation problem is solved as follows: given the model $\lambda = (\pi, A, B)$ and the observation sequence $O$, the probability $\gamma_t(i)$ that the target number or English letter sequence at position $t$ is in state (attribute) $q_i$ is:
$\gamma_t(i) = P(i_t = q_i \mid O, \lambda)$,
where $0 < t \le T$ and $T$ is the position of the last element of the sequence;
It is obtained from the forward and backward probabilities:
$\gamma_t(i) = \dfrac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)}$,
where $\alpha_t(i)$ is the forward probability of the $i$-th state, obtained by the forward calculation:
$\alpha_t(i) = P(o_1, o_2, \ldots, o_t, i_t = q_i \mid \lambda)$,
and $\beta_t(i)$ is the backward probability of the $i$-th state, obtained by the backward calculation:
$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid i_t = q_i, \lambda)$,
The boundary condition, that is, the backward probability $\beta_T(i)$ of each state at the final position, is: $\beta_T(i) = 1$.
The learning problem of the HMM is solved with a control order corpus and the Baum-Welch algorithm, which estimates the corresponding parameters.
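The probability calculation can be sketched in a few lines of plain Python. This is a generic forward-backward pass that returns the posterior of every state at every position, using the standard recursions; it is not taken from the patent, the function name and the toy parameters are assumptions, and no scaling is applied, so it is only suitable for short sequences such as the few words around a target.

```python
def forward_backward(pi, A, B, obs):
    """Posterior state probabilities gamma[t][i] = P(i_t = q_i | O, lambda).

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[j][k] : probability that state j emits observation index k
    obs     : list of observation indices o_1..o_T
    """
    n_states, length = len(pi), len(obs)

    # Forward pass: alpha[t][i] = P(o_1..o_t, i_t = q_i | lambda).
    alpha = [[0.0] * n_states for _ in range(length)]
    for i in range(n_states):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, length):
        for j in range(n_states):
            total = sum(alpha[t - 1][i] * A[i][j] for i in range(n_states))
            alpha[t][j] = total * B[j][obs[t]]

    # Backward pass: beta[T][i] = 1, then recurse towards the start.
    beta = [[1.0] * n_states for _ in range(length)]
    for t in range(length - 2, -1, -1):
        for i in range(n_states):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n_states))

    # Combine and normalise at each position.
    gamma = []
    for t in range(length):
        norm = sum(alpha[t][j] * beta[t][j] for j in range(n_states))
        gamma.append([alpha[t][i] * beta[t][i] / norm for i in range(n_states)])
    return gamma


# Toy two-state model with made-up parameters, for illustration only.
toy_gamma = forward_backward(
    pi=[0.6, 0.4],
    A=[[0.7, 0.3], [0.4, 0.6]],
    B=[[0.9, 0.1], [0.2, 0.8]],
    obs=[0, 1, 1],
)
print(toy_gamma)
```

The attribute of the target is then the state with the largest posterior at the target's position.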
Beneficial effects: the present invention has the following technical effects:
(1) Phrases centered on special control terms, numbers, and English letters in control orders can be identified automatically.
(2) Information aggregation over control orders can be realized.
(3) The performance of semantic analysis of control orders is improved.
Brief description of the drawings
The present invention is further described below with reference to the drawings and the detailed embodiments, from which the above and other advantages of the invention will become clearer.
Fig. 1 is the flow chart of the control order information extraction method based on semantic chunking.
Fig. 2 is the main flow chart of the control order information extraction method based on semantic chunking.
Fig. 3 is the structure diagram of the hidden Markov model.
Fig. 4 shows the result after part-of-speech analysis.
Detailed description of the embodiments
The present invention is further described below with reference to the drawings and embodiments.
The implementation process and steps of the invention are as follows; the flow chart is shown in Fig. 1.
Step 1: Chinese word segmentation
A segmentation algorithm is used to perform Chinese word segmentation on the control order and obtain a word sequence. Adding basic control terms to the segmentation dictionary during this process improves the precision of the segmentation result.
Step 2: part-of-speech tagging
Each word in the word sequence is tagged with its part of speech, which serves as the feature of the target word. Because the invention focuses on the special control terms, numbers, and English letters in control orders, part-of-speech tagging must make these three types of words directly identifiable. Numbers are tagged "m" and English letter sequences are tagged "nx" or "eng", so they are easy to distinguish. When setting the tags for special control terms, their phrase-forming behavior must be taken into account; the tag "Sp" denotes a special control term. Considering how special control terms form phrases, the tags are set as follows:
"Sp0": does not form a phrase with the words before or after it, for example: [Dongfang Tower].
"Sp1": forms a phrase with the preceding word, for example: No. 27 [aircraft stand].
"Sp2": forms a phrase with the following word, for example: [surface wind] 350 degrees.
"Sp3": can form a phrase with either the preceding or the following word, for example: [runway] 18L or 18L [runway].
Step 3: processing of control orders that contain special control terms
Step A-1: when a special control term occurs in the control order, a frame is evoked with that control term at its center. The frame consists of two parts: special control term + search content.
Step A-2: design corresponding search rules according to the usage of the special control terms. The chunking patterns of some common control terms that occur in control orders are shown in Table 1, which also gives the search rule corresponding to each part-of-speech tag. Further study shows that the components of phrases formed by special control terms tagged "Sp1", "Sp2", or "Sp3" are relatively flexible and some elements can be omitted, but their boundary words are essentially numbers, English letter sequences, and quantifiers. Because these three types of words rarely occur alone in a sentence, they are strongly related to the special control terms in the sentence, so the rule base can be simplified at design time to make matching special control terms against rule base entries more efficient.
Table 1
Step A-3: the frame given in step A-1 contains two parts; according to the attribute of each empty slot, they are filled with the special control term and the remaining content of the phrase respectively.
Step 4: processing of control orders that do not contain special control terms
A large number of numbers or English letter sequences occur in control orders. Some numbers/English letter sequences are closely associated with special control terms and can be classified by looking up the special control term that occurs with them. In other cases, the number or English letter sequence occurs alone in the control order, with no special control term nearby to indicate its attribute, for example: time, frequency, height, speed. This step provides solutions for that case.
Step A-1: judge whether a number or English letter sequence occurs in the control order; if none occurs, end the process; if one occurs, continue to the next step.
Step A-2: analysis of the numbers or English letter sequences that occur in control orders shows that their structure falls into three cases:
1. The number/English letter combination is preceded or followed by a word or character from which its attribute can be judged, for example: climb to 10000 [feet].
2. The number/English letter combination has a special internal structure from which its attribute can be judged in combination with the air-ground radiotelephony rules, for example: 123.6 (according to the civil aviation control frequency table, a frequency contains a decimal point).
3. No external word clearly indicates the attribute of the number/English letter sequence, and it has no internal structure that distinguishes the attribute, for example: expressions of time or descriptions of height.
The task of this step is to propose solutions for the three cases above, that is, to judge the attribute of a number/English letter combination that occurs alone. There are two main methods: a rule-based method and a method based on the hidden Markov model (HMM).
(1) Rule-based method
For cases 1 and 2, rules are designed to judge the attribute, as shown in Table 2.
Table 2
Feature | Attribute
Contains a decimal point | Civil aviation control frequency
Letter + digit combination | Flight number
Followed by unit: meter | Height
Followed by unit: foot | Height
Followed by unit: meters per second | Speed
Followed by unit: kilometer | Distance
Followed by unit: degree | Turning
Followed by unit: point (clock time) | Time point
Followed by unit: minute | Time span
(2) based on the method for HMM
For the 3rd kind of situation, when both external word had not referred to clearly the number occurred in control order/English alphabet sequence Its attribute out, also without when clearly internal features point out its attribute, need to combine in this control order the word that occurs into Rower note is judged, Forward-backward algorithm solves number/English under different attribute at this time by the information of context The probability that letter occurs.
Hidden Markov Model is distributed B by initial probability distribution π, state transition probability distribution A and observation probability and determines, The formal definition of its model is as follows:
Q is all possible state set, the correspondence part of speech of the word other than target number and English alphabet sequence, target Number is corresponding with English alphabet sequence, is its attribute;V is all possible observation set, the i.e. corresponding output word of part of speech Language;I is status switch;M is observation sequence, in which:
Q={ q1,q2,...,qN, V={ v1,v2,...,vM}
I={ i1,i2,...,iK, O={ o1,o2,...,OM}
Wherein, N is possible status number, and M is possible observation number, and K is actual status number and observation number.A is state Transition probability matrix: A=[aij]N×N, B is observation probability matrix: B=[bj(k)]N×M, π is initial state probability vector: π= (πi), in which:
aij=P (it+1=qj|it=qi), i=1,2 ..., N;J=1,2 ..., N
bj(k)=P (ot=vk|it=qj), k=1,2 ..., M;J=1,2 ..., N
πi=P (i1=qi), i=1,2 ..., N,
The problem of being described according to the third situation, in order to determine the number/English alphabet individually occurred in control order The case where sequence, takes n word (0 < n < 3) before and after target number/English alphabet sequence, and formation sequence, then problem is converted into sequence Column mark the problem of, wherein observation sequence it is known that and other than target word word part of speech it is also known that, therefore solve target number/English The attribute question of text mother is converted into the probability calculation problem of HMM, and the state of maximum probability is the attribute of target, and HMM Parameter determination is then converted into problem concerning study.
The method for solving of probability calculation problem is as follows: setting models λ=(π, A, B) and observation sequence O, then target number/ (in position, t) belongs to state (attribute) q to English alphabet sequenceiProbability are as follows:
γt(i)=P (it=qi| O, λ),
It is calculated by preceding to-backward probability:
Wherein, αt(i) for by the preceding forward direction probability obtained to probability calculation:
αt(i)=P (o1,o2,...,ot,it=qi| λ),
βtIt (i) is backward probability (boundary condition are as follows: βT(i)=1):
βt(i)=P (ot+1,ot+2,...,oT|it=qi, λ),
The problem concerning study of HMM is then solved by using control order corpus and Baum-Welch algorithm, can be estimated Corresponding parameter out.
Step A-3: judge whether the attribute of the number/English letter sequence has been determined. If the steps above have determined the attribute, fill the slots in the frame; the frame here is filled in the form: attribute + target number/English letter sequence. Sometimes a number/English letter sequence in a control order has no adjacent words from which its attribute can be judged, so none of the methods above can determine the target attribute; in that case the attribute must be inferred from historical data, and the frame is filled with the inferred result.
Step 5: phrase formation
Because the content of a frame is already structured, this step only needs to extract the frames and then extract the information in them as the task requires.
Based on linguistic semantic chunking and the hidden Markov model, the present invention identifies the phrases formed by special control terms that occur in control orders and the attributes of numbers/English letter sequences or sequence combinations that occur alone, and forms the corresponding frame description structures;
The extraction of special control term phrases based on semantic chunking includes setting part-of-speech tags for the special control terms and designing the corresponding search rules from the usage patterns of the special control terms to extract phrase information;
The attribute recognition of number/English letter sequences based on the hidden Markov model includes using the forward-backward method to identify and extract the attribute of the target number/English letter sequence;
The present invention can be applied to the semantic understanding of control orders in air traffic control systems and can effectively extract the important information in control orders.
Embodiment
For ease of illustration and description, the implementation steps here are divided according to the main flow chart shown in Fig. 2 and are explained with actual control orders. The example control orders are given first:
1. DAL185, Dongfang Tower, gust 12 meters per second, taxi along taxiway d5p4a5.
2. Beijing Area, CSN6723, quay top at 35 points, maintain 8400 meters.
3. CCA1234, climb to 87 immediately.
Step 1: part-of-speech analysis
This step comprises three processes: Chinese word segmentation, part-of-speech tagging, and target word search, where a target word can be a special control term, a number/English letter sequence, and so on. Because the part-of-speech tags of the special control terms have already been set, the result after tagging is as shown in Fig. 4, which shows only the target words of interest, such as special control terms, numbers, and English letters. Based on the part-of-speech analysis result, with the part of speech as the feature of each word, first find the special control terms, that is, the words tagged "Sp": Dongfang Tower, gust, taxiway, Beijing Area; and the numbers/English letter sequences, that is, the words tagged "m" or "eng": DAL, 185, 12, d5p4a5, CSN, 6723, 35, 8400, CCA, 1234, 87. The special control terms are processed first, then the numbers/English letter sequences.
Step 2: special control term process method
This step searches according to the search rule that corresponds to each part-of-speech tag obtained in the previous step. Dongfang Tower and Beijing Area are tagged "Sp0", so no search is needed and they are extracted directly. Gust is tagged "Sp2", so the search rule is to search to the right up to the boundary; the boundary is the quantifier meters per second [r], and the next word to the right is the preposition along [p], so the search stops at meters per second and all the words in between, 12 meters per second, are the search content. Taxiway is tagged "Sp3", so the search rule is to search for a boundary to the left first and then to the right. To its left is the preposition along [p], so there is no content; to its right is the English sequence d5p4a5 [eng], and the next word to the right is the verb taxi [v], so the search stops and the content is d5p4a5.
Step 3: processing of numbers/English letters that occur alone: judgment rules
After step 2, some numbers have already been extracted because they are adjacent to special control terms. The remaining words are: DAL, 185, CSN, 6723, 35, 8400, CCA, 1234, 87. The three groups DAL-185, CSN-6723, and CCA-1234 each form a combination that matches the pattern of a flight number, so they are judged to denote flights. For 35 and 8400, searching to the right according to the corresponding search rules finds the units that follow them, which indicate that their attributes are time and height respectively.
Step 4: processing of numbers/English letters that occur alone: HMM
The number 87 in the third example sentence is special: it is not followed by a unit word that indicates its attribute (the actual attribute is flight level), so it is judged by computing probabilities with the hidden Markov model. Because there are no other words to the right of this word, the two words before it, climb and to, are taken to form the word sequence: climb | to | 87. This sequence is the observation sequence of the HMM, and the corresponding hidden state sequence is: verb [v] | verb [v] | ?, where the symbol "?" indicates that the attribute of 87 is unknown; it may be one of the following attributes: height, speed, time, distance, and so on. The HMM parameters $\lambda = (\pi, A, B)$ are obtained by training on training data, and the observation sequence $O$ and the states of the other words are known, so the conditional probability of each attribute can be calculated from the forward and backward probabilities: $\gamma(i) = P(i = q_i \mid O, \lambda)$, where $q_i$ ranges over the attributes; the attribute with the largest $\gamma(i)$ is the attribute of the target.
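When the states of the neighbouring words are already known, as in "climb | to | 87", the posterior of the target's attribute reduces to a local product of a transition term into the target, the emission term, and a transition term out of the target. The sketch below shows that special case; every probability value, the observation class `NUM2` (a two-digit number), and the function name are hypothetical and chosen only to make the example run, not estimated from any corpus.

```python
def target_posterior(prev_state, next_state, obs, attributes, A, B, pi=None):
    """Posterior over the target's attribute given known neighbouring states.

    Follows from the Markov property:
    P(q | prev, obs, next) is proportional to A[prev][q] * B[q][obs] * A[q][next].
    """
    scores = {}
    for q in attributes:
        score = B[q].get(obs, 0.0)
        if prev_state is not None:
            score *= A[prev_state].get(q, 0.0)
        elif pi is not None:
            score *= pi.get(q, 0.0)      # target is the first word
        if next_state is not None:
            score *= A[q].get(next_state, 0.0)
        scores[q] = score
    total = sum(scores.values())
    return {q: s / total for q, s in scores.items()}


ATTRIBUTES = ["height", "speed", "time", "distance"]
# Hypothetical transition probabilities from a verb state to each attribute state.
A = {"v": {"height": 0.4, "speed": 0.2, "time": 0.1, "distance": 0.3}}
# Hypothetical emission probabilities of a two-digit number under each attribute.
B = {"height": {"NUM2": 0.5}, "speed": {"NUM2": 0.2},
     "time": {"NUM2": 0.2}, "distance": {"NUM2": 0.1}}

# "climb [v] | to [v] | 87 [?]": the target is last, so there is no next state.
posterior = target_posterior("v", None, "NUM2", ATTRIBUTES, A, B)
best = max(posterior, key=posterior.get)
print(best, posterior)
```

Mapping "87" to an observation class instead of the literal token is a design choice of this sketch, since individual numbers would otherwise be too sparse to estimate.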
Step 5: Frame filling
The recognized information is placed in a frame with a unified format. The frame contains only two word slots: special control term + search content, or attribute + number/English letters. The word slots are filled according to the results of step 3 and step 4, giving the following frame filling results:
1. Flight number [attribute] + DAL185
East Control Tower [special control term] + none
Gust [special control term] + 12 meters per second
Taxiway [special control term] + d5p4a5
2. Beijing Area [special control term] + none
Flight number [attribute] + CSN6723
Time [attribute] + 35 minutes
Height [attribute] + 8400 meters
3. Flight number [attribute] + CCA1234
Height [attribute] + 87
The resulting frames can be used to judge whether a control order contradicts the planning information in the system. For example, the method determines that the height to which the aircraft climbs in the control order is 8400 meters, whereas the flight plan allows the aircraft to climb to 9000 meters. Information in the flight plan is stored in structured form, so the relevant information must first be extracted from the unstructured control order. The method yields it in the form height [attribute] + 8400 meters; looking up the height attribute in the plan then gives 9000 meters, which shows that the control order contains an error.
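The comparison against the plan can be sketched as a lookup by attribute name. The `Frame` class, the dictionary layout of the plan and the function `find_conflicts` are hypothetical; the patent does not specify how the structured plan is stored.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Frame:
    slot1: str  # special control term or attribute
    slot2: str  # search content, or number / English letters


def find_conflicts(frames: List[Frame], plan: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (attribute, value in order, value in plan) for every mismatch."""
    conflicts: List[Tuple[str, str, str]] = []
    for frame in frames:
        planned = plan.get(frame.slot1)
        # Attributes missing from the plan are skipped rather than flagged.
        if planned is not None and planned != frame.slot2:
            conflicts.append((frame.slot1, frame.slot2, planned))
    return conflicts


# Frames from the second example sentence and an assumed plan record.
FRAMES = [Frame("flight number", "CSN6723"), Frame("height", "8400 meters")]
PLAN = {"flight number": "CSN6723", "height": "9000 meters"}
```

Here `find_conflicts(FRAMES, PLAN)` reports only the height entry, matching the 8400 versus 9000 meter example in the text.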
The present invention provides a control order information processing method based on semantic chunking. There are many methods and approaches for implementing this technical solution, and the above is only a preferred embodiment of the invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the invention. Any component not specified in this embodiment can be implemented with the prior art.

Claims (8)

1. A control order information processing method based on semantic chunking, characterized by comprising the following steps:
Step 1, performing Chinese word segmentation on the control order to obtain a word sequence;
Step 2, performing part-of-speech tagging on each word in the word sequence, the tag serving as a feature of the target word;
Step 3, processing the parts of the control order that contain special control terms;
Step 4, processing the other components of the control order;
Step 5, analyzing the control order according to the processing results of step 3 and step 4, thereby completing the semantic understanding of control orders in the air traffic control system, the result being used to judge whether the information in the control order is consistent with the planning information in the system.
2. The method according to claim 1, characterized in that, in step 1, a word segmentation algorithm is used to perform Chinese word segmentation on the control order to obtain the word sequence, and, during segmentation, a control glossary is compiled and added to the segmentation algorithm to assist the segmentation of the control order.
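The effect of adding a control glossary to the segmenter can be shown with forward maximum matching, a simple dictionary-based technique used here as a stand-in; the claim does not name the segmentation algorithm. The glossary entries and the sample phrase are illustrative assumptions.

```python
from typing import List, Set


def segment(text: str, glossary: Set[str]) -> List[str]:
    """Forward maximum matching: at each position take the longest glossary word."""
    max_len = max((len(word) for word in glossary), default=1)
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted, so the loop always advances.
            if size == 1 or piece in glossary:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical dictionaries: a general one, and one extended with a control term.
GENERAL = {"滑行"}
WITH_GLOSSARY = GENERAL | {"滑行道"}
```

Without the glossary entry, `segment("沿滑行道滑行", GENERAL)` splits the term for "taxiway" into two pieces; with `WITH_GLOSSARY` it is kept whole, which is the behavior the claim relies on.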
3. The method according to claim 2, characterized in that, in step 2, numbers in the control order are tagged m, English letter sequences are tagged nx or eng, and special control terms are tagged Sp.
4. The method according to claim 3, characterized in that, in step 2, special control terms are tagged as follows:
Sp0 indicates that the special control term does not form a phrase with the words before or after it;
Sp1 indicates that the special control term forms a phrase with the words before it;
Sp2 indicates that the special control term forms a phrase with the words after it;
Sp3 indicates that the special control term can form a phrase with the words either before or after it.
5. The method according to claim 4, characterized in that step 3 comprises:
Step 3-1, when a special control term occurs in the control order, taking the special control term as the head word and generating a chunk frame according to the search rule corresponding to its tag, the chunk frame comprising two word slots:
the first word slot is the special control term and the second word slot is the search content; or the first word slot is the chunk attribute and the second word slot is the content corresponding to that attribute in the control order;
Step 3-2, designing the corresponding search rules according to the usage of the special control terms:
a special control term tagged Sp0 is identified directly, without any search, and forms a phrase by itself;
for a special control term tagged Sp1, a leftward search is performed until the corresponding boundary is found according to the defined search rules;
for a special control term tagged Sp2, a rightward search is performed until the corresponding boundary is found according to the search rules in the rule base;
for a special control term tagged Sp3, a leftward search is performed first until the corresponding boundary is found according to the search rules in the rule base; if this yields no content, a rightward search is performed until the corresponding boundary is found according to the search rules in the rule base;
Step 3-3, filling the chunk frame given in step 3-1 with the special control term and the content obtained by the search, the frame here taking the form: the first word slot is the special control term and the second word slot is the search content.
6. The method according to claim 5, characterized in that step 4 comprises:
Step 4-1, judging whether a number or English letter sequence occurs in the control order; if no number or English letter occurs, ending the process; if a number or English letter is present, continuing with step 4-2;
Step 4-2, analyzing the number or English letter sequence occurring in the control order, whose structure falls into three cases:
in the first case, the number or English letter combination is preceded or followed by a word or character from which its attribute can be judged;
in the second case, the number or English letter combination has a special internal structure and can be judged with the help of the air-ground radiotelephony rules;
in the third case, no word outside the number or English letter sequence clearly indicates its attribute, and the sequence has no special internal structure that distinguishes its attribute;
Step 4-3, for the first and second cases, designing rules to judge the attribute;
Step 4-4, for the third case, judging the attribute with a method based on the Hidden Markov Model;
Step 4-5, if the attribute has been determined by step 4-3 and step 4-4, filling the word slots of the chunk frame, the chunk frame here taking the form: the first word slot is the attribute and the second word slot is the number or English letters.
7. The method according to claim 6, characterized in that step 4-3 comprises designing the following rules to judge the attribute:
if the number or English letter combination contains a decimal point, the attribute is civil aviation control frequency;
if the number or English letter combination is a combination of letters and digits, the attribute is flight number;
if the number or English letter combination is followed by the unit meter, the attribute is height;
if the number or English letter combination is followed by the unit foot, the attribute is height;
if the number or English letter combination is followed by the unit meters per second, the attribute is speed;
if the number or English letter combination is followed by the unit kilometer, the attribute is distance;
if the number or English letter combination is followed by the unit degree, the attribute is turn;
if the number or English letter combination is followed by the unit o'clock, the attribute is time point;
if the number or English letter combination is followed by the unit minute, the attribute is time span.
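The rules of this claim map naturally onto a lookup table plus two structural checks. In the sketch below the English unit strings are stand-ins for the Chinese unit words, and `judge_attribute` and `UNIT_ATTRIBUTES` are hypothetical names; a return value of `None` marks the third case, which the claims hand over to the Hidden Markov Model.

```python
import re
from typing import Optional

# Assumed English stand-ins for the unit words that follow a number.
UNIT_ATTRIBUTES = {
    "meter": "height",
    "foot": "height",
    "meters per second": "speed",
    "kilometer": "distance",
    "degree": "turn",
    "o'clock": "time point",
    "minute": "time span",
}


def judge_attribute(token: str, next_word: Optional[str] = None) -> Optional[str]:
    """Apply the structural rules first, then the unit rules; None means undecided."""
    if re.fullmatch(r"\d+\.\d+", token):
        return "civil aviation control frequency"
    if re.search(r"[A-Za-z]", token) and re.search(r"\d", token):
        return "flight number"
    if next_word is not None:
        return UNIT_ATTRIBUTES.get(next_word)
    return None
```

For example, `judge_attribute("8400", "meter")` gives "height", while `judge_attribute("87")` returns `None` because nothing inside or after the token reveals its attribute.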
8. The method according to claim 7, characterized in that step 4-4 comprises:
The Hidden Markov Model is formally defined as follows:
$Q$ is the set of all possible states: for words other than the target number or English letter sequence the state is the part of speech, and for the target number or English letter sequence the state is its attribute. $V$ is the set of all possible observations, that is, the words output by the states. $I$ is the state sequence and $O$ is the observation sequence, where:
$Q = \{q_1, q_2, \dots, q_N\}$, $V = \{v_1, v_2, \dots, v_M\}$,
$I = \{i_1, i_2, \dots, i_K\}$, $O = \{o_1, o_2, \dots, o_M\}$,
where $N$ is the number of possible states and $q_N$ denotes the $N$-th possible state; $M$ is the number of possible observations, $v_M$ denotes the $M$-th possible observation, and $o_M$ denotes the $M$-th actual observation; $K$ is the actual number of states and $i_K$ denotes the $K$-th actual state value. $A$ is the state transition probability matrix, $A = [a_{ij}]_{N \times N}$; $B$ is the observation probability matrix, $B = [b_j(k)]_{N \times M}$; $\pi$ is the initial state probability vector, $\pi = (\pi_i)$, where:
$a_{ij} = P(i_{t+1} = q_j \mid i_t = q_i)$, $i = 1, 2, \dots, N$; $j = 1, 2, \dots, N$; $a_{ij}$ is the probability that state $q_i$ at the current time is followed by state $q_j$ at the next time;
$b_j(k) = P(o_t = v_k \mid i_t = q_j)$, $k = 1, 2, \dots, M$; $b_j(k)$ is the probability that state $q_j$ at the current time generates the output value $v_k$ at the current time;
$\pi_i = P(i_1 = q_i)$; $\pi_i$ is the probability of each state at the initial time.
For the problem described in the third case, n words before and after the target number or English letter sequence are taken to form a sequence, and the problem of determining the attribute of the target number or English letters is converted into the probability calculation problem of the Hidden Markov Model: the state with the highest probability is the attribute of the target. Determining the parameters of the Hidden Markov Model is in turn converted into a learning problem.
The probability calculation problem is solved as follows: given the model $\lambda = (\pi, A, B)$ and the observation sequence $O$, the probability $\gamma_t(i)$ that the target number or English letter sequence at position $t$ is in state, that is, attribute, $q_i$ is:
$\gamma_t(i) = P(i_t = q_i \mid O, \lambda)$,
where $0 < t \le T$ and $T$ is the position of the last element of the sequence;
It is obtained from the forward and backward probabilities:
$\gamma_t(i) = \dfrac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)}$,
where $\alpha_t(i)$ is the forward probability of the $i$-th state, obtained by the forward probability calculation:
$\alpha_t(i) = P(o_1, o_2, \dots, o_t, i_t = q_i \mid \lambda)$,
and $\beta_t(i)$ is the backward probability of the $i$-th state, obtained by the backward probability calculation:
$\beta_t(i) = P(o_{t+1}, o_{t+2}, \dots, o_T \mid i_t = q_i, \lambda)$,
with the boundary condition, that is, the backward probability of every state at the final position: $\beta_T(i) = 1$.
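For reference, the forward and backward probabilities defined in this claim are usually computed with the standard recursions below. They are not written out in the claim itself and are given here as the textbook forward-backward procedure, using the claim's symbols $\pi_i$, $a_{ij}$ and $b_j(\cdot)$, and writing $b_i(o_t)$ for the probability that state $q_i$ emits the observed word $o_t$.

```latex
\begin{align*}
\alpha_1(i) &= \pi_i \, b_i(o_1), \\
\alpha_{t+1}(i) &= \Big[ \sum_{j=1}^{N} \alpha_t(j) \, a_{ji} \Big] \, b_i(o_{t+1}),
  \qquad t = 1, \dots, T-1, \\
\beta_T(i) &= 1, \\
\beta_t(i) &= \sum_{j=1}^{N} a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j),
  \qquad t = T-1, \dots, 1, \\
P(O \mid \lambda) &= \sum_{j=1}^{N} \alpha_t(j) \, \beta_t(j) \quad \text{for any } t .
\end{align*}
```

The last line is why dividing $\alpha_t(i)\,\beta_t(i)$ by the sum over all states gives the conditional probability $P(i_t = q_i \mid O, \lambda)$.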
CN201910180560.2A 2019-03-11 2019-03-11 Control instruction information processing method based on semantic chunk Active CN110069771B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910180560.2A CN110069771B (en) 2019-03-11 2019-03-11 Control instruction information processing method based on semantic chunk

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910180560.2A CN110069771B (en) 2019-03-11 2019-03-11 Control instruction information processing method based on semantic chunk

Publications (2)

Publication Number Publication Date
CN110069771A true CN110069771A (en) 2019-07-30
CN110069771B CN110069771B (en) 2021-02-05

Family

ID=67365209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910180560.2A Active CN110069771B (en) 2019-03-11 2019-03-11 Control instruction information processing method based on semantic chunk

Country Status (1)

Country Link
CN (1) CN110069771B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102436764A (en) * 2011-11-21 2012-05-02 南京莱斯信息技术股份有限公司 Method for mining flight number regulatory factors through historical data
CN102849555A (en) * 2012-09-21 2013-01-02 日立电梯(中国)有限公司 High-accuracy earthquake management and control method and system based on cloud computing
CN106875948A (en) * 2017-02-22 2017-06-20 中国电子科技集团公司第二十八研究所 A kind of collision alert method based on control voice
CN108628959A (en) * 2018-04-13 2018-10-09 长安大学 A kind of body constructing method based on traffic big data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王煊等: "用于管制语音理解的语义分析方法", 《指挥信息系统与技术》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627257A (en) * 2020-04-13 2020-09-04 南京航空航天大学 Control instruction safety rehearsal and verification method based on aircraft motion trend prejudgment
CN111627257B (en) * 2020-04-13 2022-05-03 南京航空航天大学 Control instruction safety rehearsal and verification method based on aircraft motion trend prejudgment
CN113158658A (en) * 2021-04-26 2021-07-23 中国电子科技集团公司第二十八研究所 Knowledge embedding-based structured control instruction extraction method
CN113158658B (en) * 2021-04-26 2023-09-19 中国电子科技集团公司第二十八研究所 Knowledge embedding-based structured control instruction extraction method
CN113569545A (en) * 2021-09-26 2021-10-29 中国电子科技集团公司第二十八研究所 Control information extraction method based on voice recognition error correction model

Also Published As

Publication number Publication date
CN110069771B (en) 2021-02-05

Similar Documents

Publication Publication Date Title
CN102214166B (en) Machine translation system and machine translation method based on syntactic analysis and hierarchical model
CN104679850B (en) Address structure method and device
CN106777275A (en) Entity attribute and property value extracting method based on many granularity semantic chunks
WO2020143163A1 (en) Named entity recognition method and apparatus based on attention mechanism, and computer device
WO2017177809A1 (en) Word segmentation method and system for language text
CN110069771A (en) A kind of control order information processing method based on semantic chunking
CN108073570A (en) A kind of Word sense disambiguation method based on hidden Markov model
CN110377724A (en) A kind of corpus keyword Automatic algorithm based on data mining
CN101655837A (en) Method for detecting and correcting error on text after voice recognition
CN108664474A (en) A kind of resume analytic method based on deep learning
CN105068990B (en) A kind of English long sentence dividing method of more strategies of Machine oriented translation
CN113569545B (en) Control information extraction method based on voice recognition error correction model
CN105975475A (en) Chinese phrase string-based fine-grained thematic information extraction method
CN112133290A (en) Speech recognition method based on transfer learning and aiming at civil aviation air-land communication field
CN103559181A (en) Establishment method and system for bilingual semantic relation classification model
CN110765231A (en) Chapter event extraction method based on common-finger fusion
CN112417873B (en) Automatic cartoon generation method and system based on BBWC model and MCMC
CN108287825A (en) A kind of term identification abstracting method and system
CN105389303B (en) A kind of automatic fusion method of heterologous corpus
CN110134950A (en) A kind of text auto-collation that words combines
CN110232121B (en) Semantic network-based control instruction classification method
CN104317882A (en) Decision-based Chinese word segmentation and fusion method
CN107797986A (en) A kind of mixing language material segmenting method based on LSTM CNN
CN111178009B (en) Text multilingual recognition method based on feature word weighting
CN109460547B (en) Structured control instruction extraction method based on natural language processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant