CN110069771B - Control instruction information processing method based on semantic chunk - Google Patents

Control instruction information processing method based on semantic chunk

Info

Publication number
CN110069771B
Authority
CN
China
Prior art keywords
attribute
word
sequence
english letter
control
Legal status
Active
Application number
CN201910180560.2A
Other languages
Chinese (zh)
Other versions
CN110069771A (en)
Inventor
王煊
徐秋程
丁辉
王冠
严勇杰
陈平
Current Assignee
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Application filed by CETC 28 Research Institute
Priority to CN201910180560.2A
Publication of CN110069771A
Application granted
Publication of CN110069771B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a control instruction information processing method based on semantic chunks, which aims to: (1) construct computer-readable, structured control instructions as a basis for their automatic processing; and (2) facilitate information extraction from, and semantic analysis of, control instructions, improving the precision of the results. By identifying and processing control-term phrases, the method provides the following auxiliary functions: effectively extracting the information carried by basic control terms in a control instruction; accurately extracting information such as the type and state of the target aircraft; and supplying data for information aggregation based on control instructions. Because control instructions contain special control terms, numbers and similar elements, the method extracts this information using a frame-search method together with purpose-designed rules. The invention improves the information-extraction capability for control instructions and the precision of semantic analysis results.

Description

Control instruction information processing method based on semantic chunk
Technical Field
The invention belongs to the technical field of air traffic control automation systems, and particularly relates to a control instruction information processing method based on semantic chunks.
Background
With the rapid development of the civil aviation industry over the last 30 years, the demands on air traffic management have grown continuously, making latent safety hazards increasingly prominent. Statistics show that human factors account for over 75% of past flight-safety incidents, and among these, incidents caused by controller error account for 25%. The current mainstream approach to conflicts caused by controller error is to strengthen surface surveillance, preventing conflicts with monitoring equipment such as radar and multilateration sensors. More advanced solutions based on artificial intelligence have also been proposed, such as using speech recognition to convert controller speech into text and then performing semantic analysis with natural language processing techniques.
Control instructions contain special control terms that do not follow ordinary grammatical rules and that form word groups (chunks) with adjacent words. In general natural language processing, named entity recognition (NER) is used to extract entities or chunks, but NER only recognizes person names, place names, organization names and the like: it cannot recognize and extract phrases (chunks) centered on a control term, nor can it recognize entities formed by the combinations of English letters and numbers that appear in control instructions.
Semantic chunk theory was first proposed by Steven Abney as a method of shallow syntactic parsing. English chunks are defined as follows: a sentence consists of a number of chunks, each chunk consists of syntactically related words, and chunks do not overlap, are not nested and are contiguous. Using chunks to extract special control-term phrases is feasible for three reasons: 1. air traffic management is a closed domain with a limited number of special control terms, so a limited number of chunk rules suffices; 2. control instructions follow air-ground communication rules, and control terms are used in regular ways that can be turned directly into rules; 3. special control-term phrases play relatively independent roles in ordinary control instructions, usually expressing only external environment information and rarely associating with other words in the instruction, which matches the definition of a chunk.
Disclosure of Invention
Purpose of the invention: the invention analyzes, from the perspective of semantic chunking, how the important phrases that appear in real control instructions are composed, and designs corresponding chunking rules, informed by air-ground communication rules, to extract those phrases. Special control terms, numbers and English letter sequences serve as entry points, and each is identified and extracted according to its compositional characteristics.
Technical scheme: the invention provides a control instruction information processing method based on semantic chunks, comprising the following steps:
step 1, performing Chinese word segmentation on a control instruction to obtain a word sequence;
step 2, tagging each word in the word sequence with its part of speech, which serves as the feature of the target word;
step 3, processing control instructions that contain special control terms;
step 4, processing the other components of the control instruction;
step 5, analyzing the control instruction according to the results of steps 3 and 4 to complete semantic understanding of the control instruction in the air traffic control system, and using the result to judge whether the information in the control instruction is consistent with the plan information in the system.
In step 1, a word segmentation algorithm (for example, a method based on a dictionary and a hidden Markov model) performs Chinese word segmentation on the control instruction to obtain a word sequence. A control-term dictionary is compiled for this purpose: common basic control terms collected from control instructions (nouns such as ground wind, dew point, visibility and tower) are added to it, and the dictionary is loaded into the segmentation algorithm to assist in segmenting control instructions.
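The dictionary-assisted segmentation above can be sketched as follows. This is not the patent's dictionary-plus-HMM segmenter: as a simpler stand-in it uses forward maximum matching, which is enough to show why adding control terms to the dictionary keeps them from being split apart. The Chinese spellings of the four terms, the `segment` function, and the rule that runs of ASCII digits, Latin letters and '.' stay together are assumptions of this sketch. A production system would more likely use a library segmenter that accepts a user dictionary (jieba, for instance, offers `jieba.add_word` and `jieba.load_userdict`).

```python
import re

# Hypothetical control-term dictionary; the Chinese forms of the terms named in
# the text (ground wind, dew point, visibility, tower) are assumptions.
CONTROL_TERMS = {"地面风", "露点", "能见度", "塔台"}

# Assumption: runs of ASCII digits, Latin letters and '.' (e.g. 350, d5p4a5, 123.6)
# are kept together as single tokens.
ALNUM_RUN = re.compile(r"[0-9A-Za-z.]+")


def segment(text, dictionary=CONTROL_TERMS):
    """Forward maximum matching: at each position take the longest dictionary
    entry; otherwise fall back to a single character."""
    max_len = max(len(w) for w in dictionary)
    tokens, i = [], 0
    while i < len(text):
        run = ALNUM_RUN.match(text, i)
        if run:
            tokens.append(run.group())
            i = run.end()
            continue
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens
```

For example, `segment("地面风350度")` returns `["地面风", "350", "度"]`: the dictionary keeps "ground wind" whole, while an unknown word such as 公里 in `segment("能见度10公里")` falls apart into single characters.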
In step 2, numbers in the control instruction receive the part-of-speech tag m; English letter sequences receive nx or eng; special control terms receive Sp.
In step 2, the part-of-speech tags for special control terms are refined as follows:
Sp0 indicates that the special control term does not form a phrase with the words before or after it;
Sp1 indicates that the special control term forms a phrase with the preceding words;
Sp2 indicates that the special control term forms a phrase with the following words;
Sp3 indicates that the special control term can form a phrase with either the preceding or the following words.
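A minimal tagging sketch for the three target-word classes follows. English glosses stand in for the Chinese terms, and `SP_LEXICON` simply reuses the example terms from the detailed description (east tower Sp0, stand Sp1, ground wind Sp2, runway Sp3). A real system would take its Sp tags from a manually built lexicon and its other tags from a general tagger, represented here by the hypothetical `general_tags` mapping.

```python
import re

# Hypothetical Sp lexicon built from the description's own examples (English glosses).
SP_LEXICON = {"east tower": "Sp0", "stand": "Sp1", "ground wind": "Sp2", "runway": "Sp3"}

NUMBER = re.compile(r"\d+(?:\.\d+)?")                          # e.g. 350, 123.6 -> "m"
LETTER_SEQ = re.compile(r"[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*")   # e.g. DAL, 18L   -> "eng"


def tag_targets(tokens, general_tags=None):
    """Tag special control terms (Sp0-Sp3), numbers (m) and letter sequences (eng).

    Words known to the (hypothetical) general tagger are looked up before the
    regexes so that English glosses are not mistaken for letter sequences.
    """
    general_tags = general_tags or {}
    tagged = []
    for tok in tokens:
        if tok in SP_LEXICON:
            tagged.append((tok, SP_LEXICON[tok]))
        elif tok in general_tags:
            tagged.append((tok, general_tags[tok]))
        elif NUMBER.fullmatch(tok):
            tagged.append((tok, "m"))
        elif LETTER_SEQ.fullmatch(tok):
            tagged.append((tok, "eng"))
        else:
            tagged.append((tok, "x"))  # placeholder for "unknown"
    return tagged
```

For instance, `tag_targets(["runway", "18L"])` yields `[("runway", "Sp3"), ("18L", "eng")]`.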
Step 3 comprises the following steps:
step 3-1, when a special control term appears in the control instruction, take it as the head word and generate a chunk frame according to the search rule for its special tag (the special tags are assigned to control terms manually in advance, together with search rules, which are given in step 3-2). The chunk frame contains two word slots:
either the first slot holds the special control term and the second holds the search content, or the first slot holds a chunk attribute (such as time) and the second holds the content corresponding to that attribute in the control instruction;
step 3-2, design search rules according to how each special control term is used (Table 1):
for a special control term tagged Sp0, recognize it directly as a phrase without any search;
for a special control term tagged Sp1, search to the left up to the boundary defined by its search rule (see the search rules in Table 1);
for a special control term tagged Sp2, search to the right up to the boundary defined by its search rule (see Table 1);
TABLE 1 (image not reproduced)
for a special control term tagged Sp3, first search to the left up to the boundary given by the search rule in the rule base; if nothing is found, search to the right up to the boundary given by the search rule in the rule base;
step 3-3, fill the frame from step 3-1 with the special control term and the content found. For example, in "ground wind 3 m/s", "ground wind" is the special control term and "3 m/s" is the search content, so the frame takes the form: the first slot holds the special control term and the second slot holds the search content.
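The search rules of step 3-2 and the frame filling of step 3-3 can be sketched as below. Table 1 is not reproduced, so the stopping rule is an assumption: the search collects consecutive tokens tagged `m` (number), `eng` (letter sequence) or `q` (quantifier/unit) and stops at the first token with any other tag. This is inferred from the description's remark that boundary words are mostly numbers, letter sequences and quantifiers, and from its worked example. The function names and the tag `q` for units are hypothetical.

```python
# Assumed boundary-word classes: numbers (m), letter sequences (eng), quantifiers/units (q).
CONTENT_TAGS = {"m", "eng", "q"}


def scan(tagged, start, step):
    """Collect consecutive content tokens from index `start`, moving left (step=-1)
    or right (step=+1); stop at the first token whose tag is not a content tag."""
    words, i = [], start
    while 0 <= i < len(tagged) and tagged[i][1] in CONTENT_TAGS:
        words.append(tagged[i][0])
        i += step
    return words if step > 0 else words[::-1]


def chunk(tagged, idx):
    """Return the two-slot frame (special control term, search content) for tagged[idx]."""
    term, tag = tagged[idx]
    if tag == "Sp0":                      # no search: the term is a phrase by itself
        content = []
    elif tag == "Sp1":                    # search left only
        content = scan(tagged, idx - 1, -1)
    elif tag == "Sp2":                    # search right only
        content = scan(tagged, idx + 1, +1)
    elif tag == "Sp3":                    # left first, right if the left side is empty
        content = scan(tagged, idx - 1, -1) or scan(tagged, idx + 1, +1)
    else:
        raise ValueError(f"{term!r} is not tagged as a special control term")
    return (term, " ".join(content))
```

With the first worked example tagged as `DAL/eng 185/m east tower/Sp0 gust/Sp2 12/m m/s/q along/p taxiway/Sp3 d5p4a5/eng taxi/v`, calling `chunk` on "gust" gives `("gust", "12 m/s")` and on "taxiway" gives `("taxiway", "d5p4a5")`, the same results the description reaches by hand.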
Step 4 comprises the following steps:
step 4-1, determine whether a number or English letter sequence appears in the control instruction; if not, end the process; if so, continue with step 4-2;
step 4-2, analyze the number or English letter sequence in the control instruction; its structure falls into one of three cases:
in the first case, the number or letter combination is preceded or followed by characters or words from which its attribute can be determined;
in the second case, the number or letter combination has a distinctive internal structure, so its attribute can be determined using the rules of air-ground control communication;
in the third case, no word outside the number or letter sequence clearly indicates its attribute, and it has no internal structure that distinguishes its attribute;
step 4-3, design rules to determine the attribute in the first and second cases;
step 4-4, determine the attribute in the third case with a method based on a hidden Markov model;
step 4-5, if the attribute has been determined in step 4-3 or step 4-4, fill the word slots of the chunk frame as follows: the first slot holds the attribute, and the second holds the number or English letter sequence.
Step 4-3 comprises designing the following attribute rules:
if the number or letter combination contains a decimal point, the attribute is a civil aviation control frequency;
if the combination consists of letters and numbers, the attribute is a flight number;
if the combination is followed by the unit meters, the attribute is altitude;
if the combination is followed by the unit feet, the attribute is altitude;
if the combination is followed by the unit meters per second, the attribute is speed;
if the combination is followed by the unit kilometers, the attribute is distance;
if the combination is followed by the unit degrees, the attribute is heading (turn direction);
if the combination is followed by the unit 分 (a minute as a clock reading), the attribute is a time point;
if the combination is followed by the unit 分钟 (minutes as a duration), the attribute is a time length.
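The rule list above translates into a small lookup. The English unit glosses used as dictionary keys, the attribute labels and `rule_attribute` are assumptions of this sketch; a return value of `None` corresponds to the third case, which is handed to the HMM.

```python
import re

# Unit -> attribute rules from the list above; English unit glosses are assumed keys.
UNIT_RULES = {
    "meters": "altitude",
    "feet": "altitude",
    "m/s": "speed",
    "kilometers": "distance",
    "degrees": "heading",
    "clock minute": "time point",   # 分, as in a clock reading
    "minutes": "time length",       # 分钟, a duration
}


def rule_attribute(token, next_token=None):
    """Apply the case-1/case-2 rules; return None when no rule fires (case 3)."""
    if re.fullmatch(r"\d+\.\d+", token):          # internal decimal point
        return "frequency"
    letters_digits = re.fullmatch(r"[A-Za-z]+\d+", token)
    letters_then_digits = (re.fullmatch(r"[A-Za-z]+", token) is not None
                           and next_token is not None and next_token.isdigit())
    if letters_digits or letters_then_digits:     # e.g. CCA1234, or CSN + 6723
        return "flight number"
    if next_token in UNIT_RULES:                  # trailing unit word
        return UNIT_RULES[next_token]
    return None
```

Checking the letter-plus-number rule both on a single token and on an adjacent pair mirrors the worked example, where segmentation splits "DAL185" into "DAL" and "185".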
Step 4-4 comprises:
the hidden Markov model is defined as follows:
Q is the set of all possible states, namely the parts of speech of the words other than the target number or English letter sequence, together with the attributes of the target number or English letter sequence; V is the set of all possible observations, namely the words emitted under those states; I is a state sequence; O is an observation sequence, where:
$Q=\{q_1,q_2,\dots,q_N\},\quad V=\{v_1,v_2,\dots,v_M\},$
$I=\{i_1,i_2,\dots,i_K\},\quad O=\{o_1,o_2,\dots,o_K\},$
where N is the number of possible states and $q_N$ is the N-th possible state; M is the number of possible observations and $v_M$ is the M-th possible observation; K is the actual number of states (equal to the number of observations), $i_K$ is the K-th actual state and $o_K$ is the K-th actual observation. A is the state transition probability matrix, $A=[a_{ij}]_{N\times N}$; B is the observation probability matrix, $B=[b_j(k)]_{N\times M}$; and π is the initial state probability vector, $\pi=(\pi_i)$, where:
$a_{ij}=P(i_{t+1}=q_j \mid i_t=q_i),\ i=1,2,\dots,N;\ j=1,2,\dots,N$, is the probability of moving from state $q_i$ at the current time to state $q_j$ at the next time;
$b_j(k)=P(o_t=v_k \mid i_t=q_j),\ k=1,2,\dots,M;\ j=1,2,\dots,N$, is the probability that state $q_j$ at the current time emits observation $v_k$;
$\pi_i=P(i_1=q_i),\ i=1,2,\dots,N$, is the probability of each state at the initial time;
For the third case, to determine the attribute of a number or English letter sequence that appears on its own in a control instruction, the n words before and after the target are taken to form a sequence, turning the task into a sequence-labeling problem: the observation sequence is known, and so are the parts of speech of the words other than the target. Finding the attribute of the target number or letter sequence thus becomes a probability-calculation problem of a hidden Markov model (HMM), in which the state with the highest probability is the target's attribute, while determining the HMM's parameters becomes a learning problem;
the probability-calculation problem is solved as follows: given the model $\lambda=(\pi,A,B)$ and the observation sequence O, the probability $\gamma_t(i)$ that the target number or letter sequence at position t is in state (attribute) $q_i$ is
$\gamma_t(i)=P(i_t=q_i \mid O,\lambda),$
where $1 \le t \le T$ and T is the position of the last element of the sequence;
it is computed from the forward and backward probabilities:
$\gamma_t(i)=\dfrac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N}\alpha_t(j)\,\beta_t(j)},$
where $\alpha_t(i)$ is the forward probability of the i-th state:
$\alpha_t(i)=P(o_1,o_2,\dots,o_t,i_t=q_i \mid \lambda),$
and $\beta_t(i)$ is the backward probability of the i-th state:
$\beta_t(i)=P(o_{t+1},o_{t+2},\dots,o_T \mid i_t=q_i,\lambda),$
with the boundary condition that the backward probability of every state at the final position is $\beta_T(i)=1$.
The HMM learning problem is solved with a corpus of control instructions and the Baum-Welch algorithm, which estimates the corresponding parameters.
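The probability calculation and the learning problem can be sketched in plain Python as follows. `posteriors` implements the $\gamma_t(i)$ formula above, and `baum_welch_step` performs one Baum-Welch (EM) re-estimation on a single observation sequence; both assume observations are given as 0-based symbol indices. This is a textbook sketch, not the patent's code: it omits the scaling or log-space arithmetic needed for long sequences, and the pooling of statistics over many sequences that training on a real instruction corpus would need.

```python
import math


def forward(pi, A, B, obs):
    """alpha[t][i] = P(o_1..o_t, i_t = q_i | lambda)."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                      for j in range(N)])
    return alpha


def backward(A, B, obs):
    """beta[t][i] = P(o_{t+1}..o_T | i_t = q_i, lambda), with beta_T(i) = 1."""
    N = len(A)
    beta = [[1.0] * N]
    for t in range(len(obs) - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta


def posteriors(pi, A, B, obs):
    """gamma[t][i] = alpha_t(i) beta_t(i) / sum_j alpha_t(j) beta_t(j)."""
    gamma = []
    for a_t, b_t in zip(forward(pi, A, B, obs), backward(A, B, obs)):
        w = [a * b for a, b in zip(a_t, b_t)]
        z = sum(w)
        gamma.append([x / z for x in w])
    return gamma


def baum_welch_step(pi, A, B, obs):
    """One EM re-estimation from one sequence; returns (pi', A', B', log P(O | old lambda)).

    Assumes every state has non-zero posterior mass, otherwise a denominator is 0.
    """
    N, M, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    like = sum(alpha[-1])                                   # P(O | lambda)
    gamma = [[alpha[t][i] * beta[t][i] / like for i in range(N)] for t in range(T)]
    # xi[t][i][j] = P(i_t = q_i, i_{t+1} = q_j | O, lambda)
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / like
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(x[i][j] for x in xi) / sum(g[i] for g in gamma[:-1])
              for j in range(N)] for i in range(N)]
    new_B = [[sum(g[j] for g, o in zip(gamma, obs) if o == k) / sum(g[j] for g in gamma)
              for k in range(M)] for j in range(N)]
    return new_pi, new_A, new_B, math.log(like)
```

Because Baum-Welch is an EM algorithm, each re-estimation step should not decrease $P(O \mid \lambda)$, which makes a convenient sanity check while training.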
Beneficial effects: the invention has the following technical effects:
(1) It automatically recognizes phrases in control instructions that are built mainly around special control terms, numbers and English letters.
(2) It enables information aggregation over control instructions.
(3) It improves the performance of semantic analysis of control instructions.
Drawings
The foregoing and other advantages of the invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a flow chart of the method for extracting control instruction information based on semantic chunks.
FIG. 2 is the main flowchart of the method for extracting control instruction information based on semantic chunks.
FIG. 3 is a diagram of the structure of a hidden Markov model.
FIG. 4 shows the result of the completed part-of-speech analysis.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
The implementation process and steps of the invention are as follows; the flow chart is shown in FIG. 1.
Step 1: Chinese word segmentation
A word segmentation algorithm performs Chinese word segmentation on the control instruction to obtain a word sequence. Adding basic control terms to the segmentation dictionary improves the precision of the segmentation result.
Step 2: part-of-speech tagging
Each word in the word sequence is tagged with its part of speech, which serves as the word's feature. Because the invention focuses on special control terms, numbers and English letters in the control instruction, these three types of words must be identifiable directly from their part-of-speech tags. Numbers are tagged "m" and English letter sequences "nx" or "eng", which makes them easy to distinguish. When setting the tag for special control terms, their phrase-forming behavior is taken into account: the tag "Sp" denotes a special control term and is refined according to how the term groups with neighboring words:
"Sp0": forms no phrase with the words before or after it, e.g. [east tower].
"Sp1": forms a phrase with the preceding words, e.g. No. 27 [stand].
"Sp2": forms a phrase with the following words, e.g. [ground wind] 350 degrees.
"Sp3": can form a phrase with either the preceding or the following words, e.g. [runway] 18L or 18L [runway].
Step 3: Processing special control terms contained in the control instruction
Step A-1: when a special control term appears in a control instruction, a frame centered on that term is instantiated; the frame consists of two parts: special control term + search content.
Step A-2: search rules are designed according to how special control terms are used. Table 1 shows how some common control terms in control instructions are chunked, together with the search rule for each part-of-speech tag. Further study shows that phrases formed by special control terms tagged "Sp1", "Sp2" and "Sp3" are relatively flexible: some elements may be omitted, and the boundary words are mostly numbers, English letter sequences and quantifiers. Because these three word types occur infrequently in a sentence and, when they do occur, are strongly associated with the special control term, the rule base can be kept simple, which makes matching special control terms against rule-base entries more efficient.
TABLE 1 (image not reproduced)
Step A-3: the frame from step A-1 has two parts; the special control term and the rest of the phrase are filled into them according to the attributes of the empty slots.
Step 4: Processing parts of the control instruction that contain no special control term
Control instructions can contain many numbers and English letter sequences. Some of them are strongly associated with special control terms and can be categorized through the special control terms that appear alongside them. In other cases, a number or letter sequence appears on its own in the control instruction, with no special control term nearby that indicates its attribute (such as time, frequency, altitude or speed). This step provides a solution for that situation.
Step A-1: determine whether a number or English letter sequence appears in the control instruction; if not, end the process; if so, continue with the next step.
Step A-2: analyze the number or English letter sequence in the control instruction; its structure falls into three cases:
1. The number/letter combination is preceded or followed by characters or words from which its attribute can be determined, e.g. "climb to 10000 feet".
2. The number/letter combination has a distinctive internal structure that can be recognized using the air-ground control communication rules, e.g. 123.6 (according to the civil aviation control frequency table, frequencies contain a decimal point).
3. No word outside the number/letter sequence clearly indicates its attribute, and it has no internal structure that distinguishes the attribute, e.g. a bare number describing a time or altitude.
This step provides a solution for each of the three cases, i.e. it determines the attribute of a number/letter combination that appears on its own, using two main methods: a rule-based method and a method based on a hidden Markov model (HMM).
(1) Rule-based method
In cases 1 and 2, the attribute is determined by the rules in Table 2.
TABLE 2
Feature | Attribute
Contains a decimal point | Civil aviation control frequency
Letter + number combination | Flight number
Followed by the unit meters | Altitude
Followed by the unit feet | Altitude
Followed by the unit meters per second | Speed
Followed by the unit kilometers | Distance
Followed by the unit degrees | Heading (turn direction)
Followed by the unit 分 (clock minute) | Time point
Followed by the unit 分钟 (minutes) | Time length
(2) HMM-based method
In case 3, the number/letter sequence has neither an external word that explicitly indicates its attribute nor a clear internal feature that does, so the words in the control instruction must be labeled and the decision must rely on context: the forward-backward algorithm computes the probability of the number/letter sequence under each possible attribute.
The hidden Markov model is determined by the initial probability distribution π, the state transition probability distribution A and the observation probability distribution B, and is defined as follows:
Q is the set of all possible states, namely the parts of speech of the words other than the target number or English letter sequence, together with the attributes of the target number or English letter sequence; V is the set of all possible observations, namely the words emitted under those states; I is a state sequence; O is an observation sequence, where:
$Q=\{q_1,q_2,\dots,q_N\},\quad V=\{v_1,v_2,\dots,v_M\},$
$I=\{i_1,i_2,\dots,i_K\},\quad O=\{o_1,o_2,\dots,o_K\},$
where N is the number of possible states, M is the number of possible observations, and K is the actual number of states and observations. A is the state transition probability matrix, $A=[a_{ij}]_{N\times N}$; B is the observation probability matrix, $B=[b_j(k)]_{N\times M}$; and π is the initial state probability vector, $\pi=(\pi_i)$, where:
$a_{ij}=P(i_{t+1}=q_j \mid i_t=q_i),\quad i=1,2,\dots,N;\ j=1,2,\dots,N,$
$b_j(k)=P(o_t=v_k \mid i_t=q_j),\quad k=1,2,\dots,M;\ j=1,2,\dots,N,$
$\pi_i=P(i_1=q_i),\quad i=1,2,\dots,N.$
For the third case, to determine the attribute of a number/English letter sequence that appears on its own in a control instruction, the n words (0 < n < 3) before and after the target are taken to form a sequence, and the task becomes a sequence-labeling problem: the observation sequence is known, and so are the parts of speech of the words other than the target. Finding the attribute of the target number/letter sequence therefore becomes an HMM probability-calculation problem, in which the state with the highest probability is the target's attribute, while determining the HMM parameters becomes a learning problem.
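Building the context window is a simple slice. `context_window` is a hypothetical helper; it clips at sentence boundaries, which is why the third worked example ends up with only the two preceding words.

```python
def context_window(tokens, target_idx, n):
    """Return up to n tokens on each side of the target, plus the target's index
    inside the returned window (the window is clipped at the sentence edges)."""
    if not 0 < n < 3:
        raise ValueError("the description uses 0 < n < 3")
    lo = max(0, target_idx - n)
    hi = min(len(tokens), target_idx + n + 1)
    return tokens[lo:hi], target_idx - lo
```

For example, `context_window(["climb", "to", "87"], 2, 2)` returns `(["climb", "to", "87"], 2)`: the target is last, so only left context is available.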
The probability-calculation problem is solved as follows: given the model $\lambda=(\pi,A,B)$ and the observation sequence O, the probability that the target number/letter sequence (at position t) is in state (attribute) $q_i$ is
$\gamma_t(i)=P(i_t=q_i \mid O,\lambda),$
computed from the forward and backward probabilities:
$\gamma_t(i)=\dfrac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N}\alpha_t(j)\,\beta_t(j)},$
where $\alpha_t(i)$ is the forward probability:
$\alpha_t(i)=P(o_1,o_2,\dots,o_t,i_t=q_i \mid \lambda),$
and $\beta_t(i)$ is the backward probability (boundary condition $\beta_T(i)=1$):
$\beta_t(i)=P(o_{t+1},o_{t+2},\dots,o_T \mid i_t=q_i,\lambda).$
The HMM learning problem is solved with a corpus of control instructions and the Baum-Welch algorithm, which estimates the corresponding parameters.
Step A-3: check whether the attribute of the number/letter sequence has been obtained. If it was obtained in the preceding steps, fill the word slots of the frame as attribute + target number/letter sequence. In some cases a number/letter sequence in the control instruction has no neighboring word from which its attribute can be judged, so the methods above cannot determine it; the attribute must then be estimated from historical data, and the frame is filled according to that estimate.
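The fallback order in this step (rules, then the HMM, then historical data) can be sketched as a cascade. The description does not say how historical data is used; taking the most frequent attribute in a hypothetical `history` list is purely an assumption of this sketch, as is the convention that the earlier stages report failure with `None`.

```python
from collections import Counter


def resolve_attribute(rule_attr, hmm_attr, history):
    """Cascade: rule result first, then the HMM result, then the most frequent
    attribute seen in (hypothetical) historical data; None if all are missing."""
    if rule_attr is not None:
        return rule_attr
    if hmm_attr is not None:
        return hmm_attr
    counts = Counter(history)
    return counts.most_common(1)[0][0] if counts else None
```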
And 5: phrase formation
Since the frame is structured, the step only needs to extract the frame and extract the information in the frame according to the task.
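Since every frame has the same two slots, a frame can be modeled as a small record, and this step reduces to filtering frames by their first slot. `Frame` and `extract` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    slot1: str  # special control term, or attribute name
    slot2: str  # search content, or the number / letter sequence


def extract(frames, wanted):
    """Group slot-2 values by slot 1, keeping only the slots a task asks for."""
    result = {}
    for f in frames:
        if f.slot1 in wanted:
            result.setdefault(f.slot1, []).append(f.slot2)
    return result
```

For instance, from frames built out of the worked examples, `extract(frames, {"flight number"})` would collect the flight numbers, which a downstream check could compare against the plan information mentioned in step 5 of the summary.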
The method is based on semantic chunks and hidden Markov models of linguistics to identify phrases which are formed by special control terms and appear in control instructions and attributes of singly appearing numeric/English letter sequences or sequence combinations, and form corresponding frame description structures;
the extraction of the special control term phrase based on the semantic chunk comprises the steps of performing part-of-speech setting on the special control term, and designing a corresponding search principle according to the use rule of the special control term to extract phrase information;
the attribute recognition of the number/English letter sequence based on the hidden Markov model comprises the steps of recognizing and extracting the attribute of a target number/English letter sequence by using a forward-backward method;
the method can be applied to semantic understanding of the control command in the air traffic control system. The invention can effectively process the extraction work of the important information from the control instruction.
Examples
For ease of illustration, the implementation steps below follow the main flowchart in FIG. 2 and are explained using real control instructions. Example control instructions:
1. DAL185, east tower, gusts 12 m/s, taxi along taxiway d5p4a5.
2. Beijing Area, CSN6723, over Botou at minute 35, maintain 8400 meters.
3. CCA1234, climb immediately to 87.
Step 1: part of speech analysis
This stage has three sub-steps: Chinese word segmentation, part-of-speech tagging and target-word search, where a target word can be a special control term, a number/English letter sequence, and so on. Because tags for special control terms have been defined, the part-of-speech tagging result is as shown in FIG. 4, which shows only the target words of interest (special control terms, numbers, English letters and the like). Using part of speech as the word feature, the special control terms, i.e. words tagged "Sp", are found first: east tower, gust, taxiway, Beijing Area; then the number/English letter sequences, i.e. words tagged "m" or "eng": DAL, 185, 12, d5p4a5, CSN, 6723, 35, 8400, CCA, 1234, 87. Special control terms are processed first, followed by number/letter sequences.
Step 2: Processing special control terms
This step searches according to the rule that corresponds to each term's tag from the previous step. "East tower" and "Beijing Area" are tagged "Sp0", so they are extracted directly without any search. "Gust" is tagged "Sp2", whose rule is to search right up to the boundary: the boundary is the quantifier "m/s" [r], and the next word to the right is the preposition "along" [p], so the search stops at "m/s" and the words in between, "12 m/s", form the search content. "Taxiway" is tagged "Sp3", whose rule is to search left to the boundary first and then right. On the left is the preposition "along" [p], so nothing is found there; on the right is the English letter sequence "d5p4a5" [eng], followed by the verb "taxi" [v], where the search stops, giving the content "d5p4a5".
Step 3: Processing standalone numbers/English letter sequences: judgment rules
Step 2 has already extracted some of the numbers because they appear next to special control terms. The remaining words are DAL, 185, CSN, 6723, 35, 8400, CCA, 1234, and 87. Of these, the three pairs DAL-185, CSN-6723, and CCA-1234 each match the form of a flight number and can therefore be judged to denote flights. For 35 and 8400, a rightward search under the corresponding rule finds their units ("points" and "meters"), which indicate the attributes time and height, respectively.
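The judgment rules can be sketched as a single pass over the remaining tokens. The sketch assumes that an airline code is two or three capital letters followed by up to four digits. It also borrows the unit table from step 4-3 of the claims, with English unit words standing in for the Chinese ones. The token orders for example sentences 2 and 3 are hypothetical. A sequence that no rule resolves is marked "unknown" and left for the HMM of Step 4.

```python
import re
from typing import List, Optional, Tuple

# Unit word -> attribute, mirroring the unit rules of step 4-3 (English stand-ins).
UNIT_ATTRIBUTES = {
    "meters": "height", "feet": "height", "meters/second": "speed",
    "kilometers": "distance", "degrees": "turning direction",
    "points": "time point", "minutes": "time length",
}


def judge_by_rules(tokens: List[str]) -> List[Tuple[str, str]]:
    """Return (attribute, value) pairs; 'unknown' marks case 3, left for the HMM."""
    frames: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        word = tokens[i]
        nxt: Optional[str] = tokens[i + 1] if i + 1 < len(tokens) else None
        # Internal structure: airline letters followed by digits form a flight number.
        if re.fullmatch(r"[A-Z]{2,3}", word) and nxt and re.fullmatch(r"\d{1,4}", nxt):
            frames.append(("flight number", word + nxt))
            i += 2
            continue
        if re.fullmatch(r"\d+(\.\d+)?", word):
            if "." in word:                 # internal structure: decimal point
                frames.append(("civil aviation control frequency", word))
            elif nxt in UNIT_ATTRIBUTES:    # external cue: unit word to the right
                frames.append((UNIT_ATTRIBUTES[nxt], f"{word} {nxt}"))
                i += 2
                continue
            else:
                frames.append(("unknown", word))
        i += 1
    return frames


print(judge_by_rules(["CSN", "6723", "Beijing area", "35", "points", "maintain", "8400", "meters"]))
print(judge_by_rules(["CCA", "1234", "please", "immediately", "go up", "to", "87"]))
```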
Step 4: Processing standalone numbers/English letter sequences: HMM
The number 87 in example sentence 3 is a special case. No following unit word indicates its attribute, which is actually a flight level, so the attribute is judged by computing probabilities with a hidden Markov model (HMM). Since no words follow 87, the two preceding words are taken to form the word sequence go up | to | 87. This word sequence is the observation sequence of the HMM, and the corresponding hidden state sequence is verb [v] | verb [v] | ?. Here "?" denotes the unknown attribute of 87, which may be height, speed, time, distance, etc. The HMM parameters $\lambda = (\pi, A, B)$ are obtained by training on training data. Given the observation sequence $O$ and the known states of the other words, the forward-backward algorithm gives the conditional probability of each attribute, $\gamma(i) = P(i = q_i \mid O, \lambda)$, where $q_i$ ranges over the candidate attributes. The attribute with the largest $\gamma(i)$ is taken as the attribute of 87.
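The text says only that $\lambda$ is obtained by training. One common option for labeled data is sketched below, as an assumption: relative-frequency (maximum-likelihood) counting with add-one smoothing. Each training word carries its hidden state, which is a part of speech for ordinary words and the attribute for the target number. The training sentences are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Labeled = List[Tuple[str, str]]  # (observed word, hidden state) pairs


def estimate_hmm(data: List[Labeled], alpha: float = 1.0):
    """Estimate (pi, A, B) by counting, with add-alpha smoothing."""
    states = sorted({s for seq in data for _, s in seq})
    words = sorted({w for seq in data for w, _ in seq})
    init: Counter = Counter()
    trans: Dict[str, Counter] = defaultdict(Counter)
    emit: Dict[str, Counter] = defaultdict(Counter)
    for seq in data:
        init[seq[0][1]] += 1
        for (_, s), (_, s_next) in zip(seq, seq[1:]):
            trans[s][s_next] += 1
        for w, s in seq:
            emit[s][w] += 1

    def normalize(counts: Counter, keys: List[str]) -> Dict[str, float]:
        # Smoothing keeps unseen events at a small nonzero probability.
        total = sum(counts[k] for k in keys) + alpha * len(keys)
        return {k: (counts[k] + alpha) / total for k in keys}

    pi = normalize(init, states)
    A = {s: normalize(trans[s], states) for s in states}
    B = {s: normalize(emit[s], words) for s in states}
    return pi, A, B


training = [
    [("go up", "v"), ("to", "v"), ("87", "height")],
    [("descend", "v"), ("to", "v"), ("60", "height")],
    [("reduce", "v"), ("to", "v"), ("250", "speed")],
]
pi, A, B = estimate_hmm(training)
print(pi, A["v"])
```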
Step 5: Frame filling
A frame is designed for the identified information. For uniformity, the frame contains only two word slots: special control term + search content, or attribute + number/English letter sequence. Filling the word slots with the results of the preceding steps gives the following frame-filling result:
1. Flight number [attribute] + DAL185
East tower [special control term] + none
Gust [special control term] + 12 meters/second
Taxiway [special control term] + d5p4a5
2. Beijing area [special control term] + none
Flight number [attribute] + CSN6723
Time [attribute] + 35 points
Height [attribute] + 8400 meters
3. Flight number [attribute] + CCA1234
Height [attribute] + 87
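The two-slot frame can be sketched as a small data structure whose string form matches the listing above. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChunkFrame:
    """Two-slot chunk frame: (special control term | attribute) + content."""
    head: str               # special control term or attribute name
    kind: str               # "special control term" or "attribute"
    content: Optional[str]  # searched content or number/letter sequence; None if empty

    def __str__(self) -> str:
        return f"{self.head} [{self.kind}] + {self.content if self.content else 'none'}"


frames = [
    ChunkFrame("Flight number", "attribute", "CCA1234"),
    ChunkFrame("Height", "attribute", "87"),
    ChunkFrame("Beijing area", "special control term", None),
]
for frame in frames:
    print(frame)
```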
The frame-filling result can be used to check whether the control instruction contradicts the plan information in the system. For example, suppose the method determines that the control instruction assigns the aircraft a height of 8400 meters, while the flight plan specifies a climb to 9000 meters. The flight plan information is structured, so the relevant information must be located within the unstructured control instruction. The method yields the form Height [attribute] + 8400 meters. The 9000 meters associated with the height attribute in the plan can therefore be found through the attribute, which reveals that the control instruction contains an error.
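A minimal sketch of this consistency check follows. It assumes that extracted attribute slots and plan fields share the same names, and that values are compared as exact strings. A real system would need to normalize units (meters vs. feet) before comparing. The function name and plan record are hypothetical.

```python
from typing import Dict, List, Tuple


def find_conflicts(frames: List[Tuple[str, str]], plan: Dict[str, str]) -> List[str]:
    """Report attribute slots whose value differs from the structured plan record."""
    conflicts: List[str] = []
    for attribute, value in frames:
        planned = plan.get(attribute)
        if planned is not None and planned != value:
            conflicts.append(f"{attribute}: instruction {value}, plan {planned}")
    return conflicts


instruction = [("Flight number", "CSN6723"), ("Time", "35 points"), ("Height", "8400 meters")]
flight_plan = {"Flight number": "CSN6723", "Height": "9000 meters"}
print(find_conflicts(instruction, flight_plan))
```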
The present invention provides a semantic-chunk-based method for processing control instruction information. There are many methods and approaches for implementing this technical solution, and the above is only a preferred embodiment of the invention. Those skilled in the art can make a number of improvements and modifications without departing from the principle of the invention, and these improvements and modifications should also be regarded as falling within the protection scope of the invention. Any component not specified in this embodiment can be implemented with existing technology.

Claims (1)

1. A semantic-chunk-based method for processing control instruction information, characterized by comprising the following steps:
step 1, performing Chinese word segmentation operation on a control instruction to obtain a word sequence;
step 2, performing part-of-speech tagging on each word in the word sequence, the tag serving as the feature of a target word;
step 3, processing the special control terms in the control instruction;
step 4, processing the other components of the control instruction;
step 5, analyzing the control instruction according to the processing results of step 3 and step 4 to complete semantic understanding of the control instruction in the air traffic control system, and using the obtained result to judge whether the information in the control instruction is consistent with the plan information in the system;
in step 1, a word segmentation algorithm is used to segment the control instruction into a word sequence; during segmentation, a control term dictionary is compiled and added to the segmentation algorithm to assist in segmenting the control instruction;
in step 2, the part-of-speech tag of a number in the control instruction is m, the tag of an English letter sequence is nx or eng, and the tag of a special control term is Sp;
in step 2, the part-of-speech tags of the special control terms are further set as follows:
Sp0 indicates that the special control term does not form a phrase with the preceding or following words;
Sp1 indicates that the special control term forms a phrase with the preceding words;
Sp2 indicates that the special control term forms a phrase with the following words;
Sp3 indicates that the special control term can form a phrase with both the preceding and the following words;
the step 3 comprises the following steps:
step 3-1, when a special control term appears in the control instruction, taking the special control term as the head word and generating a chunk frame according to the search rule corresponding to its tag, wherein the chunk frame comprises two word slots:
either the first word slot is the special control term and the second word slot is the search content, or the first word slot is the chunk attribute and the second word slot is the content corresponding to that attribute in the control instruction;
step 3-2, designing corresponding search rules according to how the special control terms are used:
for a special control term tagged Sp0, identifying it directly as a phrase without any search;
for a special control term tagged Sp1, searching to the left up to the corresponding boundary defined by the search rules;
for a special control term tagged Sp2, searching to the right up to the corresponding boundary defined by the search rules in the rule base;
for a special control term tagged Sp3, searching to the left up to the corresponding boundary defined by the search rules in the rule base; if no content is found, searching to the right up to the corresponding boundary defined by the search rules in the rule base;
step 3-3, filling the special control term and the searched content into the chunk frame given in step 3-1, the frame having the form: the first word slot is the special control term, and the second word slot is the search content;
step 4 comprises the following steps:
step 4-1, judging whether a number or English letter sequence appears in the control instruction; if none appears, ending the process; otherwise, continuing with step 4-2;
step 4-2, analyzing the number or English letter sequences appearing in the control instruction and dividing them by structure into three cases:
in the first case, the number or English letter combination is preceded or followed by characters or words from which its attribute can be judged;
in the second case, the number or English letter combination has a special internal structure, and its attribute can be judged by combining that structure with the rules of ground-air control communication;
in the third case, no character outside the number or English letter sequence indicates its attribute, and it has no special internal structure from which the attribute can be distinguished;
step 4-3, designing rules to judge the attributes in the first and second cases;
step 4-4, judging the attributes in the third case with a hidden-Markov-model-based method;
step 4-5, once the attributes have been found in step 4-3 and step 4-4, filling the word slots of the chunk frame, the frame having the form: the first word slot is the attribute, and the second word slot is the number or English letter sequence;
step 4-3 comprises designing the following rules for attribute judgment:
if the number or English letter combination contains a decimal point, the attribute is civil aviation control frequency;
if the combination consists of both letters and numbers, the attribute is flight number;
if the combination is followed by the unit meters, the attribute is height;
if the combination is followed by the unit feet, the attribute is height;
if the combination is followed by the unit meters/second, the attribute is speed;
if the combination is followed by the unit kilometers, the attribute is distance;
if the combination is followed by the unit degrees, the attribute is turning direction;
if the combination is followed by the unit "point", the attribute is time point;
if the combination is followed by the unit minutes, the attribute is time length;
step 4-4 comprises:
the hidden Markov model is defined as follows:
$Q$ is the set of all possible states, namely the parts of speech of the words other than the target number or English letter sequence, and the attributes of the target number or English letter sequence; $V$ is the set of all possible observations, i.e., the words output by those states; $I$ is the state sequence and $O$ is the observation sequence, where:
$Q = \{q_1, q_2, \ldots, q_N\}$, $V = \{v_1, v_2, \ldots, v_M\}$,
$I = (i_1, i_2, \ldots, i_T)$, $O = (o_1, o_2, \ldots, o_T)$,
where $N$ is the number of possible states and $q_N$ is the $N$-th possible state; $M$ is the number of possible observations and $v_M$ is the $M$-th possible observation; $T$ is the length of the sequence, and $i_t$ and $o_t$ are the actual state and the actual observation at position $t$; $A = [a_{ij}]_{N \times N}$ is the state transition probability matrix, $B = [b_j(k)]_{N \times M}$ is the observation probability matrix, and $\pi = (\pi_i)$ is the initial state probability vector, where:
$a_{ij} = P(i_{t+1} = q_j \mid i_t = q_i)$, $i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, N$, is the probability of moving from state $q_i$ at the current time to state $q_j$ at the next time;
$b_j(k) = P(o_t = v_k \mid i_t = q_j)$, $k = 1, 2, \ldots, M$; $j = 1, 2, \ldots, N$, is the probability that state $q_j$ at the current time generates the observation $v_k$;
$\pi_i = P(i_1 = q_i)$ is the probability of state $q_i$ at the initial time;
for the third case, n words before and after the target number or English letter sequence are taken to form a sequence, which converts judging the attribute of the target into the probability calculation problem of a hidden Markov model; the state with the largest probability is the attribute of the target, and determining the parameters of the hidden Markov model becomes a learning problem;
the probability calculation problem is solved as follows: given the model $\lambda = (\pi, A, B)$ and the observation sequence $O$, the probability $\gamma_t(i)$ that the target number or English letter sequence at position $t$ is in state $q_i$, i.e., has attribute $q_i$, is:
$\gamma_t(i) = P(i_t = q_i \mid O, \lambda)$,
where $0 < t \le T$ and $T$ is the position of the last element of the sequence;
$\gamma_t(i)$ is calculated from the forward and backward probabilities:
$\gamma_t(i) = \dfrac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)}$,
where $\alpha_t(i)$ is the forward probability of the $i$-th state:
$\alpha_t(i) = P(o_1, o_2, \ldots, o_t, i_t = q_i \mid \lambda)$,
and $\beta_t(i)$ is the backward probability of the $i$-th state:
$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid i_t = q_i, \lambda)$,
with the boundary condition that the backward probability of every state at the final position is $\beta_T(i) = 1$.
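The posterior $\gamma_t(i)$ defined above can be computed with the standard forward-backward recursions, as in the sketch below. It computes the plain posterior of the formula and does not separately clamp the known states of the other words. In the toy model those states are effectively forced by the emission probabilities instead. All parameter values are hypothetical and only illustrate the "go up | to | 87" example.

```python
from typing import List


def forward_backward(pi: List[float], A: List[List[float]], B: List[List[float]],
                     obs: List[int]) -> List[List[float]]:
    """Return gamma[t][i] = P(i_t = q_i | O, lambda), with 0-based t and i.

    pi[i] = pi_i, A[i][j] = a_ij, B[j][k] = b_j(k); obs holds observation indices.
    """
    N, T = len(pi), len(obs)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[1.0] * N for _ in range(T)]  # boundary condition: beta_T(i) = 1
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):  # alpha_t(j) = sum_i alpha_{t-1}(i) a_ij * b_j(o_t)
        for j in range(N):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):  # beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
    gamma = []
    for t in range(T):
        joint = [alpha[t][i] * beta[t][i] for i in range(N)]
        total = sum(joint)
        gamma.append([x / total for x in joint])
    return gamma


# Hypothetical toy model: states 0 = verb [v], 1 = height, 2 = speed;
# observations 0 = "go up", 1 = "to", 2 = "87".
states = ["v", "height", "speed"]
pi = [1.0, 0.0, 0.0]
A = [[0.4, 0.4, 0.2], [0.5, 0.25, 0.25], [0.5, 0.25, 0.25]]
B = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
gamma = forward_backward(pi, A, B, [0, 1, 2])
best = max(range(len(states)), key=lambda i: gamma[-1][i])
print(states[best], [round(p, 3) for p in gamma[-1]])
```

At the last position $\beta_T(i) = 1$, so $\gamma_T(i)$ reduces to the normalized forward probability. Here the larger verb-to-height transition decides the attribute.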

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910180560.2A CN110069771B (en) 2019-03-11 2019-03-11 Control instruction information processing method based on semantic chunk

Publications (2)

Publication Number Publication Date
CN110069771A CN110069771A (en) 2019-07-30
CN110069771B true CN110069771B (en) 2021-02-05

Family

ID=67365209

Country Status (1)

Country Link
CN (1) CN110069771B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627257B (en) * 2020-04-13 2022-05-03 Nanjing University of Aeronautics and Astronautics Control instruction safety rehearsal and verification method based on aircraft motion trend prejudgment
CN113158658B (en) * 2021-04-26 2023-09-19 The 28th Research Institute of China Electronics Technology Group Corporation Knowledge embedding-based structured control instruction extraction method
CN113569545B (en) * 2021-09-26 2021-12-07 The 28th Research Institute of China Electronics Technology Group Corporation Control information extraction method based on voice recognition error correction model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106875948A (en) * 2017-02-22 2017-06-20 The 28th Research Institute of China Electronics Technology Group Corporation A collision alert method based on control voice
CN108628959A (en) * 2018-04-13 2018-10-09 Chang'an University An ontology construction method based on traffic big data

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102436764A (en) * 2011-11-21 2012-05-02 Nanjing LES Information Technology Co., Ltd. Method for mining flight number regulatory factors through historical data
CN102849555B (en) * 2012-09-21 2015-07-15 Hitachi Elevator (China) Co., Ltd. High-accuracy earthquake management and control method and system based on cloud computing

Non-Patent Citations

* Cited by examiner, † Cited by third party
Title
Wang Xuan et al. Semantic analysis method for control speech understanding. Command Information System and Technology, February 2019, Vol. 10, No. 1, pp. 32-36. *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant