CN110069771B - Control instruction information processing method based on semantic chunk

Info

Publication number: CN110069771B
Application number: CN201910180560.2A
Authority: CN
Inventors: 王煊; 徐秋程; 丁辉; 王冠; 严勇杰; 陈平
Original assignee: CETC 28 Research Institute
Current assignee: CETC 28 Research Institute
Priority date: 2019-03-11
Filing date: 2019-03-11
Publication date: 2021-02-05
Anticipated expiration: 2039-03-11
Also published as: CN110069771A

Abstract

The invention discloses a control instruction information processing method based on semantic chunks. Its aims are: 1. to construct computer-readable structured control instructions, providing a basis for the automatic processing of control instructions; 2. to facilitate information extraction and semantic analysis of control instructions and improve the precision of the results. By identifying and processing control term phrases, the method supports the following auxiliary functions: effectively extracting the information carried by basic control terms appearing in a control instruction; accurately extracting information such as the model and state of the target aircraft; and providing data for information aggregation based on control instructions. For control instructions that contain special control terms, numbers, and the like, the method performs the extraction by frame search with correspondingly designed rules. The invention improves the information extraction capability for control instructions and the precision of the semantic analysis results.

Description

Control instruction information processing method based on semantic chunk

Technical Field

The invention belongs to the technical field of air traffic control automation systems, and particularly relates to a control instruction information processing method based on semantic chunks.

Background

With the vigorous development of the civil aviation industry over the last 30 years, the demands on air traffic management have grown continuously, and potential safety hazards have become increasingly prominent. Statistics show that human factors account for over 75% of past flight safety incidents, and among them, incidents due to controller error account for 25%. The mainstream approach to conflicts caused by controller error is to strengthen surface surveillance: equipment such as radar and multilateration sensors monitors the airport surface so that conflicts can be prevented. More advanced solutions based on artificial intelligence have also been proposed, such as recognizing controller speech with speech recognition technology, converting it into text, and performing semantic analysis with natural language processing technology.

Control instructions contain special control terms that do not follow ordinary grammar rules and that form phrases (chunks) with adjacent words. In general natural language processing, named entity recognition is used to extract entities or chunks, but it only recognizes person names, place names, organization names, and the like. It cannot recognize and extract a phrase (chunk) centered on a control term, and it cannot recognize an entity in the combinations of English letters and numbers that appear in a control instruction.

The semantic chunk theory was first proposed by Steve Abney and is a method of shallow syntactic analysis. English chunks are defined as follows: a sentence is composed of several chunks, each chunk consists of syntactically related words, and the chunks are non-overlapping, non-nested, and not disjointed. Using this method to extract special control term phrases is feasible for three reasons: 1. air traffic control is a closed domain, the number of special control terms is limited, and a limited number of chunk rules can be designed; 2. control instructions follow the air-ground communication rules, so control terms are used in regular ways that can be turned directly into rules; 3. special control term phrases play a relatively independent role in a control instruction, usually express only external environment information, and are rarely associated with other words in the instruction, which satisfies the definition of a sentence chunk.

Disclosure of Invention

The purpose of the invention is as follows: the invention analyzes, from the perspective of semantic chunking, how the important phrase information appearing in actual control instructions is composed, and designs corresponding chunking rules in combination with the air-ground communication rules to extract the phrases. Special control terms, numbers, and English letter sequences are used as entry points, and each is identified and extracted according to its compositional characteristics.

The technical scheme is as follows: the invention provides a control instruction information processing method based on semantic chunks, which comprises the following steps:

step 1, performing Chinese word segmentation operation on a control instruction to obtain a word sequence;

step 2, performing part-of-speech tagging on each word in the word sequence to serve as the characteristic of a target word;

step 3, processing the control instruction containing the special control term;

step 4, processing other components of the control instruction;

step 5, analyzing the control instruction according to the processing results of step 3 and step 4, completing the semantic understanding of the control instruction in the air traffic control system, and using the result to judge whether the information in the control instruction is consistent with the plan information in the system.

In step 1, a word segmentation algorithm (such as a method based on a dictionary and a hidden Markov model) performs Chinese word segmentation on the control instruction to obtain a word sequence. A control term dictionary is compiled for the segmentation process, and common basic control terms collected from control instructions (nouns such as ground wind, dew point, visibility, and tower) are added to it. The control term dictionary is then supplied to the word segmentation algorithm to assist in segmenting the control instruction.
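As a rough illustration of dictionary-assisted segmentation, the sketch below uses plain forward maximum matching rather than the dictionary-plus-HMM segmenter mentioned above. The function name `segment`, the Chinese dictionary entries (assumed renderings of ground wind, dew point, visibility, and tower), and the choice to keep runs of ASCII letters and digits together are all assumptions made for the example.

```python
def segment(text, term_dict, max_len=4):
    """Forward maximum matching over a control-term dictionary (illustrative only)."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        # Keep runs of ASCII letters, digits and '.' together, e.g. "350" or "123.6".
        if ch.isascii() and (ch.isalnum() or ch == "."):
            j = i
            while j < len(text) and text[j].isascii() and (text[j].isalnum() or text[j] == "."):
                j += 1
            tokens.append(text[i:j])
            i = j
            continue
        # Otherwise take the longest dictionary entry starting here, else a single character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in term_dict:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical dictionary entries: ground wind, dew point, visibility, tower.
CONTROL_TERMS = {"地面风", "露点", "能见度", "塔台"}
print(segment("地面风350度", CONTROL_TERMS))
```

Without the dictionary entry, the three characters of the control term would be emitted one by one, which is the kind of error the control term dictionary is meant to prevent.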

In step 2, numbers in the control instruction are tagged m; English letter sequences are tagged nx or eng; special control terms are tagged Sp.

In step 2, when part-of-speech tagging is performed on the special control term, the following settings are specifically performed:

Sp0 indicates that the special control term forms no phrase with the preceding or following words;

Sp1 indicates that the special control term forms a phrase with the preceding words;

Sp2 indicates that the special control term forms a phrase with the following words;

Sp3 indicates that the special control term can form a phrase with either the preceding or the following words.

The step 3 comprises the following steps:

step 3-1, when a special control term appears in the control instruction, it is taken as the head word, and a chunk frame is generated according to the search rule associated with its special tag (control terms are tagged manually in advance and assigned search rules, which are given in step 3-2). The chunk frame comprises two word slots:

the first word slot is the special control term and the second word slot is the searched content; alternatively, the first word slot is a chunk attribute (such as time) and the second word slot is the content in the control instruction that corresponds to that attribute;

step 3-2, designing corresponding search rules according to the usage of the special control terms (Table 1):

for a special control term tagged Sp0, identifying it directly, without any search, as a phrase on its own;

for a special control term tagged Sp1, searching to the left until the corresponding boundary is reached according to the search rules defined in Table 1;

for a special control term tagged Sp2, searching to the right until the corresponding boundary is reached according to the search rules defined in Table 1;

TABLE 1

For a special control term with part of speech marked as Sp3, searching to the left, and searching to a corresponding boundary according to a search rule in a rule base; if no content exists, searching to the right, and searching to a corresponding boundary according to a search rule in a rule base;

step 3-3, filling the frame given in step 3-1 with the special control term and the searched content. For example, in "ground wind 3 meters per second", ground wind is the special control term and 3 meters per second is the searched content, so the frame takes the form: the first word slot is the special control term and the second word slot is the searched content.
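Steps 3-1 to 3-3 can be sketched as follows. Because the contents of Table 1 are not reproduced here, the boundary rule is an assumption: the search simply collects adjacent tokens tagged as numbers, letter sequences, or quantifiers and stops at the first token with any other tag. The names `Frame`, `build_frame`, and the tag `q` for quantifiers are likewise hypothetical.

```python
from dataclasses import dataclass

# Tags treated as chunk content in this sketch: numbers, letter sequences, quantifiers.
CONTENT_TAGS = {"m", "eng", "nx", "q"}


@dataclass
class Frame:
    slot1: str  # special control term
    slot2: str  # searched content ("" when nothing is found)


def _scan(tagged, start, step):
    """Collect consecutive content tokens from `start`, moving by `step` (+1 or -1)."""
    found = []
    i = start
    while 0 <= i < len(tagged) and tagged[i][1] in CONTENT_TAGS:
        found.append(tagged[i][0])
        i += step
    return found if step > 0 else found[::-1]


def build_frame(tagged, idx):
    """Build the two-slot frame for the special control term at position idx."""
    term, tag = tagged[idx]
    if tag == "Sp0":
        content = []                                  # no search
    elif tag == "Sp1":
        content = _scan(tagged, idx - 1, -1)          # search left
    elif tag == "Sp2":
        content = _scan(tagged, idx + 1, +1)          # search right
    elif tag == "Sp3":
        # search left first; if nothing is found, search right
        content = _scan(tagged, idx - 1, -1) or _scan(tagged, idx + 1, +1)
    else:
        raise ValueError(f"not a special control term: {tag}")
    return Frame(term, " ".join(content))


wind = [("ground wind", "Sp2"), ("3", "m"), ("meters per second", "q")]
print(build_frame(wind, 0))
```

A production rule base would replace `CONTENT_TAGS` with the per-term boundaries of Table 1; the control flow over the four tags stays the same.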

Step 4 comprises the following steps:

step 4-1, judging whether a numeric or English letter sequence appears in the control command, and if no numeric or English letter sequence appears, ending the process; if the numbers or English letters exist, continuing the step 4-2;

step 4-2, analyzing the numeric or English letter sequence appearing in the control command, and dividing the structure into three conditions:

in the first case, characters or words which can be subjected to attribute judgment are carried before and after the combination of numbers or English letters;

in the second case, the combination of numbers or English letters has a special internal structure, and its attribute can be judged in combination with the air-ground communication rules;

in the third case, the outside of the numeric or english alphabet sequence has no words indicating its attributes clearly, and the inside thereof has no special structure capable of distinguishing the attributes;

step 4-3, designing rules to judge the attributes in the first case and the second case;

step 4-4, judging the attributes in the third case with a method based on a hidden Markov model;

step 4-5, if the attribute has been determined through step 4-3 or step 4-4, filling the word slots of the chunk frame in the form: the first word slot is the attribute, and the second word slot is the number or English letter sequence.

Step 4-3 comprises: the following rules are designed for attribute judgment:

if the number or English letter combination contains a decimal point, the attribute is civil aviation control frequency;

if the number or English letter combination is a combination of letters and numbers, the attribute is flight number;

if the number or English letter combination is followed by the unit meter, the attribute is height;

if the number or English letter combination is followed by the unit foot, the attribute is height;

if the number or English letter combination is followed by the unit meters per second, the attribute is speed;

if the number or English letter combination is followed by the unit kilometer, the attribute is distance;

if the number or English letter combination is followed by the unit degree, the attribute is turning direction;

if the number or English letter combination is followed by the clock-time unit minute (minutes past the hour), the attribute is time point;

if the number or English letter combination is followed by the duration unit minutes, the attribute is time length.
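The rules above map naturally onto a small lookup function. In this sketch the rules are tried in the order listed, the English unit strings stand in for the original Chinese unit words, and the function name `classify` and the `None` return for the third case are assumptions.

```python
import re

# Unit word following the number -> attribute (English glosses are placeholders).
UNIT_ATTRIBUTES = {
    "meters": "height",
    "feet": "height",
    "meters per second": "speed",
    "kilometers": "distance",
    "degrees": "turning direction",
    "minute mark": "time point",   # clock-time minutes, e.g. 35 past the hour
    "minutes": "time length",      # duration
}


def classify(token, following_unit=None):
    """Return the attribute of a number/letter token, or None if no rule applies."""
    if re.fullmatch(r"\d+\.\d+", token):
        return "civil aviation control frequency"  # internal decimal point
    if re.fullmatch(r"[A-Za-z]+\d+", token):
        return "flight number"                     # letters followed by digits
    if following_unit in UNIT_ATTRIBUTES:
        return UNIT_ATTRIBUTES[following_unit]     # external unit word
    return None                                    # third case: left to the HMM


print(classify("123.6"), classify("CSN6723"), classify("8400", "meters"))
```

Tokens for which `classify` returns `None` are exactly those passed on to the hidden Markov model in step 4-4.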

Step 4-4 comprises:

the hidden markov model form is defined as follows:

$Q$ is the set of all possible states, comprising the parts of speech of the words other than the target number or English letter sequence, together with the candidate attributes of the target number or English letter sequence; $V$ is the set of all possible observations, namely the output words corresponding to the parts of speech; $I$ is the state sequence; $O$ is the observation sequence, where:

$$Q=\{q_1,q_2,\ldots,q_N\},\quad V=\{v_1,v_2,\ldots,v_M\},$$

$$I=\{i_1,i_2,\ldots,i_K\},\quad O=\{o_1,o_2,\ldots,o_K\},$$

where $N$ is the number of possible states and $q_N$ is the $N$-th possible state; $M$ is the number of possible observations and $v_M$ is the $M$-th possible observation; $K$ is the actual length of the state and observation sequences, $i_K$ is the $K$-th actual state value, and $o_K$ is the $K$-th actual observation. $A$ is the state transition probability matrix, $A=[a_{ij}]_{N\times N}$; $B$ is the observation probability matrix, $B=[b_j(k)]_{N\times M}$; and $\pi$ is the initial state probability vector, $\pi=(\pi_i)$, where:

$$a_{ij}=P(i_{t+1}=q_j\mid i_t=q_i),\quad i=1,2,\ldots,N;\ j=1,2,\ldots,N,$$

$a_{ij}$ is the probability of moving from state $q_i$ at the current time to state $q_j$ at the next time;

$$b_j(k)=P(o_t=v_k\mid i_t=q_j),\quad k=1,2,\ldots,M;\ j=1,2,\ldots,N,$$

$b_j(k)$ is the probability that state $q_j$ at the current time generates the output value $v_k$ at the current time;

$$\pi_i=P(i_1=q_i),\quad i=1,2,\ldots,N,$$

$\pi_i$ is the probability of each state at the initial time;

For the problem described in the third case, that is, determining the attribute of a number or English letter sequence that appears on its own in a control instruction, the n words before and after the target number or English letter sequence are taken to form a sequence, and the problem is converted into a sequence labeling problem. The observation sequence is known, and the parts of speech of the words other than the target word are also known, so determining the attribute of the target number or English letter sequence becomes the probability calculation problem of a Hidden Markov Model (HMM): the state with the maximum probability is the attribute of the target. Determining the parameters of the hidden Markov model becomes the learning problem;

the probability calculation problem is solved as follows: given the model $\lambda=(\pi,A,B)$ and the observation sequence $O$, the probability $\gamma_t(i)$ that the target number or English letter sequence at position $t$ is in state (that is, has attribute) $q_i$ is:

$$\gamma_t(i)=P(i_t=q_i\mid O,\lambda),$$

where $0<t\le T$ and $T$ is the position of the last element of the sequence;

it is calculated from the forward and backward probabilities:

$$\gamma_t(i)=\frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N}\alpha_t(j)\,\beta_t(j)},$$

where $\alpha_t(i)$ is the forward probability of the $i$-th state:

$$\alpha_t(i)=P(o_1,o_2,\ldots,o_t,\ i_t=q_i\mid\lambda),$$

and $\beta_t(i)$ is the backward probability of the $i$-th state:

$$\beta_t(i)=P(o_{t+1},o_{t+2},\ldots,o_T\mid i_t=q_i,\lambda),$$

with the boundary condition at the final position, for every state: $\beta_T(i)=1$.

The learning problem of the HMM is solved by applying the Baum-Welch algorithm to a corpus of control instructions, which yields estimates of the corresponding parameters.
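The probability calculation can be sketched with a plain forward-backward pass over a discrete HMM. The parameter values below are invented toy numbers, not estimates from a control instruction corpus, and the function name `forward_backward` is an assumption; in practice the parameters would come from Baum-Welch training.

```python
def forward_backward(pi, A, B, obs):
    """Return gamma[t][i] = P(i_t = q_i | O, lambda) for a discrete HMM.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability that state i emits observation symbol k
    obs     : list of observation symbol indices
    """
    N, T = len(pi), len(obs)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[1.0] * N for _ in range(T)]  # boundary condition: beta_T(i) = 1

    # Forward pass.
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]

    # Backward pass.
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))

    # Posterior state probabilities at every position.
    gamma = []
    for t in range(T):
        norm = sum(alpha[t][j] * beta[t][j] for j in range(N))
        gamma.append([alpha[t][i] * beta[t][i] / norm for i in range(N)])
    return gamma


# Toy model with two states and two observation symbols (invented values).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
gamma = forward_backward(pi, A, B, [0, 1])
best_state_at_end = max(range(2), key=lambda i: gamma[-1][i])
print(best_state_at_end)
```

The attribute of the target token is then the state with the largest `gamma` value at the token's position.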

Advantageous effects: the invention has the following technical effects:

(1) Phrases in control instructions that are built around special control terms, numbers, and English letters are recognized automatically.

(2) Information aggregation on the control instructions can be realized.

(3) The performance of semantic analysis of the control instruction is improved.

Drawings

The foregoing and other advantages of the invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.

Fig. 1 is a flow chart of the control instruction information extraction method based on semantic chunks.

Fig. 2 is the main flow chart of the control instruction information extraction method based on semantic chunks.

Fig. 3 shows the structure of the hidden Markov model.

Fig. 4 shows the result of the completed part-of-speech analysis.

Detailed Description

The invention is further explained below with reference to the drawings and the embodiments.

The implementation process and steps of the invention are as follows, and the flow chart is shown in fig. 1.

Step 1: Chinese word segmentation

A word segmentation algorithm performs Chinese word segmentation on the control instruction to obtain a word sequence. Adding basic control terms to the segmentation dictionary improves the precision of the segmentation result.

Step 2: part-of-speech tagging

Part-of-speech tagging is performed on each word in the word sequence, and the tag serves as a feature of the target word. Because the invention focuses on special control terms, numbers, and English letters in the control instruction, these three types of words need to be identified directly through part-of-speech tagging. Numbers are tagged "m" and English letter sequences are tagged "nx" or "eng", so they are easy to distinguish. For special control terms, the tag "Sp" is used, and the way each term combines into phrases is encoded as follows:

"Sp0": forms no phrase with the words before or after it, such as: [east tower].

"Sp1": forms a phrase with the preceding words, such as: number 27 [stand].

"Sp2": forms a phrase with the following words, such as: [ground wind] 350 degrees.

"Sp3": may form a phrase with either the preceding or the following words, such as: [runway] 18L or 18L [runway].

Step 3: method for processing special control terms contained in the control instruction

Step A-1: when a special control term appears in a control instruction, a frame is triggered with the control term as its center. The frame consists of two parts: special control term + searched content.

Step A-2: corresponding search rules are designed according to the usage of the special control terms. Table 1 shows how some common control terms appearing in control instructions are chunked, together with the search rules for the different part-of-speech tags. Further study shows that the phrases formed by special control terms tagged "Sp1", "Sp2", and "Sp3" are relatively flexible: some elements can be omitted, and the boundary words are basically numbers, English letter sequences, and quantifiers. Because these three kinds of words occur in a sentence with low frequency and are closely associated with the special control term, the design of the rule base can be simplified, which improves the efficiency of matching special control terms against the entries in the rule base.

TABLE 1

Step A-3: the frame given in step A-1 comprises two parts; the special control term and the remaining content of the phrase are filled into the empty slots according to the slot attributes.

Step 4: processing method when the control instruction contains no special control terms

A large number of numeric or English letter sequences can appear in a control instruction. Some are closely associated with a special control term and can be categorized through the term that appears alongside them. In other cases the numeric or English letter sequence appears on its own, with no surrounding special control term that could indicate its attribute, such as time, frequency, altitude, or speed. This step gives a solution for those cases.

Step A-1: determine whether a numeric or English letter sequence appears in the control instruction. If none appears, the process ends; if one appears, continue with the next step.

Step A-2: analyze the numeric or English letter sequence appearing in the control instruction. Its structure falls into one of three cases:

1. Characters or words that allow the attribute to be judged appear before or after the number/English letter combination, such as: up to 10000 feet.

2. The number/English letter combination has a special internal structure and can be judged in combination with the air-ground communication rules, such as: 123.6 (according to the civil aviation control frequency table, a frequency contains a decimal point).

3. No word outside the number/English letter sequence clearly indicates its attribute, and it has no special internal structure from which the attribute can be distinguished, for example a bare time or altitude value.

This step provides solutions for the three cases above, that is, it determines the attribute of a number/English letter combination that appears on its own. Two methods are used: a rule-based method and a method based on the Hidden Markov Model (HMM).

(1) Rule-based method

In cases 1 and 2, the attribute is determined by designing the relevant rule, as shown in table 2.

TABLE 2

Feature	Attribute
Contains a decimal point	Civil aviation control frequency
Letter + number combination	Flight number
Followed by unit: meter	Height
Followed by unit: foot	Height
Followed by unit: meters per second	Speed
Followed by unit: kilometer	Distance
Followed by unit: degree	Turning direction
Followed by unit: minute (clock time)	Time point
Followed by unit: minutes (duration)	Time length

(2) HMM-based method

In case 3, the number/English letter sequence appearing in the control instruction has neither an external word that explicitly indicates its attribute nor a clear internal feature that does so. The words in the control instruction therefore need to be labeled, that is, the judgment relies on context information, and the forward-backward algorithm is used to compute the probability of the number/English letter sequence under each candidate attribute.

The hidden Markov model is determined by initial probability distribution pi, state transition probability distribution A and observation probability distribution B, and the form of the model is defined as follows:

$Q$ is the set of all possible states, comprising the parts of speech of the words other than the target number or English letter sequence, together with the candidate attributes of the target number or English letter sequence; $V$ is the set of all possible observations, namely the output words corresponding to the parts of speech; $I$ is the state sequence; $O$ is the observation sequence, where:

$$Q=\{q_1,q_2,\ldots,q_N\},\quad V=\{v_1,v_2,\ldots,v_M\}$$

$$I=\{i_1,i_2,\ldots,i_K\},\quad O=\{o_1,o_2,\ldots,o_K\}$$

where $N$ is the number of possible states, $M$ is the number of possible observations, and $K$ is the actual number of states and observations. $A$ is the state transition probability matrix, $A=[a_{ij}]_{N\times N}$; $B$ is the observation probability matrix, $B=[b_j(k)]_{N\times M}$; and $\pi$ is the initial state probability vector, $\pi=(\pi_i)$, where:

$$a_{ij}=P(i_{t+1}=q_j\mid i_t=q_i),\quad i=1,2,\ldots,N;\ j=1,2,\ldots,N$$

$$b_j(k)=P(o_t=v_k\mid i_t=q_j),\quad k=1,2,\ldots,M;\ j=1,2,\ldots,N$$

$$\pi_i=P(i_1=q_i),\quad i=1,2,\ldots,N.$$

For the problem described in the third case, that is, determining the attribute of a number/English letter sequence that appears on its own in a control instruction, the n words (0 < n < 3) before and after the target number/English letter sequence are taken to form a sequence, and the problem is converted into a sequence labeling problem. The observation sequence is known, and the parts of speech of the words other than the target word are also known, so determining the attribute of the target number/English letter sequence becomes the probability calculation problem of an HMM: the state with the maximum probability is the attribute of the target. Determining the parameters of the HMM becomes the learning problem.
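Building the short sequence around the target can be sketched as below. The function name `context_window`, the use of `None` to mark the unknown state of the target, and the example tags are assumptions for illustration.

```python
def context_window(tagged, target_idx, n=2):
    """Return (observations, states) for up to n words on each side of the target.

    `tagged` is a list of (word, tag) pairs; the state of the target word is
    returned as None because its attribute is what the HMM has to infer.
    """
    if not 0 < n < 3:
        raise ValueError("the description uses 0 < n < 3")
    lo = max(0, target_idx - n)
    hi = min(len(tagged), target_idx + n + 1)
    observations = [word for word, _ in tagged[lo:hi]]
    states = [None if i == target_idx else tagged[i][1] for i in range(lo, hi)]
    return observations, states


sentence = [("CCA", "eng"), ("1234", "m"), ("go up", "v"), ("to", "v"), ("87", "m")]
print(context_window(sentence, 4))
```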

The probability calculation problem is solved as follows: given the model $\lambda=(\pi,A,B)$ and the observation sequence $O$, the probability that the target number/English letter sequence (at position $t$) is in state (has attribute) $q_i$ is:

$$\gamma_t(i)=P(i_t=q_i\mid O,\lambda),$$

which is calculated from the forward and backward probabilities:

$$\gamma_t(i)=\frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N}\alpha_t(j)\,\beta_t(j)},$$

where $\alpha_t(i)$ is the forward probability:

$$\alpha_t(i)=P(o_1,o_2,\ldots,o_t,\ i_t=q_i\mid\lambda),$$

and $\beta_t(i)$ is the backward probability (boundary condition: $\beta_T(i)=1$):

$$\beta_t(i)=P(o_{t+1},o_{t+2},\ldots,o_T\mid i_t=q_i,\lambda).$$
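For reference, the forward and backward probabilities are usually computed with the standard textbook recursions below; they are not spelled out in the description and are given here only as the conventional way of evaluating the two quantities:

$$\alpha_1(i)=\pi_i\,b_i(o_1),\qquad \alpha_{t+1}(j)=\Big[\sum_{i=1}^{N}\alpha_t(i)\,a_{ij}\Big]\,b_j(o_{t+1}),\quad t=1,\ldots,T-1,$$

$$\beta_T(i)=1,\qquad \beta_t(i)=\sum_{j=1}^{N}a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j),\quad t=T-1,\ldots,1.$$

Each recursion costs on the order of $N^2T$ operations, which is small for the short windows of at most two context words on each side used here.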

Step A-3: determine whether the attribute of the number/English letter sequence has been obtained. If it has been obtained through the steps above, the word slots of the frame are filled in the form: attribute + target number/English letter sequence. In some cases the number/English letter sequence in the control instruction has no adjacent words from which the attribute can be judged, so the methods above cannot determine it; the attribute is then estimated from historical data, and the frame is filled according to the estimate.

Step 5: phrase formation

Because the frames are structured, this step only needs to take the frames and extract the information in them as the task requires.

Based on linguistic semantic chunks and hidden Markov models, the method identifies the phrases formed by special control terms in control instructions, as well as the attributes of numeric/English letter sequences or sequence combinations that appear on their own, and builds the corresponding frame description structures;

the extraction of special control term phrases based on semantic chunks comprises assigning part-of-speech tags to the special control terms and designing search rules according to the usage of each term in order to extract the phrase information;

the attribute recognition of number/English letter sequences based on the hidden Markov model comprises recognizing and extracting the attribute of the target number/English letter sequence with the forward-backward method;

the method can be applied to the semantic understanding of control instructions in the air traffic control system, and it effectively handles the extraction of important information from control instructions.

Examples

For convenience of illustration, the implementation steps are divided according to the main flow chart shown in fig. 2 and explained with actual control instructions. First, examples of control instructions are given:

1. DAL185, east tower, gust 12 meters per second, taxi along taxiway d5p4a5.

2. Beijing area, CSN6723, above the berth head, 35 minutes, 8400 meters hold.

3. CCA1234, please immediately go up to 87.

Step 1: part of speech analysis

This step comprises three operations: Chinese word segmentation, part-of-speech tagging, and target word search, where a target word may be a special control term, a number/English letter sequence, and so on. With the part-of-speech tags for special control terms set as described, the tagging result is shown in fig. 4, which displays only the target words of interest: special control terms, numbers, English letters, and the like. Taking the part of speech as a feature of each word, the special control terms, that is, the words tagged "Sp", are found first: east tower, gust, taxiway, Beijing area. Then the number/English letter sequences, that is, the words tagged "m" or "eng", are found: DAL, 185, 12, d5p4a5, CSN, 6723, 35, 8400, CCA, 1234, 87. The special control terms are processed first, followed by the number/English letter sequences.

Step 2: method for processing special control terms

This step searches according to the search rule that corresponds to each part-of-speech tag from the previous step. East tower and Beijing area are tagged "Sp0", so they are extracted directly without any search. Gust is tagged "Sp2", so the rule is to search rightward to the boundary: the boundary candidate is the quantifier meters per second [r], and the next word to the right is the preposition along [p], so the search stops at meters per second, and the words in between, 12 meters per second, are the searched content. Taxiway is tagged "Sp3", so the rule is to search leftward first and then rightward. On the left is the preposition along [p], which yields no content; on the right is the English letter sequence d5p4a5 [eng], and the next word to the right is the verb for taxiing [v], where the search stops, so the searched content is d5p4a5.

Step 3: processing of numbers/English letters that appear on their own: judgment rules

After step 2, some of the numbers have already been extracted because they are adjacent to special control terms. The remaining words are: DAL, 185, CSN, 6723, 35, 8400, CCA, 1234, and 87. The three word pairs DAL-185, CSN-6723, and CCA-1234 each match the form of a flight number and are therefore judged to denote flights. For 35 and 8400, a rightward search according to the corresponding search rule finds their units, which indicate that their attributes are time and altitude respectively.
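The pairing of a letter token with the number that follows it can be sketched as below. Treating any purely alphabetic token directly followed by a purely numeric token as a flight number is a simplifying assumption, as are the function name and the return format.

```python
def merge_flight_numbers(tagged):
    """Join an 'eng' token directly followed by an 'm' token into one flight number.

    Returns (frames, rest): frames are (attribute, content) pairs, rest holds
    the tokens that were not consumed.
    """
    frames, rest = [], []
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        nxt = tagged[i + 1] if i + 1 < len(tagged) else None
        if (tag == "eng" and word.isalpha()
                and nxt is not None and nxt[1] == "m" and nxt[0].isdigit()):
            frames.append(("flight number", word + nxt[0]))
            i += 2  # both tokens are consumed
        else:
            rest.append((word, tag))
            i += 1
    return frames, rest


tokens = [("CSN", "eng"), ("6723", "m"), ("35", "m"), ("8400", "m")]
print(merge_flight_numbers(tokens))
```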

Step 4: processing of numbers/English letters that appear on their own: HMM

The number 87 in the third example sentence is a special case: no following unit word indicates its attribute (the actual attribute is a flight level), so it is judged by computing probabilities with the hidden Markov model. Because no words follow it, the two preceding words are taken, forming the word sequence go up | to | 87. This word sequence is the observation sequence of the HMM, and the corresponding hidden state sequence is verb [v] | verb [v] | ?, where the symbol "?" indicates that the attribute of 87 is unknown; it may be altitude, speed, time, distance, and so on. The HMM parameters $\lambda=(\pi,A,B)$ are obtained by training on training data. With the observation sequence $O$ and the states of the other words known, the conditional probabilities of the different attributes are obtained by the forward-backward calculation: $\gamma(i)=P(i=q_i\mid O,\lambda)$, where the $q_i$ are the different attributes, and the attribute with the largest $\gamma(i)$ is taken as the attribute of 87.
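As a worked illustration for this example, and under the assumption that the known context states are simply fixed to the verb state $v$ in the forward pass, the target sits at the last position $T=3$, where $\beta_3(i)=1$. The forward probability then factorizes as

$$\alpha_3(i)=\pi_v\,b_v(o_1)\,a_{vv}\,b_v(o_2)\,a_{vi}\,b_i(o_3),$$

and the factor shared by all candidate attributes cancels in the normalization:

$$\gamma_3(i)=\frac{\alpha_3(i)}{\sum_{j}\alpha_3(j)}=\frac{a_{vi}\,b_i(o_3)}{\sum_{j}a_{vj}\,b_j(o_3)},$$

where $o_3$ is the observation 87 and the sums run over the candidate attributes. In this special case the decision depends only on how likely each attribute is to follow a verb and how likely it is to emit the observed token.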

Step 5: frame filling

A frame is designed for the identified information. For uniformity, the frame comprises only two word slots: special control term + searched content, or attribute + number/English letters. Filling the frame word slots with the results of step 3 and step 4 gives the following frame filling result:

1. Flight number [attribute] + DAL185

East tower [special control term] + none

Gust [special control term] + 12 m/s

Taxiway [special control term] + d5p4a5

2. Beijing area [special control term] + none

Flight number [attribute] + CSN6723

Time [attribute] + 35 minutes

Height [attribute] + 8400 meters

3. Flight number [attribute] + CCA1234

Height [attribute] + 87
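The two-slot frame listed above could be represented as a small record type. The class name `ChunkFrame`, its field names and the `render` helper are hypothetical and chosen only to mirror the listed results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChunkFrame:
    """Two-slot frame: a head slot and a content slot."""
    head: str               # special control term or attribute name
    head_kind: str          # "special control term" or "attribute"
    content: Optional[str]  # searched content or number/letters; None if empty

    def render(self) -> str:
        # An empty second slot is shown as "none", as in the listed results.
        return f"{self.head} [{self.head_kind}] + {self.content or 'none'}"


frames = [
    ChunkFrame("Flight number", "attribute", "DAL185"),
    ChunkFrame("East tower", "special control term", None),
    ChunkFrame("Height", "attribute", "8400 meters"),
]
rendered = [frame.render() for frame in frames]
```

Keeping the kind of the first slot as an explicit field lets later processing separate attribute frames from special control term frames without parsing text.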

The frame filling result can be used to judge whether the control instruction contradicts the plan information in the system. For example, the method determines that the height given for the aircraft in the control instruction is 8400 meters, whereas the flight plan has the aircraft climbing to 9000 meters. The information in the flight plan is structured, so the corresponding information must first be found in the unstructured control instruction. The method yields the form: height [attribute] + 8400 meters, so the 9000 meters recorded under the height attribute in the plan can be looked up by attribute, showing that the control instruction contains an error.
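The consistency check in this example can be sketched as a lookup by attribute. The function name `find_conflicts`, the dictionary layout of the plan record and the use of plain numbers for heights are assumptions for illustration only.

```python
def find_conflicts(frames, plan):
    """Compare attribute frames with a structured plan record.

    frames: list of (attribute, value) pairs extracted from the instruction.
    plan:   dict mapping attribute names to the planned values.
    Returns a list of (attribute, instructed_value, planned_value) tuples.
    """
    conflicts = []
    for attribute, value in frames:
        # Only attributes present in the plan can be checked.
        if attribute in plan and plan[attribute] != value:
            conflicts.append((attribute, value, plan[attribute]))
    return conflicts


# Values taken from the example: 8400 meters instructed, 9000 meters planned.
instruction_frames = [("flight number", "CSN6723"), ("height", 8400)]
plan_record = {"flight number": "CSN6723", "height": 9000}
conflicts = find_conflicts(instruction_frames, plan_record)
```

Here the only conflict reported is on the height attribute, matching the 8400 versus 9000 meter discrepancy described above.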

The invention provides a control instruction information processing method based on semantic chunks. There are many methods and ways to implement this technical solution, and the above description is only a preferred embodiment of the invention. It should be noted that those skilled in the art may make a number of improvements and modifications without departing from the principle of the invention, and these improvements and modifications shall also fall within the protection scope of the invention. Any component not specified in this embodiment can be implemented with the prior art.

Claims

1. A control instruction information processing method based on semantic chunks, characterized by comprising the following steps:

step 3, processing the control instruction containing the special control term;

step 4, processing other components of the control instruction;

step 5, analyzing the control instruction according to the processing results of step 3 and step 4, completing semantic understanding of the control instruction in the air traffic control system, and using the obtained result to judge whether the information in the control instruction is consistent with the plan information in the system;

in the step 1, performing Chinese word segmentation operation on a control instruction by adopting a word segmentation algorithm to obtain a word sequence, editing a control term dictionary in the word segmentation process, adding the control term dictionary into the word segmentation algorithm, and assisting in performing word segmentation processing on the control instruction;

in step 2, the part-of-speech tag of a number in the control instruction is m; the part-of-speech tag of an English letter sequence is nx or eng; the part-of-speech tag of a special control term is Sp;

using Sp3 to indicate that the special control term can form phrases with words before and after the special control term;

the step 3 comprises the following steps:

step 3-1, when a special control term appears in the control instruction, taking the special control term as the head word and generating a chunk frame according to the search rule corresponding to its special tag, wherein the chunk frame comprises two word slots:

the first word slot is a special control term and the second word slot is the search content; or the first word slot is a chunk attribute and the second word slot is the content corresponding to that attribute in the control instruction;

step 3-2, designing a corresponding search rule by combining the usage of the special control terms:

for a special control term whose part-of-speech tag is Sp1, searching to the left until the corresponding boundary is found according to the defined search rule;

for a special control term whose part-of-speech tag is Sp2, searching to the right until the corresponding boundary is found according to the search rules in the rule base;

step 3-3, filling the special control term and the searched content into the chunk frame given in step 3-1, the frame having the form: the first word slot is the special control term, and the second word slot is the search content;

step 4 comprises the following steps:

in the first case, the combination of numbers or English letters is preceded or followed by characters or words from which its attribute can be judged;

in the third case, no character outside the number or English letter sequence indicates its attribute, and the sequence has no special internal structure from which the attribute can be determined;

step 4-5, if the attribute has been found through step 4-3 and step 4-4, filling the word slots of the chunk frame, the chunk frame having the form: the first word slot is the attribute, and the second word slot is the number or English letters;

step 4-3 comprises: the following rules are designed for attribute judgment:

if the combination of numbers or English letters is followed by the unit minute, the attribute is a time duration;

step 4-4 comprises:

the form of the hidden Markov model is defined as follows:

Q is the set of all possible states, namely the parts of speech of the words other than the target number or English letter sequence, together with the attributes that the target number or English letter sequence may take; V is the set of all possible observations, namely the output words corresponding to the parts of speech; I is the state sequence; O is the observation sequence, wherein:

Q＝{q₁，q₂，...，q_N}，V＝{v₁，v₂，...，v_M}，

I＝{i₁，i₂，...，i_K}，O＝{o₁，o₂，...，o_M}，

where N is the number of possible states and q_N represents the Nth possible state; M is the number of possible observations, v_M represents the Mth possible observation, and o_M represents the Mth actual observation; K is the actual number of states and i_K represents the Kth actual state value; A is the state transition probability matrix: A＝[a_ij]_{N×N}; B is the observation probability matrix: B＝[b_j(k)]_{N×M}; and π is the initial state probability vector: π＝(π_i), wherein:

a_ij＝P(i_{t+1}＝q_j|i_t＝q_i)，i＝1，2，...，N；j＝1，2，...，N，where a_ij is the probability of moving from state q_i at the current time to state q_j at the next time;

b_j(k)＝P(o_t＝v_k|i_t＝q_j)，k＝1，2，...，M；where b_j(k) is the probability that state q_j at the current time generates the output value v_k at the current time;

π_i＝P(i₁＝q_i)，where π_i is the probability of state q_i at the initial time;

for the problem described in the third case, the n words before and after the target number or English letter sequence are taken to form a sequence; determining the attribute of the target number or English letters is thereby converted into the probability calculation problem of a hidden Markov model, in which the state with the maximum probability is the attribute of the target, and determining the parameters of the hidden Markov model is converted into a learning problem;

γ_t(i)＝P(i_t＝q_i|O，λ)，

wherein 0 < t ≤ T, and T represents the position of the last element of the sequence;

calculating by forward and backward probabilities:

α_t(i)＝P(o₁，o₂，...，o_t，i_t＝q_i|λ)，

β_t(i)＝P(o_{t+1}，o_{t+2}，...，o_T|i_t＝q_i，λ)，