Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar functions. The embodiments described below with reference to the drawings are illustrative, are intended to explain the invention, and are not to be construed as limiting the invention.
The method and the device for aircraft telemetry data feature extraction and hierarchical classification provided by the embodiments of the invention are described below with reference to the accompanying drawings. The method is described first.
FIG. 2 is a flow chart of a method for aircraft telemetry data feature extraction and hierarchical classification in accordance with an embodiment of the present invention.
As shown in FIG. 2, the aircraft telemetry data feature extraction and hierarchical classification method comprises the following steps:
In step S201, time series analysis is performed on the regularized preprocessed data based on a preset model, and a plurality of basic feature classes are constructed.
It is understood that the embodiment of the present invention may perform time series analysis on the regularized preprocessed data based on the ARMA (p, q) model to construct the basic feature classes.
Further, in an embodiment of the present invention, before constructing the plurality of basic feature classes, the method further includes: constructing a consistent telemetry data time sequence according to a unified clock; performing anomaly detection and fault handling according to the telemetry flight outline; regularizing the original data based on the zero-mean unit-variance method; and eliminating outliers from the regularized data by using the Lett criterion to obtain the regularized preprocessed data.
It is understood that the data preprocessing includes: constructing a consistent telemetry data time sequence according to a unified clock, performing anomaly detection and fault handling according to the telemetry flight outline, regularizing the original data based on the zero-mean unit-variance method, and eliminating outliers from the regularized data by using the Lett criterion.
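The regularization and outlier elimination steps can be illustrated with a minimal sketch. It assumes that the criterion named in the text behaves like a 3-sigma rule (the text does not spell out the rule), and the function names, the factor `k` and the sample values are hypothetical.

```python
import statistics

def zscore(values):
    """Zero-mean, unit-variance scaling of a 1-D telemetry sequence."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant sequence: nothing to scale
    return [(v - mu) / sigma for v in values]

def drop_outliers(z, k=3.0):
    """Keep samples with |z| <= k (assumed 3-sigma style criterion)."""
    return [v for v in z if abs(v) <= k]

# Hypothetical slowly drifting parameter with one jump point at the end.
raw = [10.0 + 0.01 * i for i in range(50)] + [500.0]
clean = drop_outliers(zscore(raw))
```

In this example the single jump point lies about seven standard deviations from the mean and is removed, while the 50 regular samples are kept.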
In one embodiment of the present invention, the preset model may be an ARMA (p, q) model, and the plurality of basic feature classes include a steady class, a linear class, a periodic class, and a random class.
It is understood that the time series analysis includes: performing time series analysis on the preprocessed data based on the ARMA (p, q) model, and constructing four basic feature classes that cover the main behavior characteristics of aircraft telemetry data, namely a steady class, a linear class, a periodic class and a random class.
In step S202, high-order statistics of the samples are constructed by using a sliding window for each basic feature class, and the feature dimension with the highest weight in each class of data is extracted as the class feature through the RELIEF-F algorithm.
It can be understood that the embodiment of the invention can use a sliding window to construct high-order statistics of the samples, and extract the feature dimension with the largest weight as the class feature through the RELIEF-F algorithm.
Further, in an embodiment of the present invention, extracting the feature dimension with the highest weight in each class of data as the class feature through the RELIEF-F algorithm includes: constructing high-order statistics of the steady class, the linear class, the periodic class and the random class by using a sliding window, and establishing a multi-class multi-feature data classification mathematical model in combination with the original data.
It is understood that the feature construction includes: constructing high-order statistics of the 4 basic feature classes by using a sliding window, and establishing a multi-class multi-feature data classification mathematical model in combination with the original data. The feature extraction includes: extracting, through the RELIEF-F algorithm, the feature dimension with the maximum weight, namely the variance (second-order statistic), to serve as the time series sample feature for dynamic clustering and hierarchical classification.
In step S203, a first-level class is identified by a K-means clustering method, and a class identifier is added to the sample sequence.
Further, in an embodiment of the present invention, identifying the first-level classes by the K-means clustering method includes: identifying a steady-state slow-varying class and a dynamic fast-varying class by the K-means clustering method, and marking the sample sequence with a first-level class identifier.
It is understood that the dynamic clustering includes: identifying two first-level classes, namely the steady-state slow-varying class (steady and linear) and the dynamic fast-varying class (random and periodic), by the K-means clustering method (an unsupervised learning method), and marking the sample sequence with a first-level class identifier.
In step S204, a linear support vector machine is used to perform first-level classification on the data, a linear regression classifier is used to perform second-level classification on the steady-state slow-varying data, and an autocorrelation characteristic analysis classifier is used to perform second-level classification on the dynamic fast-varying data, so as to obtain a classification result.
It is understood that the hierarchical classification includes: using a linear support vector machine (SVM) to realize first-level classification of the data and obtain two first-level classification results, namely the steady-state slow-varying class and the dynamic fast-varying class; then using a linear regression classifier to complete second-level classification of the steady-state slow-varying data, and using an autocorrelation characteristic analysis classifier to complete second-level classification of the dynamic fast-varying data.
The aircraft telemetry data feature extraction and hierarchical classification method is described in detail below by way of a specific embodiment, as shown in FIG. 3. The work of the analysis clustering stage and of the classification identification stage is described separately.
1. Analytical clustering stage
The main workflow of the analytical clustering stage is as follows:
(1) Telemetry sampling link: each equipment unit of the aircraft measures its current working condition in real time to form a raw telemetry data sequence, which is pushed to the data preprocessing link according to the system clock of the aircraft.
(2) Data preprocessing link: down-sampling or interpolation is performed to ensure consistent time sequence lengths; the numerical conversion from binary source codes to physical quantities is completed according to the telemetry processing method; the current telemetry data are compared against the anomaly detection standard specified by the telemetry data flight outline (switching to the fault handling process if an anomaly is found); the original data are regularized based on the zero-mean unit-variance method; and outliers are eliminated from the regularized data by using the Lett criterion.
(3) Time series analysis link: switch a is turned to the time series analysis link, and time series analysis is performed on the measured data based on the ARMA (p, q) model to obtain the basic feature classes of the telemetry data. The ARMA model parameters are solved under the condition that the root-mean-square error (RMSE) between the predicted data and the original data of each telemetry parameter is minimized.
(4) Feature construction link: switch a is turned to the feature construction link, a sample window data sequence is constructed based on a sliding window of fixed size, the high-order statistics (1st order to 4th order) of each basic feature class are calculated from the sample window data sequence, and a multi-class multi-dimensional feature time sequence is constructed in combination with the original data to serve as the input sample of the feature extraction unit.
(5) Feature extraction link: the feature dimension with the maximum weight in each class of data, namely the variance sequence (2nd-order statistic), is extracted through the RELIEF-F algorithm and used as the time series sample for dynamic clustering and hierarchical classification.
(6) Clustering link: switch b is turned to the clustering link, and two first-level classes, namely the steady-state slow-varying class (steady data and linear data) and the dynamic fast-varying class (random data and periodic data), are identified by the K-means clustering method.
2. Classification and identification phase
The main workflow of the classification identification stage is as follows:
(1) Telemetry sampling link: each equipment unit of the aircraft measures its current working condition in real time to form a raw telemetry data sequence, which is pushed to the data preprocessing link according to the system clock of the aircraft.
(2) Data preprocessing link: down-sampling or interpolation is performed to ensure consistent time sequence lengths; the numerical conversion from binary source codes to physical quantities is completed according to the telemetry processing method; the current telemetry data are compared against the anomaly detection standard specified by the telemetry data flight outline (switching to the fault handling process if an anomaly is found); the original data are regularized based on the zero-mean unit-variance method; and outliers are eliminated from the regularized data by using the Lett criterion.
(3) Feature construction link: switch a is turned to the feature construction link, a sample window data sequence is constructed based on a sliding window of fixed size, the high-order statistics (1st order to 4th order) of each basic feature class are calculated from the sample window data sequence, and a multi-class multi-dimensional feature time sequence is constructed in combination with the original data to serve as the input sample of the feature extraction unit.
(4) Feature extraction link: the feature dimension with the maximum weight in each class of data, namely the variance sequence (2nd-order statistic), is extracted through the RELIEF-F algorithm and used as the time series sample for dynamic clustering and hierarchical classification.
(5) Hierarchical classification link: the steering interface IV of switch b is switched to the classification link, and the specific process is as follows:
Step 5.1: a linear support vector machine (SVM) is used to realize first-level classification of the data and obtain two first-level classification results, namely the steady-state slow-varying class and the dynamic fast-varying class. If the data belong to the steady-state slow-varying class, go to Step 5.2; otherwise, go to Step 5.3.
Step 5.2: a linear regression classifier is used to complete second-level classification of the steady-state slow-varying data. That is, linear regression is performed on the feature data to obtain the slope and intercept of the fitted line, and a decision threshold is set to perform second-level classification on the steady-state slow-varying data obtained by the first-level classification, thereby separating the steady data from the linear data.
Step 5.3: an autocorrelation characteristic analysis classifier is used to complete second-level classification of the dynamic fast-varying data. That is, using the distinctive autocorrelation peak of a random sequence, autocorrelation analysis is performed on the feature data to obtain the ratio of the "peak-average difference" to the "peak-valley difference" of the autocorrelation curve, and a decision threshold is set on this ratio to perform second-level classification on the dynamic fast-varying data obtained by the first-level classification, thereby separating the periodic data from the random data.
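The routing in Steps 5.1 to 5.3 can be sketched as a small dispatch function. This is only an illustration of the control flow: the three classifiers are passed in as callables, and the stand-in lambdas, the dictionary keys and the numeric thresholds are hypothetical.

```python
def classify(sample, first_level, steady_branch, dynamic_branch):
    """Route one feature sample through the two-level hierarchy."""
    if first_level(sample) == "steady-state":
        return steady_branch(sample)   # Step 5.2: steady vs. linear
    return dynamic_branch(sample)      # Step 5.3: periodic vs. random

# Hypothetical stand-in classifiers, for illustration only.
def demo_first_level(s):
    return "steady-state" if s["variance"] < 0.15 else "dynamic"

def demo_steady_branch(s):
    return "linear" if abs(s["slope"]) > 1e-4 else "steady"

def demo_dynamic_branch(s):
    return "random" if s["ratio"] > 0.64 else "periodic"

result_a = classify({"variance": 0.02, "slope": 0.0, "ratio": 0.0},
                    demo_first_level, demo_steady_branch, demo_dynamic_branch)
result_b = classify({"variance": 0.33, "slope": 0.0, "ratio": 0.5},
                    demo_first_level, demo_steady_branch, demo_dynamic_branch)
```

Passing the classifiers as arguments keeps the hierarchy independent of how each level is trained.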
Further, the main methods applied in the embodiments of the present invention are explained in detail below.
1. Regularized preprocessing
The measured data are subjected to regularized preprocessing (normalization), as shown in FIG. 4. As can be seen from the figure, the regularization results of the raw data of the 4 basic feature classes are mixed together, so classification and identification cannot be carried out on them directly.
2. Time series analysis
Time series analysis is performed based on the ARMA (p, q) model (ARMA denotes the autoregressive moving average model; the parameter p denotes a p-order autoregressive process and the parameter q denotes a q-order moving average process), and 4 basic feature classes are obtained, namely a steady data class, a linear data class, a periodic data class and a random data class. These classes cover the main categories of telemetry data behavior characteristics in the satellite measurement and control field.
As can be seen from the analysis of the 4 classes of feature data, the root-mean-square error (RMSE) between the predicted data and the original data differs little under different ARMA model parameters (p: 1 to 5, q: 0 to 5). The training results under the minimum random-noise RMSE condition are shown in FIG. 5: steady class {p = 1, q = 1}, linear class {p = 1, q = 0}, periodic class {p = 4, q = 2}, random class {p = 5, q = 5} (the RMSE does not converge, and decreases as p and q increase). When random noise is present, the p and q parameters of the ARMA models of the various classes of data have no stable convergent solution under the minimum-RMSE condition, from which it is inferred that efficient and reliable classification cannot be achieved by training and optimizing the ARMA p and q parameters.
3. Feature extraction
The time series of a single telemetry parameter can be regarded as a random variable with only one-dimensional numerical features, which has a mean, a variance, and high-order statistics (typically 4th order). High-order statistics of the 4 basic feature classes are constructed based on a sliding window of fixed size (see FIG. 6), and a multi-class multi-dimensional feature time series data classification model is established in combination with the original data.
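The sliding-window construction can be sketched as follows. The sketch assumes that the 1st-order to 4th-order statistics are the mean, variance, skewness and kurtosis of each window; the text does not state the exact definitions, and the window size, step and input values are hypothetical.

```python
import statistics

def window_stats(window):
    """Return (mean, variance, skewness, kurtosis) of one window."""
    n = len(window)
    mean = statistics.fmean(window)
    m2 = sum((v - mean) ** 2 for v in window) / n  # population variance
    if m2 == 0:
        return mean, 0.0, 0.0, 0.0                 # constant window
    m3 = sum((v - mean) ** 3 for v in window) / n
    m4 = sum((v - mean) ** 4 for v in window) / n
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2

def sliding_features(series, size, step=1):
    """Slide a fixed-size window over the series, one feature tuple per window."""
    return [window_stats(series[i:i + size])
            for i in range(0, len(series) - size + 1, step)]

features = sliding_features([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], size=4)
variance_sequence = [f[1] for f in features]  # the 2nd-order feature column
```

Each column of `features` is one candidate feature dimension; the variance column corresponds to the 2nd-order statistic that the feature selection step later retains.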
4. Feature selection
(1) Feature selection algorithm
The feature dimension with the maximum weight, namely the variance sequence (2nd-order statistic), is extracted through the RELIEF-F algorithm and used as the time series sample for dynamic clustering and hierarchical classification. The RELIEF-F feature selection algorithm is described as follows:
Input: training set X = {X_i}, i = 1, …, d; number of random samples n
S1: set the d-dimensional weight vector w = [w_1, …, w_d] = 0
S2: for i = 1:n
  S2a: randomly select a sample x from the input X
  S2b: find the nearest same-class sample h and the nearest different-class sample m of x in X
  S2c: for j = 1:d
    w_j = w_j - diff(j, x, h)/n + diff(j, x, m)/n
S3: return the weight vector w
S4: output the first k features with the largest weights (k = 1)
where diff(j, x_1, x_2) denotes the absolute difference between the two samples x_1 and x_2 in the j-th dimension.
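The listing above can be turned into a short runnable sketch. It follows the listing literally, with one nearest same-class sample and one nearest different-class sample per draw and two classes; variants of the algorithm that average over several neighbours are not shown. The Manhattan distance used to find neighbours, the fixed random seed and the toy data are assumptions, and the features are assumed to be scaled to comparable ranges beforehand.

```python
import random

def diff(j, a, b):
    """Absolute difference of two samples in feature dimension j."""
    return abs(a[j] - b[j])

def relief(X, y, n, seed=0):
    """Feature weights following steps S1 to S3 of the listing."""
    rng = random.Random(seed)
    d = len(X[0])
    w = [0.0] * d                                   # S1
    for _ in range(n):                              # S2
        i = rng.randrange(len(X))                   # S2a
        x = X[i]

        def dist(k):
            return sum(diff(j, x, X[k]) for j in range(d))

        same = [k for k in range(len(X)) if k != i and y[k] == y[i]]
        other = [k for k in range(len(X)) if y[k] != y[i]]
        h = X[min(same, key=dist)]                  # S2b: nearest same class
        m = X[min(other, key=dist)]                 # S2b: nearest other class
        for j in range(d):                          # S2c
            w[j] += (diff(j, x, m) - diff(j, x, h)) / n
    return w                                        # S3

# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.3], [0.1, 0.9], [0.05, 0.5], [1.0, 0.4], [0.9, 0.8], [0.95, 0.6]]
y = [0, 0, 0, 1, 1, 1]
weights = relief(X, y, n=20)
best = max(range(len(weights)), key=weights.__getitem__)  # S4 with k = 1
```

On the toy data the separating feature receives a positive weight and the uninformative feature a negative one, so `best` is the index of the separating feature.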
(2) Example of the algorithm
Taking the classification scenario of the first-level SVM as an example, the sample X1 represents an ideal linear data time series and the sample X2 represents an ideal periodic data time series, both of which contain random noise. The simulation result of the regularized high-order statistics is shown in FIG. 7, and the change of the feature weights is shown in FIG. 8.
5. Dynamic clustering
Two first-level classes, namely the steady-state slow-varying class (steady and linear) and the dynamic fast-varying class (random and periodic), are identified by the K-means clustering method, and the sample sequence is marked with a first-level class identifier. In the K-means clustering algorithm, K objects are randomly selected as initial cluster centers, the distance between each object and each cluster center is calculated, and each object is assigned to the nearest cluster center.
The clustering results are shown in FIG. 9, where blue represents the steady-state slow-varying class and red represents the dynamic fast-varying class. The mean vector of each class is: steady-state slow-varying class [0.02463], dynamic fast-varying class [0.33086].
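Because the reported mean vectors are one-dimensional, the clustering step can be sketched for a single feature with K = 2. The sketch below departs from the description in one respect, stated as an assumption: the initial centers are the minimum and maximum values rather than random picks, so that the result is reproducible. The input values are hypothetical variance features.

```python
def kmeans_1d(values, iters=100):
    """Two-cluster K-means on scalar features; returns (centers, labels)."""
    centers = [min(values), max(values)]  # deterministic init (assumption)
    for _ in range(iters):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        updated = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
        if updated == centers:            # assignments no longer change
            break
        centers = updated
    labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
              for v in values]
    return centers, labels

# Hypothetical variance features: three slow-varying, three fast-varying.
centers, labels = kmeans_1d([0.02, 0.03, 0.025, 0.31, 0.35, 0.33])
```

Label 0 then plays the role of the steady-state slow-varying identifier and label 1 that of the dynamic fast-varying identifier.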
6. Hierarchical classification
(1) First-level classification
As can be seen from the basic principle of the nearest-neighbor method in supervised learning, samples far from the classification boundary do not contribute to the final classification decision. Linear data (class 2) and random data (class 3) can therefore be used as the support vectors of the two first-level classes, and a first-level SVM classifier is obtained by using a linear support vector machine structure, as shown in FIG. 10.
Wherein:
a. weight coefficient:
w=[-8.908e-05,9.421e+00],
b=-1.455;
b. classification hyperplane:
slope: 9.456e-06,
intercept: 0.154.
c. support vector:
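The hyperplane figures in item b follow from the weight coefficients in item a: for a two-dimensional linear classifier w1·x1 + w2·x2 + b = 0, the boundary is the line x2 = -(w1/w2)·x1 - b/w2. The short check below reproduces the reported slope and intercept to within the rounding of the printed weights. Treating the positive side of the boundary as the dynamic fast-varying class is an assumption, chosen because the reported cluster means (0.02463 and 0.33086) lie on either side of the intercept.

```python
w = (-8.908e-05, 9.421e+00)  # weight coefficients from item a
b = -1.455                   # bias from item a

slope = -w[0] / w[1]         # boundary slope in the (x1, x2) plane
intercept = -b / w[1]        # boundary intercept on the x2 axis

def first_level(x1, x2):
    """Side of the hyperplane; the sign convention is an assumption."""
    score = w[0] * x1 + w[1] * x2 + b
    return "dynamic" if score > 0 else "steady-state"
```

Because the slope is close to zero, the decision is dominated by the second feature: values above roughly 0.154 fall on the dynamic side.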
(2) Second-level classification 1: steady-state slow-varying class
A linear regression classifier is constructed for the second-level steady-state slow-varying data. The slope w and the intercept b of the steady-state slow-varying data are obtained, and a classification threshold is set according to the slope w and the intercept b to perform the classification decision.
a. Dynamic threshold ratio determination
In the embodiment of the invention, the relation between the slope and the intercept is used as the threshold parameter of classification, and the calculation process is as follows:
Let
Data whose variation amplitude is less than 10% of the mean value are defined as steady data, that is:
where C_1 denotes steady data and C_2 denotes linear data;
it can be derived that:
thus, the dynamic threshold function can be designed as:
b. Linear regression
Based on the preprocessed data, the slope and intercept of the linear regression model are obtained by the least squares method, and the data are judged to be steady data or linear data according to the threshold.
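A minimal sketch of the least-squares fit and the threshold decision is given below. The fit itself is ordinary least squares against the sample index. The threshold function is an assumption: the rule `ratio * |intercept| / n` with `ratio = 0.1` is inferred from the 10% rule and from the simulation values reported in this section, where 0.1 × 0.4545477 / 570 ≈ 7.9745e-05 (n = 570 is itself an inference from those numbers). It should not be read as the exact threshold function of the embodiment.

```python
def fit_line(y):
    """Least-squares slope and intercept of y against the sample index."""
    n = len(y)
    mean_x = (n - 1) / 2
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in enumerate(y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def dynamic_threshold(intercept, n, ratio=0.1):
    """Assumed threshold: total drift over n samples below ratio * level."""
    return ratio * abs(intercept) / n

def second_level_steady(y):
    """Separate steady data from linear data by comparing slope to threshold."""
    slope, intercept = fit_line(y)
    limit = dynamic_threshold(intercept, len(y))
    return "linear" if abs(slope) > limit else "steady"

# Synthetic line built from the slope and intercept reported in the simulation.
ramp = [0.4545477 + 0.00015176 * i for i in range(570)]
flat = [0.45] * 570
ramp_class = second_level_steady(ramp)
flat_class = second_level_steady(flat)
```

Under this assumed rule the synthetic ramp is labelled linear, since its slope exceeds the threshold, and the constant sequence is labelled steady.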
c. Linear data regression analysis classification simulation
The simulation results are shown in fig. 11, in which:
the dynamic threshold is: 7.97452111e-05;
the slope is: 0.00015176;
the intercept is: 0.4545477.
(3) Second-level classification 2: dynamic fast-varying class
An autocorrelation characteristic classifier is constructed for the second-level dynamic fast-varying data. Using the distinctive autocorrelation peak of a random sequence, a classification threshold is set to perform the classification decision: data whose value is larger than the threshold are judged to be the random class, and data whose value is not larger than the threshold are judged to be the periodic class.
a. Threshold calculation
The autocorrelation analysis threshold for the class 3 and class 4 data is calculated as follows:
The threshold calculation for one iteration of the two classes of data is shown in FIG. 12. As can be seen from FIG. 12, the random data have a relatively obvious autocorrelation peak, while the autocorrelation sequence of the periodic sequence shows a periodic variation. The threshold values of the two classes can therefore be compared: the random data have the larger value and the periodic data the smaller value, so the two classes can be separated once a reasonable comparison threshold is set.
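The quantity described in Step 5.3, the ratio of the "peak-average difference" to the "peak-valley difference" of the autocorrelation curve, can be sketched as (peak - mean) / (peak - valley). The exact estimator of the embodiment is not given in the text, so the biased, normalized autocorrelation over all non-negative lags used here is an assumption, as are the test signals.

```python
import math
import random

def autocorr(x):
    """Normalized autocorrelation for lags 0..n-1 (biased estimate)."""
    n = len(x)
    mu = sum(x) / n
    d = [v - mu for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / c0
            for k in range(n)]

def peak_ratio(x):
    """(peak - mean) / (peak - valley) of the autocorrelation curve."""
    r = autocorr(x)
    peak, valley = max(r), min(r)
    mean = sum(r) / len(r)
    return (peak - mean) / (peak - valley)

rng = random.Random(0)
periodic = [math.sin(2 * math.pi * t / 20) for t in range(200)]
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
periodic_ratio = peak_ratio(periodic)
random_ratio = peak_ratio(noise)
```

For the sine wave the autocorrelation swings almost as far below zero as its peak rises above it, so the ratio is close to 0.5. For noise the curve has a single sharp peak and shallow valleys, so the ratio is noticeably larger. This matches the ordering described above.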
b. Iteratively solving the decision threshold for the two classes of data
In each iteration, the threshold values of the two classes of data are calculated separately. The minimum of the random data threshold values recorded so far is taken as the upper bound of the comparison threshold, and the maximum of the periodic data threshold values is taken as the lower bound. The mathematical expectation of the upper and lower bounds obtained by the iterative calculation then gives the decision threshold, as shown in FIG. 13.
Wherein: the lower bound converges to: 0.4960;
the upper bound converges to: 0.7880;
the decision threshold converges to: 0.6420.
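The converged values are consistent with taking the decision threshold as the midpoint of the two bounds: (0.4960 + 0.7880) / 2 = 0.6420. The sketch below illustrates the bound tracking under that reading. The per-iteration ratios are hypothetical, chosen so that the bounds end at the reported values.

```python
def decision_threshold(random_ratios, periodic_ratios):
    """Return (lower bound, upper bound, midpoint) over all iterations."""
    upper = min(random_ratios)    # smallest ratio seen for random data
    lower = max(periodic_ratios)  # largest ratio seen for periodic data
    return lower, upper, (lower + upper) / 2

# Hypothetical per-iteration ratios for the two classes.
lower, upper, threshold = decision_threshold(
    random_ratios=[0.85, 0.7880, 0.81],
    periodic_ratios=[0.45, 0.4960, 0.47],
)
```

As long as the lower bound stays below the upper bound, any threshold between them separates the recorded samples; the midpoint leaves an equal margin on both sides.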
c. Simulation test
The classification effect of the classifier is verified by a simulation test of the distance between each class of data and the classification plane of the classifier. The results are shown in FIG. 14.
Wherein: random data decision distance: minimum 0.134, maximum 0.285, average 0.232;
periodic data decision distance: minimum 0.146, maximum 0.146, average 0.146;
decision accuracy: 100.00%.
The aircraft telemetry data feature extraction and hierarchical classification method is further explained below by way of a specific example:
1. Data source
The data are key satellite-platform telemetry data actually measured over 24 hours, from July 20 to July 21, 2018, during a mission of the Tsinghua University smart communication test satellite.
2. Arc segment segmentation
Because the tracking arc of the ground measurement and control equipment is limited, the satellite data obtained during the 24-hour tracking process are divided into a plurality of segments, which must be effectively extracted before data processing. Analysis of the measured data shows that each telemetry parameter does not completely follow the behavior of a single feature base class, but shows the characteristics of different base classes in different time periods, with occasional outlier jump points, as shown in FIG. 1(b). As can be seen from FIG. 1(b) (accurate to three decimal places), the battery pack temperature [+Y] shows steady-class characteristics locally, with value jumps between successive segments of steady-class data, while the overall trend of the parameter shows linear-class characteristics, with a small number of jump points.
3. Hierarchical classification platform
An example of a telemetry data hierarchical classification platform is shown in FIG. 15. It mainly includes a file reading module, an arc segment selection module, a range regularization module, a zero-mean unit-variance module, an outlier elimination module, a feature construction module, and the like.
4. Feature construction
To verify the effectiveness of using the variance (second-order statistic) as the classification feature, a mixed test is performed on the battery pack temperature [+Y] data in the current arc segment (denoted X1) and the standard base-class periodic data (denoted X2). The feature selection results are shown in FIG. 16 and FIG. 17.
The above test results show that, although the measured data are not an ideal linear base class, the variance data sequence obtained with the RELIEF-F algorithm still exhibits a good optimal-weight characteristic.
5. Dynamic clustering
Cluster analysis is performed on the battery pack temperature [+Y] data by using the constructed K-means clusterer (the data of this type are marked as the "steady-state slow-varying class" before classification). The clustering success rate is 100%.
6. Hierarchical classification
First-level classification success rate: 90.9797%
Second-level classification success rate: 97.2222%
It should be noted that machine learning in the embodiment of the present invention is embodied in the combined use of unsupervised learning and supervised learning. Unsupervised learning realizes dynamic clustering of the steady-state slow-varying data and the dynamic fast-varying data, which provides the input condition for the next step, in which a suitable supervised learning method is applied to each class: the steady-state slow-varying data are classified by the linear regression method, and the dynamic fast-varying data are classified by the autocorrelation characteristic analysis method. The classification efficiency of the telemetry data is thereby effectively improved.
In summary, the invention realizes feature extraction and effective classification of aircraft telemetry data. Dynamic clustering of the steady-state slow-varying data and the dynamic fast-varying data is realized through unsupervised learning, which provides the input condition for classifying each class with a suitable supervised learning method: the steady-state slow-varying data are classified by the linear regression method, and the dynamic fast-varying data are classified by the autocorrelation characteristic analysis method, which effectively improves the classification efficiency of the telemetry data. The classification results cover the main categories of telemetry data behavior characteristics in the aircraft measurement and control field, the classifier parameters can be used as input conditions for compressing transmission bandwidth or for other signal processing methods, and the invention can provide technical support for future deep space exploration missions and the development of the spatial information network in China.
According to the aircraft telemetry data feature extraction and hierarchical classification method provided by the embodiment of the invention, the basic category division criterion is first obtained through time series analysis; the feature vector of the telemetry data is constructed through high-order statistic analysis and a feature extraction algorithm; cluster analysis is performed on the feature vector by an unsupervised learning method; and the feature vector is classified by a supervised learning classification algorithm. Feature extraction and effective classification of aircraft telemetry data can thus be realized. The classification results cover the main categories of telemetry data behavior characteristics in the aircraft measurement and control field, the classifier parameters can be used as input conditions for compressing transmission bandwidth or for other signal processing methods, and technical support can be provided for future deep space exploration missions and the development of the spatial information network in China. The method also has the characteristics of high real-time performance and a high classification success rate, and can provide important technical parameters for telemetry data prediction and compression.
The aircraft telemetry data feature extraction and hierarchical classification device provided by the embodiment of the invention is described next with reference to the attached drawings.
FIG. 18 is a schematic structural diagram of an aircraft telemetry data feature extraction and hierarchical classification apparatus according to an embodiment of the invention.
As shown in fig. 18, the aircraft telemetry data feature extraction and classification device 10 includes: an analysis module 100, a feature processing module 200, a clustering module 300, and a classification module 400.
The analysis module 100 is configured to perform time series analysis on the regularized preprocessed data based on a preset model, and to construct a plurality of basic feature classes. The feature processing module 200 is configured to construct high-order statistics of the samples by using a sliding window for each basic feature class, and to extract the feature dimension with the highest weight in each class of data as the class feature through the RELIEF-F algorithm. The clustering module 300 is configured to identify the first-level classes by the K-means clustering method, and to mark the sample sequence with a class identifier. The classification module 400 is configured to perform first-level classification on the data by using a linear support vector machine, to perform second-level classification on the steady-state slow-varying data by using a linear regression classifier, and to perform second-level classification on the dynamic fast-varying data by using an autocorrelation characteristic analysis classifier, so as to obtain a classification result. The device 10 of the embodiment of the invention can realize feature extraction and effective classification of aircraft telemetry data. The classification results cover the main categories of telemetry data behavior characteristics in the aircraft measurement and control field, the classifier parameters can be used as input conditions for compressing transmission bandwidth or for other signal processing methods, and technical support can be provided for future deep space exploration missions and the development of the spatial information network in China.
Further, in one embodiment of the present invention, the device 10 of the embodiment of the present invention further comprises a data preprocessing module.
Before the plurality of basic feature classes are constructed, the data preprocessing module constructs a consistent telemetry data time sequence according to a unified clock, performs anomaly detection and fault handling according to the telemetry flight outline, regularizes the original data based on the zero-mean unit-variance method, and eliminates outliers from the regularized data by using the Lett criterion to obtain the regularized preprocessed data.
It will be appreciated that the device 10 of an embodiment of the present invention comprises: the preprocessing module, configured to complete data acquisition, anomaly detection, regularization, outlier elimination and the like; the analysis module 100, configured to perform time series analysis; the feature processing module 200, configured to complete feature construction, feature extraction and the like; the clustering module 300, configured to complete K-means dynamic clustering of the feature data; and the classification module 400, configured to perform first-level SVM linear classification, second-level linear regression classification, second-level autocorrelation characteristic analysis classification and the like on the feature data.
Further, in one embodiment of the present invention, the predetermined model is an ARMA (p, q) model, and the plurality of basic feature classes include a steady class, a linear class, a periodic class, and a random class.
Further, in an embodiment of the present invention, the feature processing module 200 is further configured to construct high-order statistics of the steady class, the linear class, the periodic class and the random class by using a sliding window, and to build a multi-class, multi-feature data classification mathematical model in combination with the original data.
Further, in an embodiment of the present invention, the clustering module 300 is further configured to identify a steady-state slowly-varying class and a dynamic fast-varying class by a K-means clustering method, and to mark the sample sequences with first-level class identifiers.
The aircraft telemetry data feature extraction and classification device 10 will be described in detail below with reference to FIG. 19.
As shown in FIG. 19, the machine-learning-based device for extracting and classifying aircraft telemetry data features mainly comprises a preprocessing module, an analysis module, a feature processing module, a clustering module and a classification module. The work of the analysis and clustering stage is mainly completed by the preprocessing module, the analysis module, the feature processing module and the clustering module, and the work of the classification and identification stage is mainly completed by the preprocessing module, the analysis module, the feature processing module and the classification module.
1. Preprocessing module
The module is used for completing data acquisition, anomaly detection, regularization, outlier elimination and the like, and comprises the following units:
a data acquisition unit: acquiring the binary source code of the telemetry data through a measuring device mounted on the aircraft, performing down-sampling or interpolation to ensure that the time sequences have a consistent length, and converting the binary source code into physical quantities according to a telemetry processing method;
an anomaly detection unit: checking the current telemetry data against the anomaly detection standard specified by the telemetry flight outline, and switching to a fault handling process if an anomaly is found;
a regularization processing unit: regularizing the original data based on a zero-mean unit-variance method;
an outlier elimination unit: eliminating outliers from the regularized data by using the Latt criterion.
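By way of a non-limiting illustration, the regularization processing unit and the outlier elimination unit may be sketched as follows. The sketch assumes that the outlier criterion is a 3-sigma-type rule (a sample is flagged when it lies more than k standard deviations from the mean) and that a flagged sample is replaced by the last retained value so that the sequence length is preserved; neither detail is specified in the text above, and the function names `zscore` and `eliminate_outliers` are hypothetical.

```python
import statistics

def zscore(values):
    """Zero-mean, unit-variance regularization of a raw sequence."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant sequence has no spread; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def eliminate_outliers(values, k=3.0):
    """Replace samples farther than k standard deviations from the mean.

    Assumption: a 3-sigma-type rule with hold-last-value replacement.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    inliers = [v for v in values if abs(v - mean) <= k * std]
    # Used only if the very first sample is itself an outlier.
    last_good = statistics.fmean(inliers) if inliers else mean
    cleaned = []
    for v in values:
        if abs(v - mean) > k * std:
            cleaned.append(last_good)
        else:
            cleaned.append(v)
            last_good = v
    return cleaned
```

In this sketch the two functions would be applied in the order given in the description: `eliminate_outliers(zscore(raw))`.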
2. Analysis module
The module is used for completing time series analysis: time series analysis is performed on the measured data based on an ARMA(p, q) model to obtain basic feature classes that cover the behavior features of telemetry data in the aircraft measurement and control field, and the ARMA model parameters are solved under the condition that the root mean square error (RMSE) between the predicted data and the original data of each type of telemetry parameter is minimized.
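The RMSE-based parameter selection may be illustrated by the following simplified sketch. It is not a full ARMA(p, q) estimator: it substitutes the purely autoregressive special case (q = 0), fits the coefficients from the Yule-Walker equations by the Levinson-Durbin recursion, and keeps the order whose one-step predictions give the smallest RMSE against the original data. The function names and the default `max_order` are assumptions made for illustration.

```python
import math

def autocovariance(x, max_lag):
    """Biased sample autocovariance for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations; returns AR coefficients a_1..a_order."""
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:  # degenerate (e.g. constant) series: stop early
            break
        k = (r[m] - sum(a[j] * r[m - j] for j in range(1, m))) / err
        prev = a[:]
        a[m] = k
        for j in range(1, m):
            a[j] = prev[j] - k * prev[m - j]
        err *= 1.0 - k * k
    return a[1:]

def one_step_rmse(x, coeffs):
    """RMSE between one-step-ahead predictions and the original data."""
    mean = sum(x) / len(x)
    d = [v - mean for v in x]
    p = len(coeffs)
    sq = 0.0
    for t in range(p, len(d)):
        pred = sum(coeffs[j] * d[t - 1 - j] for j in range(p))
        sq += (d[t] - pred) ** 2
    return math.sqrt(sq / (len(d) - p))

def select_ar_order(x, max_order=5):
    """Return (order, coefficients, rmse) with the smallest in-sample RMSE."""
    r = autocovariance(x, max_order)
    best = None
    for p in range(1, max_order + 1):
        coeffs = levinson_durbin(r, p)
        rmse = one_step_rmse(x, coeffs)
        if best is None or rmse < best[2]:
            best = (p, coeffs, rmse)
    return best
```

A complete implementation would also estimate the moving-average part, which generally requires an iterative estimator rather than the closed-form recursion used here.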
3. Feature processing module
The module is used for completing feature construction, feature extraction and the like, and comprises the following units:
a feature construction unit: computing the statistics of orders 1 to 4 of each basic feature class over a fixed-size sliding window, and constructing a multi-class, multi-dimensional feature time sequence in combination with the original data, which serves as the input sample of the feature extraction unit;
a feature extraction unit: extracting, through the RELIEF-F algorithm, the feature dimension with the largest weight in each class of data, namely the variance sequence (2nd-order statistic), and using it as the time sequence sample for dynamic clustering and hierarchical classification.
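The feature construction step may be sketched as follows. The sketch assumes that the statistics of orders 1 to 4 are the mean, the variance, the skewness and the kurtosis of each window, and that the window advances one sample at a time; the text does not fix either choice, and the function names are hypothetical. The RELIEF-F weighting itself is not reproduced here.

```python
def window_moments(window):
    """Mean, variance, skewness and kurtosis (orders 1 to 4) of one window."""
    n = len(window)
    mean = sum(window) / n
    m2 = sum((v - mean) ** 2 for v in window) / n
    m3 = sum((v - mean) ** 3 for v in window) / n
    m4 = sum((v - mean) ** 4 for v in window) / n
    if m2 < 1e-12:
        # A (near-)constant window has no defined shape statistics.
        return mean, 0.0, 0.0, 0.0
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2

def sliding_features(x, size):
    """One 4-tuple of statistics per window position (stride of 1 assumed)."""
    return [window_moments(x[i:i + size]) for i in range(len(x) - size + 1)]
```

Under these assumptions, the variance sequence named by the feature extraction unit is simply the second column: `[f[1] for f in sliding_features(x, size)]`.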
4. Clustering module
The module is used for completing dynamic clustering of the feature data: two first-level classes, namely a steady-state slowly-varying class (steady data and linear data) and a dynamic fast-varying class (random data and periodic data), are identified by a K-means clustering method.
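A minimal K-means sketch for the two first-level classes is given below. It assumes that each sample is a fixed-length feature vector (for example a segment of the variance sequence) and uses a deterministic farthest-point initialization; the initialization, the iteration limit and the function names are illustrative assumptions rather than details taken from the description.

```python
def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k=2, iters=100):
    """Cluster feature vectors into k groups; returns (labels, centers)."""
    # Deterministic init: first point, then the point farthest from the
    # centers chosen so far (an assumption, not specified in the text).
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist2(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

With k = 2, the cluster whose center has the smaller variance features would correspond to the steady-state slowly-varying class under the assumptions above.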
5. Classification module
The module is used for completing the first-level linear SVM classification, the second-level linear regression classification, the second-level autocorrelation characteristic analysis classification and the like of the feature data, and comprises the following classifiers:
a first-level classifier: a linear support vector machine (SVM) is adopted to perform the first-level classification of the data, yielding two first-level classification results, namely the steady-state slowly-varying class and the dynamic fast-varying class;
a second-level classifier 1: a linear regression classifier is adopted to perform the second-level classification of the steady-state slowly-varying data. Linear regression is performed on the feature data to obtain the slope and intercept of the fitted line, and a decision threshold is set to classify the steady-state slowly-varying data obtained by the first-level classification into steady data and linear data;
a second-level classifier 2: an autocorrelation characteristic analysis classifier is adopted to perform the second-level classification of the dynamic fast-varying data. Using the distinctive autocorrelation peak characteristic of a random sequence, autocorrelation analysis is performed on the feature data to obtain the ratio of the "peak-average difference" to the "peak-valley difference" of the autocorrelation curve, and a decision threshold is set on this ratio to classify the dynamic fast-varying data obtained by the first-level classification into periodic data and random data.
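The two second-level classifiers described above may be sketched as follows. The threshold values, the rule that both the slope and the intercept must be small for steady data, the use of lags up to half the sequence length, and all function names are assumptions made for illustration; the description only states that a threshold comparison is applied to the fitted line and to the ratio of the peak-average difference to the peak-valley difference.

```python
def fit_line(y):
    """Least-squares slope and intercept of y against its sample index.

    Requires at least two samples.
    """
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean

def classify_slow(feature_seq, slope_thr=1e-3, intercept_thr=1e-3):
    """Steady vs. linear, from a feature (e.g. variance) sequence."""
    slope, intercept = fit_line(feature_seq)
    if abs(slope) < slope_thr and abs(intercept) < intercept_thr:
        return "steady"
    return "linear"

def autocorrelation(x, max_lag):
    """Normalized biased autocorrelation for lags 0..max_lag (r[0] == 1).

    Assumes x is not constant, which holds for fast-varying data.
    """
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

def peak_ratio(x, max_lag=None):
    """(peak - average) / (peak - valley) of the autocorrelation curve."""
    if max_lag is None:
        max_lag = len(x) // 2
    r = autocorrelation(x, max_lag)
    peak, valley = max(r), min(r)
    avg = sum(r) / len(r)
    return (peak - avg) / (peak - valley)

def classify_fast(seq, ratio_thr=0.7):
    """Periodic vs. random; the 0.7 threshold is a hypothetical value."""
    return "random" if peak_ratio(seq) > ratio_thr else "periodic"
```

The intuition behind the ratio is that a random sequence has a single sharp autocorrelation peak at lag 0 with values near zero elsewhere, so the ratio is close to 1, whereas a periodic sequence has an autocorrelation curve that swings between positive and negative values, so the ratio is close to 0.5.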
It should be noted that the foregoing explanation of the embodiment of the method for extracting and classifying aircraft telemetry data features is also applicable to the apparatus for extracting and classifying aircraft telemetry data features of this embodiment, and is not repeated herein.
According to the device for extracting and classifying the features of aircraft telemetry data provided by the embodiment of the invention, the basic class division criterion is first obtained through time series analysis, the feature vector of the telemetry data is constructed through high-order statistic analysis and a feature extraction algorithm, cluster analysis is performed on the feature vector by an unsupervised learning method, and the feature vector is classified by a supervised learning classification algorithm. Feature extraction and effective classification of aircraft telemetry data can thereby be realized. The classification result covers the main categories of behavior features of telemetry data in the field of aircraft measurement and control, the classifier parameters can be used as input conditions for compressing transmission bandwidth or for other signal processing methods, and technical support can be provided for future deep space exploration tasks and the development of spatial information networks in China. The device also has characteristics such as high real-time performance and a high classification success rate, and can provide important technical parameters for telemetry data prediction and compression.
In the description of the present invention, it should be understood that the terms "K-means clustering," "support vector machine classifier," "linear regression classifier," "autocorrelation characteristic analysis classifier," and the like refer to machine learning methods used in an embodiment for the hierarchical classification of aircraft telemetry data; they are used only to explain the present invention and should not be construed as limiting it.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such references do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, the various embodiments or examples described in this specification, and the features of different embodiments or examples, can be combined by one skilled in the art provided that they do not contradict one another.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.