CN117216565A - Multi-channel characteristic coding redundant positioning characteristic data set construction method - Google Patents
- Publication number
- CN117216565A (application CN202311186227.5A)
- Authority
- CN
- China
- Prior art keywords
- feature
- signal
- channel
- redundant
- signals
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T90/00—Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation
Landscapes
- Testing Or Calibration Of Command Recording Devices (AREA)
Abstract
A method for constructing a multi-channel feature-coded redundant object positioning feature data set, belonging to the technical field of aerospace electronic equipment detection. The invention addresses the problem that, in current aerospace electronic equipment detection technology, data sets built by single-channel feature extraction ignore the attributes of the data and the correlation between those attributes, which degrades data set quality. The invention collects redundancy signals using N sensors arranged on an aerospace equipment model and performs useful pulse extraction and pulse matching. During useful pulse extraction, the N channel redundancy signals are identified, from nearest to farthest, with the sensors that collected them, yielding N sequence feature values. For the matched useful pulses, M signal feature values are calculated on each group of N useful pulses in turn. A multidimensional feature vector is constructed from the M signal feature values and the N sequence feature values, and a label carrying the number of the closed space in which the redundant object is located is added, thereby constructing the feature data set.
Description
Technical Field
The invention belongs to the technical field of aerospace electronic equipment detection, and particularly relates to a method for constructing a redundant object positioning feature data set.
Background
An aerospace device is an electronic device with a closed structure that provides electromagnetic shielding and waterproof sealing, such as an aerospace power supply, a rocket engine or a satellite-borne electronic unit. The assembly process of an aerospace device is complex, and particles such as metal scraps, welding residues and wire segments may be sealed inside it at any stage. Such particles, whether introduced from outside or generated inside, disrupt the originally stable physical state inside the device and are referred to as redundant objects. Aerospace devices typically operate under overweight, weightlessness, severe impact and vibration, so redundant objects inside them move randomly. They can collide with and damage internal components, adhere to circuit surfaces and cause short or open circuits, or slide and cause strong static and electromagnetic interference. Any of these can seriously affect the normal operation of the aerospace device and may even cause a serious aerospace accident. The detection of redundant objects is therefore of great importance in the aerospace field.
The Particle Impact Noise Detection (PIND) method is currently a widely used method worldwide for detecting redundant objects. Its basic principle is as follows. The object under test is fixed to a vibrating table, which generates mechanical excitation and applies it to the object, so that any redundant objects inside move randomly and generate a signal, called the redundancy signal. A sensor fixed to the surface of the object captures any redundancy signal, which is transmitted to a signal conditioning circuit for processing and finally output to a loudspeaker or an oscilloscope. Whether the object contains a redundant object is judged by listening for sound from the loudspeaker or checking whether the oscilloscope displays a waveform. The core of the PIND method is thus to determine whether a redundancy signal is generated; the signal itself is not analyzed in depth. In fact, the redundancy signal carries rich information about the properties of the redundant object, such as its position, material and weight. This information is valuable for detecting and removing redundant objects. In large or very large aerospace equipment in particular, position information lets inspectors target their cleaning, greatly improving working efficiency. Research on redundant object positioning is therefore an important part of redundant object detection research.
In a broad sense, the redundancy signal is a collision signal or a sliding signal and belongs to the class of acoustic emission signals. The time-difference positioning method and the area positioning method are commonly used to locate acoustic emission sources. However, aerospace equipment has a complex internal structure and non-uniform materials, so these two methods struggle to give stable and reliable acoustic emission source positioning, that is, redundant object positioning. In recent years, machine learning methods have been widely applied in many fields, including fault diagnosis. A fault source, like a redundant object, can be regarded as an acoustic emission source, and a fault signal, like a redundancy signal, can be regarded as an acoustic emission signal. Research on fault source localization or fault type diagnosis can therefore serve as a reference for redundant object positioning research. Different types of fault signals differ in their signal characteristics. Feature engineering can quantify these differences and establish a feature data set, on which a classifier can be trained to classify data with different labels and thus classify the different types of fault signals. Likewise, the redundancy signals generated by redundant objects at different positions differ in their signal characteristics, and a suitable classifier trained through similar processing steps can position the redundant object.
Typically, two sensors are used to determine the position of a target source in one-dimensional linear positioning, three in two-dimensional planar positioning, and four in three-dimensional spatial positioning. To position redundant objects in aerospace equipment with a three-dimensional structure, therefore, four sensors must be arranged as an array on the surface of the equipment so that redundancy signals generated anywhere inside it are fully captured. In machine learning, the values of time domain features can be calculated from the sampling points of a segment of digital signal. After the digital signal is transformed to the frequency domain, the values of the frequency domain features corresponding to the sampling points can be calculated. Other signal processing methods yield values for further multi-domain features. A feature vector is constructed by arranging in order the feature values calculated from a single sampling point or from a fixed combination of sampling points. Assigning a suitable label to each feature vector gives labeled feature data, from which a labeled feature data set is established. As described above, in redundant object positioning four sensors simultaneously capture the redundancy signal generated at a given moment, giving four correlated digital signals. Most of the signal components in these four signals are the same, and their amplitude, start time and other time domain characteristics differ because the sensors are at different distances from the redundant object. Only a small fraction of the signal components differ; these lie at the beginning or end of the signals and likewise depend on the distance from the redundant object.
In signal classification research that applies machine learning, a single experiment in the data acquisition stage acquires one independent signal segment, called a single-channel signal. In the subsequent feature data set construction stage, feature values are calculated and labels are set per independently acquired signal segment. In redundant object positioning research, a single experiment acquires four correlated signal segments, called a four-channel redundancy signal. If, in the feature data set construction stage, the four signals are treated as unrelated individuals and feature values and labels are set per signal, the correlation between the four signals is discarded, and each signal can describe the experiment only in isolation and one-sidedly. The resulting feature data set contains only a small amount of valid position information, so the trained classifier performs only moderately. If instead the four simultaneously acquired signals are treated as one combination and feature values and labels are set per combination, all the details of a single experiment are preserved, the feature data set contains complete position information, and a classifier with good classification performance can be trained.
Most current research concerns only the construction of single-channel feature data. The construction of multi-channel feature data, and in particular of feature data based on multi-channel signals, has received little attention in existing supervised learning work. As a result, in the detection of aerospace electronic equipment, the feature expression contained in the data is severely limited, which in turn limits the classification effect of the classifier. Some prior art attempts to extract a variety of signal features to improve the classification effect, but after the signal is collected in the data acquisition stage it is converted into an image, such as a time-frequency diagram or spectrogram, and the image is then input to the feature extractor; the signal itself is not processed directly. A signal inevitably loses part of its information when converted into an image, so feature data constructed from the image is of lower quality than feature data constructed directly from the signal. Some scholars have put more effort into the design of feature extractors, for example by introducing attention mechanisms or improving the extractor structure, in order to extract suitable features from multi-channel, multi-dimensional media files more quickly and accurately and so construct high-quality feature data. In doing so, however, they have neglected the multi-channel, multi-dimensional media files themselves.
If the attributes of the multi-channel, multi-dimensional media files and the correlations among them were analyzed first, and targeted features were extracted on that basis, the resulting feature data would likely be of higher quality and the feature extraction process better grounded and more stable.
Disclosure of Invention
The invention aims to solve the problem that existing data set construction methods based on single-channel feature extraction, as used in aerospace electronic equipment detection technology, ignore the attributes of the data and the correlation among those attributes. This degrades the quality of the data set, the classification performance of a classifier trained on it and, ultimately, the classification of the feature data.
The method for constructing a multi-channel feature-coded redundant object positioning feature data set comprises the following steps:
S1, for an aerospace equipment model used for particle impact noise detection, acquiring redundancy signals with N sensors arranged on the aerospace equipment model, and regarding the N sensors, which serve as N channels, as N encoders, to obtain the redundancy signals corresponding to the N channels respectively;
S2, extracting useful pulses from the redundancy signals corresponding to the N encoders respectively, and completing pulse matching based on the number and start times of the useful pulses of the N encoders; during useful pulse extraction, processing separately the N channel redundancy signals acquired in a single experiment to obtain the peak time of the first useful pulse in each of them; sorting the N peak times in ascending order, which identifies, from nearest to farthest, the sensors that collected the N channel redundancy signals; taking the position of each channel in this ordering as a feature value that characterizes how near to or far from the redundant object that channel's redundancy signal was collected, thereby obtaining N sequence feature values corresponding to the N channels;
for the matched useful pulses, taking as a unit each group of N useful pulses that appear in sequence in the N matched redundancy signals, and calculating the values of multi-domain features on each group of N useful pulses in turn, wherein the multi-domain features comprise time domain features and frequency domain features; letting the number of multi-domain features of each signal segment of each channel's redundancy signal be M, and recording each of the M multi-domain features as one signal feature value;
constructing a multidimensional feature vector based on the M signal feature values and the N sequence feature values;
adding a label with the corresponding number to the feature vector by determining in which numbered closed space the current N channel redundancy signals were generated;
S3, processing the N channel redundancy signals generated by multiple experiments according to steps S1 and S2 to obtain multiple pieces of feature data with different labels, and constructing the feature data set.
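The ordering step in S2 can be illustrated with a minimal sketch. It assumes the first-pulse peak times are already available, one per channel, and assigns each channel its 1-based position in the ascending sort. The function name `order_feature_values` and the sample times are hypothetical, and ties are broken by channel index, which the patent does not specify.

```python
def order_feature_values(peak_times):
    """Return each channel's 1-based rank after sorting peak times ascending.

    peak_times[i] is the peak time of the first useful pulse on channel i.
    Rank 1 means the channel whose sensor is assumed nearest to the
    redundant object (earliest arrival); rank N means the farthest.
    """
    # Channel indices ordered by arrival time (stable sort: ties keep channel order).
    order = sorted(range(len(peak_times)), key=lambda ch: peak_times[ch])
    ranks = [0] * len(peak_times)
    for rank, ch in enumerate(order, start=1):
        ranks[ch] = rank
    return ranks


# Hypothetical peak times in seconds for a four-sensor setup.
peak_times = [0.0031, 0.0012, 0.0047, 0.0025]
print(order_feature_values(peak_times))  # [3, 1, 4, 2]
```

Here channel 1 receives rank 1 because its pulse peaks first, so its sensor is taken to be nearest to the redundant object.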
Further, the interior of the aerospace equipment model for particle impact noise detection in S1 is divided into a plurality of closed spaces, each of which is numbered; when the data set is constructed, redundant object samples are placed in the different closed spaces in turn and particle impact noise detection is performed to obtain the redundancy signals.
Further, when the N sensors are arranged on the aerospace equipment model to collect signals, the centroid of the model is taken as the reference: the sensors are placed as far from the centroid as possible, while the distance between each sensor and the centroid is kept within the sensor's sensitive detection range.
Further, the time domain features include time delay, pulse rise time, pulse symmetry, pulse amplitude, pulse energy, root mean square voltage, zero crossing rate.
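The patent lists these time domain features without defining them. As a rough illustration only, the sketch below computes a few of them under common textbook definitions, all of which are assumptions: amplitude as the largest absolute sample, energy as the sum of squared samples, rise time as the time from the first sample of the pulse to its peak, and zero crossing rate as the fraction of adjacent sample pairs with opposite signs. The function name `time_domain_features` and the sampling rate `fs` are hypothetical.

```python
import math


def time_domain_features(pulse, fs):
    """Compute a few time domain features of one useful pulse.

    pulse: list of samples; fs: sampling rate in Hz.
    All definitions here are assumed, not taken from the patent.
    """
    if len(pulse) < 2:
        raise ValueError("pulse must contain at least two samples")
    peak_index = max(range(len(pulse)), key=lambda i: abs(pulse[i]))
    energy = sum(x * x for x in pulse)
    # Count strict sign changes between neighbouring samples.
    crossings = sum(1 for a, b in zip(pulse, pulse[1:]) if a * b < 0)
    return {
        "amplitude": abs(pulse[peak_index]),
        "energy": energy,
        "rms": math.sqrt(energy / len(pulse)),
        "zero_crossing_rate": crossings / (len(pulse) - 1),
        "rise_time": peak_index / fs,
    }


feats = time_domain_features([0.0, 1.0, -2.0, 1.0, 0.0], fs=1000.0)
print(feats["amplitude"], feats["energy"], feats["zero_crossing_rate"])  # 2.0 6.0 0.5
```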
Further, the frequency domain features include spectrum centroid, spectrum mean square error, root mean square probability and frequency standard deviation.
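The frequency domain features are likewise only named in the text. The sketch below shows one plausible way to obtain two of them, the spectrum centroid and the frequency standard deviation, from a one-sided magnitude spectrum. Weighting by magnitude rather than power is an assumption, as are the function names; a plain discrete Fourier transform is used so the example needs only the standard library, whereas a real implementation would more likely use an FFT routine.

```python
import cmath
import math


def magnitude_spectrum(signal, fs):
    """One-sided magnitude spectrum via a direct DFT (O(n^2), for illustration)."""
    n = len(signal)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        freqs.append(k * fs / n)
        mags.append(abs(coeff))
    return freqs, mags


def frequency_domain_features(signal, fs):
    """Return (spectrum centroid, frequency standard deviation) in Hz.

    Both are weighted by spectral magnitude (an assumed definition).
    """
    freqs, mags = magnitude_spectrum(signal, fs)
    total = sum(mags)
    if total == 0:
        return 0.0, 0.0  # an all-zero signal has no spectral content
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    variance = sum((f - centroid) ** 2 * m for f, m in zip(freqs, mags)) / total
    return centroid, math.sqrt(variance)


# A 1 Hz cosine sampled at 8 Hz: the centroid should sit at about 1 Hz.
tone = [math.cos(2 * math.pi * i / 8) for i in range(8)]
centroid, spread = frequency_domain_features(tone, fs=8.0)
```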
Further, in S2, a three-threshold pulse extraction algorithm is used to extract the useful pulse.
Further, the process of completing pulse matching based on the number and start times of the useful pulses of the N encoders comprises the following steps:
S201: while acquiring the peak time of the first useful pulse in each of the N channel redundancy signals and sorting the N peak times in ascending order, the peak times of the first useful pulse in the N redundancy signals, ordered from nearest to farthest from the redundant object, are denoted $T_1, T_2, T_3, \ldots, T_N$;
S202: calculate $T_N - T_1, T_N - T_2, \ldots, T_N - T_{N-1}$, which represent the time differences between the arrival of the redundancy signal at the farthest sensor and its arrival at the nearest, second nearest, and subsequent sensors respectively;
S203: before the start time of the redundancy signals acquired by the sensors nearest, second nearest, third nearest, and so on to the redundant object, prepend zero pulses of duration $T_N - T_1, T_N - T_2, \ldots, T_N - T_{N-1}$ respectively;
S204: align the start times of the N new redundancy signals and, taking the length of the redundancy signal acquired by the sensor farthest from the redundant object as the reference, retain from each of the other N-1 redundancy signals the signal component of the same length counted from the start time, and discard the remainder.
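Steps S201 to S204 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: signals are plain lists of samples, the zero pulse duration is converted to a whole number of samples by rounding, and the names `match_pulses` and `fs` are hypothetical.

```python
def match_pulses(signals, peak_times, fs):
    """Align N channel signals on the first-pulse peak of the farthest sensor.

    signals[i]: samples of channel i; peak_times[i]: peak time (s) of its first
    useful pulse; fs: sampling rate in Hz.
    """
    n = len(signals)
    # The farthest sensor is the one with the latest peak time T_N.
    far_index = max(range(n), key=lambda i: peak_times[i])
    t_far = peak_times[far_index]
    ref_len = len(signals[far_index])  # reference length for truncation (S204)

    matched = []
    for sig, t in zip(signals, peak_times):
        # S202/S203: prepend a zero pulse lasting T_N - T_i (zero for the farthest).
        pad = round((t_far - t) * fs)
        padded = [0.0] * pad + list(sig)
        # S204: keep ref_len samples from the common start, discard the rest.
        matched.append(padded[:ref_len])
    return matched


# Two hypothetical channels sampled at 1 Hz; peaks at t = 1 s and t = 3 s.
signals = [[0.0, 5.0, 0.0, 0.0], [0.0, 0.0, 0.0, 5.0]]
print(match_pulses(signals, peak_times=[1.0, 3.0], fs=1.0))
# [[0.0, 0.0, 0.0, 5.0], [0.0, 0.0, 0.0, 5.0]]
```

After matching, both peaks fall on the same sample index, so a group of N useful pulses can be read off at the same position in every channel.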
Further, the process of constructing a multidimensional feature vector based on the M signal feature values and the N sequential feature values includes the steps of:
for each signal feature m of the M signal features, calculating the mean of feature m over the matched group of N useful pulses and taking this mean as the new signal feature value of feature m; the M new signal feature values form a basic feature vector;
appending an N-bit code to the basic feature vector to construct an (M+N)-dimensional feature vector, which is the multidimensional feature vector;
the N-bit code consists of the N sequence feature values, each of which is the feature value, obtained in the pulse matching process, that characterizes how near to or far from the redundant object the corresponding channel's redundancy signal was collected.
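A minimal sketch of this first construction, assuming the per-channel feature values for one matched pulse group are already computed. The function name `coded_feature_vector` and the sample numbers are hypothetical.

```python
def coded_feature_vector(per_channel_features, order_codes):
    """Build an (M+N)-dimensional vector: M feature means followed by N order codes.

    per_channel_features[i][j]: value of feature j on channel i for one
    matched group of N useful pulses.
    order_codes[i]: sequence feature value (near-to-far rank) of channel i.
    """
    n = len(per_channel_features)
    m = len(per_channel_features[0])
    # Average each feature over the N channels to get the basic feature vector.
    means = [sum(ch[j] for ch in per_channel_features) / n for j in range(m)]
    return means + list(order_codes)


# Two channels (N = 2), two features each (M = 2), hypothetical values.
vector = coded_feature_vector([[1.0, 2.0], [3.0, 4.0]], order_codes=[2, 1])
print(vector)  # [2.0, 3.0, 2, 1]
```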
Or,
for the matched useful pulses, letting the M signal features comprise M1 time domain features and M2 frequency domain features; for the M signal features of each of the N redundancy signals, arranging the N×M1 time domain feature values together in order and the N×M2 frequency domain feature values together in order, so that the N×M feature values calculated from each corresponding group of useful pulses in the N redundancy signals are arranged in a row to form a feature vector, which serves as the basic feature vector;
appending an N-bit code to the basic feature vector to construct an (N×M+N)-dimensional feature vector, which is the multidimensional feature vector;
the N-bit code consists of the N sequence feature values, each of which is the feature value, obtained in the pulse matching process, that characterizes how near to or far from the redundant object the corresponding channel's redundancy signal was collected.
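The alternative construction keeps every channel's feature values instead of averaging them. The sketch below assumes one particular ordering, all time domain values channel by channel, then all frequency domain values channel by channel, then the N order codes; the text does not pin the ordering down further, and the function name `fused_coded_vector` is hypothetical.

```python
def fused_coded_vector(time_feats, freq_feats, order_codes):
    """Build an (N*M + N)-dimensional vector without averaging over channels.

    time_feats[i]: the M1 time domain values of channel i.
    freq_feats[i]: the M2 frequency domain values of channel i.
    order_codes[i]: sequence feature value (near-to-far rank) of channel i.
    """
    vector = []
    for channel in time_feats:   # N x M1 time domain values
        vector.extend(channel)
    for channel in freq_feats:   # N x M2 frequency domain values
        vector.extend(channel)
    vector.extend(order_codes)   # N-bit code
    return vector


# N = 2 channels, M1 = 2, M2 = 1, so the length is 2 * 3 + 2 = 8.
vector = fused_coded_vector([[1, 2], [3, 4]], [[5], [6]], [1, 2])
print(vector)  # [1, 2, 3, 4, 5, 6, 1, 2]
```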
The beneficial effects are that:
The invention provides a method for constructing a feature data set with multi-channel feature coding. It takes the redundancy signal directly as the object of study, extracts features from it, constructs feature data and from that the feature data set. This avoids the information loss that occurs in prior studies when signals are converted into images. The resulting feature data set is of higher quality, so the trained classifier classifies better and the redundant object is positioned more accurately. The method takes full account of the multi-channel nature of the redundancy signals acquired in a single experiment. When constructing feature data, it does not perform feature calculation and label setting per segment of redundancy signal, but per combination of four segments of redundancy signal. Specifically, instead of constructing four feature vectors and labeling each separately, as is conventional, the feature values calculated from a single sampling point, or a fixed combination of sampling points, that correspond in time across the channels are combined into one feature vector with one label. This preserves the correlation between the multi-channel redundancy signals. More importantly, the concept of coding from the field of computer programming is introduced, for the first time, into the study of feature data construction. It is used to quantify the correlation between the multi-channel redundancy signals and gives a specific way of combining the four feature vectors constructed from the channels.
Compared with directly concatenating the four feature vectors into one, coding also reduces the dimension of the combined feature vector and greatly reduces time cost. Overall, the multi-channel feature data set construction method provided by the invention improves data set quality and computational efficiency at the same time, ensures the superiority and stability of the classifier's classification performance, and achieves reliable positioning of redundant objects.
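The dimension saving can be checked with simple arithmetic. The sketch below compares direct concatenation of N per-channel vectors (N×M values) with the coded vector (M+N values). The example uses N = 4 and assumes M = 11, which is just the count of time and frequency domain features named in this document; the function name is hypothetical.

```python
def vector_dimensions(n_channels, n_features):
    """Return (concatenated dimension, coded dimension) for one pulse group."""
    concatenated = n_channels * n_features  # N feature vectors placed end to end
    coded = n_features + n_channels         # M feature means plus an N-bit code
    return concatenated, coded


print(vector_dimensions(4, 11))  # (44, 15)
```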
Drawings
Fig. 1 shows the redundant object positioning test system.
Fig. 2 shows the spatial layout of the sensors.
Fig. 3 is a schematic diagram of the three-threshold pulse extraction algorithm.
Fig. 4 shows the design flow of the conventional feature data set construction method.
Fig. 5 shows the design flow of the channel-labeling feature data set construction method.
Fig. 6 is a schematic diagram of the pulse matching algorithm based on the short-plate principle.
Fig. 7 shows the design flow of the multi-channel fusion feature data set construction method.
Fig. 8 shows the design flow of the multi-channel feature coding feature data set construction method.
Fig. 9 is a schematic diagram of the closed space division of the aerospace equipment model.
Fig. 10 compares the Precision, Recall and F1 score obtained by random forests on the conventional, labeled, fused and coded positioning data sets.
Fig. 11 shows the classification accuracy achieved by random forests trained on positioning data sets that retain feature data of different dimensions or different groups of feature data.
Fig. 12 shows the classification accuracy obtained by the random forest for different values of n_estimators.
Fig. 13 shows the ten-fold cross-validation results of the parameter-optimized random forest.
Detailed Description
This embodiment is a method for constructing a multi-channel feature-coded redundant object positioning feature data set. Based on the PIND method, a redundant object positioning test system is built, as shown in Fig. 1. It comprises three parts: on the left, the hardware platform of the PIND method; in the middle, the automatic redundant object detection device; on the right, the algorithm processing carried out on the host computer. On this basis a data set is constructed and tested. The specific construction process is as follows:
1. Positioning redundant objects:
Step one: design an aerospace equipment model according to the shape, structure, material and other properties of the aerospace equipment to be tested. Divide the interior of the model into a plurality of closed spaces and number each one. Select a redundant object sample prepared in advance and place it in the closed space numbered 1.
Step two: fix the aerospace equipment model containing the redundant object sample to the hardware platform of the PIND method, and drive the vibrating table to apply mechanical excitation to the model, activating the sample so that it moves randomly and generates redundancy signals.
Step three: four sensors placed at different positions on the surface of the aerospace equipment model capture the redundancy signals. After conversion into electrical signals, these are transmitted over highly shielded data lines to the automatic redundant object detection device for processing, which includes signal conditioning, signal filtering and synchronous acquisition. The processed digital signals, that is, the four-channel redundancy signal, are stored as a file.
Step four: frame each of the four channel redundancy signals to obtain a plurality of frame signals. Process the four-channel redundancy signal with the three-threshold pulse extraction algorithm to obtain the number of useful pulses in each redundancy signal, together with the position of the starting frame signal and the number of frame signals contained in each useful pulse.
Step five: calculate the values of the multi-domain features, comprising time domain features and frequency domain features, from each frame signal and construct a multidimensional feature vector. A plurality of frame signals yields a plurality of feature vectors. Set the labels of these feature vectors to the number of the closed space in which the redundant object sample is currently placed, obtaining multiple pieces of labeled positioning data.
Step six: change the closed space in which the redundant object sample is placed in step one, that is, place the sample in turn in the closed spaces numbered 2, 3, …, n. Repeat steps two to five to obtain multiple pieces of positioning data labeled 2, 3, …, n. By repeating these steps, a large amount of positioning data is obtained, and a positioning data set is established that represents the redundant object sample in the different closed spaces of the aerospace equipment model. Perform feature optimization on the positioning data set to obtain a high-quality positioning data set.
Step seven: classifiers based on different machine learning classification algorithms, called positioning models, are trained on the positioning dataset. The optimal performance is obtained through comparison, the parameters of the optimal performance are optimized through a grid search method, the optimal parameter combination is obtained, and then the optimal positioning model is obtained, so that the method can be used for positioning test of the aerospace equipment to be tested.
Step eight: the method comprises the steps of fixing the aerospace equipment to be tested to a hardware platform of a PIND method, driving a vibrating table to apply mechanical excitation to the aerospace equipment, and activating the redundancy at an unknown position in the aerospace equipment to be tested to enable the redundancy to be in a random motion state so as to generate redundancy signals. Repeating the partial operations of the third step, the fourth step and the fifth step to construct a plurality of feature vectors. Their labels are unknown and are referred to as positioning data to be measured. And (3) predicting the labels of the positioning data to be detected by using the optimal positioning model in the step seven, and performing majority voting on a plurality of predicted labels to obtain common labels of the predicted labels. The number of the closed space corresponding to the common label is the predicted position of the redundant material in the space equipment to be tested, and the positioning test is finished.
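The majority voting in step eight can be illustrated with a minimal sketch. The function name majority_vote and the list of predicted labels are hypothetical and serve only as an illustration; the source does not specify how ties between labels are resolved, so the sketch simply takes whichever label the counter returns first.

```python
from collections import Counter

def majority_vote(predicted_labels):
    """Return the most frequent label among the per-feature-vector predictions."""
    if not predicted_labels:
        raise ValueError("no predicted labels to vote on")
    # most_common(1) returns a one-element list holding (label, count).
    label, _count = Counter(predicted_labels).most_common(1)[0]
    return label

# Hypothetical labels predicted for the feature vectors of one positioning test.
predicted = [3, 3, 2, 3, 5, 3, 2]
predicted_space = majority_vote(predicted)
print("predicted enclosed space:", predicted_space)
```

In this illustration the enclosed space numbered 3 receives four of the seven votes and is taken as the predicted position.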
The core of the redundancy positioning method lies in two changes of the research object. The first research object is the aerospace equipment model, and the final purpose is to obtain the optimal positioning model; this corresponds to steps one to seven of the implementation steps. The second research object is the aerospace equipment to be tested, and the final purpose is to obtain its redundancy positioning test result; this corresponds to step eight. The high correlation between the attributes of the aerospace equipment model and those of the aerospace equipment to be tested ensures that the generalization performance of the optimal positioning model remains stable in the positioning test. The reliability of the positioning test results depends on the classification performance of the optimal positioning model: the better the classification performance, the more reliable the results. In machine learning, the factors that affect classification performance mainly include the feature data set, the choice of classification algorithm, and parameter optimization. The choice of classification algorithm depends on the problem at hand, and parameter optimization can improve classification performance only to a limited extent. The feature data set is the source from which the classifier is trained and plays a key role. The better the quality of the feature data set, the clearer the basis for choosing the classification algorithm and the better the classification performance of the trained classifier. Therefore, the present invention focuses on steps five and six of the implementation steps.
A brief description is therefore first given of the preparatory work for constructing the feature data set, including the selection and layout of the sensors that capture the redundancy signal, the three-threshold pulse extraction algorithm that processes the redundancy signal to obtain useful pulses, and the multi-domain feature extraction that constructs feature vectors on the frame signals.
2. Sensor type and layout:
The sensor is connected to the automatic redundancy detection equipment through a data transmission line with high shielding performance. The acoustic signals captured by the sensor are converted into electrical signals and input into the automatic redundancy detection equipment, where signal conditioning, signal filtering, synchronous acquisition and similar processing are completed and the result is stored as digital signals. The performance and layout of the sensors therefore directly determine the quality of the captured acoustic signals. Selecting appropriate sensors and an appropriate layout makes it possible to effectively capture the redundancy signals generated inside the aerospace equipment and the aerospace equipment model.
The frequency of the redundancy signal is mainly concentrated between 20 kHz and 100 kHz, and its amplitude is small: the minimum amplitude is as low as the microvolt level, and the maximum does not exceed tens of millivolts. Resonant sensors have a narrower response band but higher sensitivity than broadband sensors. Therefore, from the standpoint of detection sensitivity and optimal utilization, the invention connects a PXR04-type resonant sensor to the automatic redundancy detection equipment. The resonance frequency of the PXR04-type resonant sensor is 40 kHz, and its frequency band at a sensitivity of 60 dB is 15 kHz to 165 kHz, which meets the performance requirements of redundancy detection.
In practice the invention can use N sensors to collect the redundancy signal for data set construction. As described above, since both the aerospace equipment model and the aerospace equipment are three-dimensional structures, four or more sensors are generally needed, and this embodiment is described by taking four sensors as an example. The spatial layout of the sensors must take into account the effective detection range of a single sensor, so that the effective detection ranges of all sensors together cover the whole detection object. It must also take into account the differences between the redundancy acoustic emission signals arriving at different sensors, so that enough information about the position of the acoustic emission source is conveyed. Fig. 2 shows the spatial layout employed by the present invention. Taking the centroid of the aerospace equipment or the aerospace equipment model as a reference, the invention places the sensors at the positions farthest from the centroid. The four sensors can be regarded as two pairs: the two sensors in each pair are collinear with the centroid, and the planes of the two pairs intersect perpendicularly, which ensures that the four sensors detect the signals of the aerospace equipment or the aerospace equipment model in different spatial dimensions.
The maximum side length of a typical aerospace device is between 0.5 m and 0.8 m, in the extreme case 80 cm x 80 cm. In this case, the distance between the two points farthest from each other in the aerospace device is the length of its longest diagonal, and this furthest distance is twice the distance from the centroid to the position furthest from the centroid. The sensitive detection range of the PXR04-type resonant sensor is 80 cm. Thus, if PXR04-type resonant sensors are placed at the positions furthest from the centroid, their sensitive detection range extends beyond the centroid, and four PXR04-type resonant sensors placed in this way cover the whole aerospace device. It should be noted that, for larger aerospace devices, coverage of the entire device can be ensured by providing more PXR04-type resonant sensors or by selecting other suitable sensors.
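The coverage argument above can be checked with a short calculation. The sketch below assumes, purely for illustration, that the extreme case is a cube with an 80 cm side; the function name half_diagonal_cm is hypothetical. Only the 80 cm sensitive detection range is taken from the text.

```python
import math

def half_diagonal_cm(length_cm, width_cm, height_cm):
    """Distance from the centroid of a rectangular box to its farthest corner."""
    return math.sqrt(length_cm ** 2 + width_cm ** 2 + height_cm ** 2) / 2.0

SENSITIVE_RANGE_CM = 80.0  # sensitive detection range of the PXR04 sensor (from the text)

# Assumption: the extreme case is treated as an 80 cm cube.
reach_needed = half_diagonal_cm(80.0, 80.0, 80.0)
covers_centroid = reach_needed <= SENSITIVE_RANGE_CM
print(f"corner-to-centroid distance: {reach_needed:.1f} cm, reaches centroid: {covers_centroid}")
```

Under this assumption the corner-to-centroid distance is about 69.3 cm, which is less than the 80 cm sensitive detection range, consistent with the statement that the range of a sensor placed at the farthest position extends beyond the centroid.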
3. Three-threshold pulse extraction algorithm
As mentioned above, the redundancy signal is generated when the redundancy is in random motion and consists of continuously oscillating pulses of relatively large amplitude, known as useful pulses. Electromagnetic interference or background noise that may be present in the redundancy signal, by contrast, takes the form of briefly oscillating pulses of small amplitude, called interference pulses. The feature data is obtained by calculating feature values on the useful pulses, so the useful pulses need to be extracted from the redundancy signal and the interference pulses discarded. According to the characteristics of the pulses in the redundancy signal and in the noise, a three-threshold pulse extraction algorithm is adopted. Fig. 3 presents a schematic view of the algorithm, which is implemented as follows.
Step one: Calculate the average energy of the whole redundancy signal, denoted $E_{mean}$ and referred to as the baseline threshold, and from it determine the spike detection threshold $E_{peak}$ and the endpoint detection threshold $E_{hs}$. These are the three thresholds in the algorithm name. In the redundancy detection study, the spike detection threshold is set to $E_{peak} = 3E_{mean}$ and the endpoint detection threshold is set to $E_{hs} = 1.1E_{mean}$.
Step two: Perform the first framing of the redundancy signal, setting the duration of each frame signal to $\Delta t_1 = 100\ \mu s$, and calculate the energy of each frame signal.
Step three: Starting from the first frame signal, compare the energy of each frame signal in turn with $E_{peak}$. When the energy of a frame signal is greater than $E_{peak}$, the highest point of the current useful pulse lies in its vicinity. Starting from this frame signal, continue comparing the energy of each subsequent frame signal with $E_{peak}$ until the energy of a frame signal is less than $E_{peak}$. Among these frame signals, the one with the greatest energy is identified as the highest point of the current useful pulse. The time at which this frame signal was acquired is identified as the peak time of the current useful pulse, denoted $t_{max}$.
Step four: Perform the second framing of the redundancy signal, setting the duration of each frame signal to $\Delta t_2 = 50\ \mu s$, and recalculate the energy of each frame signal.
Step five: Starting from the peak time $t_{max}$ of the current useful pulse, compare the energy of each frame signal with $E_{hs}$ in the forward and backward directions until, in each direction, a frame signal with energy less than $E_{hs}$ is found. The frame signals immediately before these two frame signals in the respective search directions are identified as the start frame signal and the end frame signal of the current useful pulse. Their times are identified as the start time and end time of the current useful pulse, denoted $t_{begin}$ and $t_{end}$.
Step six: Starting from the frame signal that follows the end time $t_{end}$ of the useful pulse just found, repeat steps three to five to extract the second useful pulse, and so on until the last frame signal of the whole redundancy signal has been searched. At this point, all useful pulses in the whole redundancy signal have been extracted.
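The six steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the invention: frame energy is taken to be the mean squared amplitude of the frame, so that energies from the 100 μs and 50 μs framings are comparable with the same thresholds, and the names frame_energies and extract_useful_pulses, the sampling rate and the synthetic test signal are all hypothetical.

```python
def frame_energies(signal, frame_len):
    """Mean squared amplitude of each non-overlapping frame (assumed energy measure)."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]

def extract_useful_pulses(signal, fs, dt1=100e-6, dt2=50e-6, k_peak=3.0, k_hs=1.1):
    """Return (start, end) sample indices of useful pulses, end exclusive."""
    if not signal:
        return []
    # Step one: baseline threshold and the two derived thresholds.
    e_mean = sum(x * x for x in signal) / len(signal)
    e_peak, e_hs = k_peak * e_mean, k_hs * e_mean
    len1, len2 = max(1, round(dt1 * fs)), max(1, round(dt2 * fs))
    coarse = frame_energies(signal, len1)  # step two: first framing
    fine = frame_energies(signal, len2)    # step four: second framing
    pulses = []
    i = 0
    while i < len(coarse):
        if coarse[i] <= e_peak:
            i += 1
            continue
        # Step three: scan the run of frames above E_peak and keep the largest one.
        j = i
        while j < len(coarse) and coarse[j] > e_peak:
            j += 1
        peak_frame = max(range(i, j), key=lambda k: coarse[k])
        # Locate the peak among the fine frames covered by the coarse peak frame.
        first = (peak_frame * len1) // len2
        last = min(len(fine) - 1, ((peak_frame + 1) * len1 - 1) // len2)
        p = max(range(first, last + 1), key=lambda k: fine[k])
        # Step five: walk backward and forward while the energy stays at or above E_hs.
        begin = p
        while begin > 0 and fine[begin - 1] >= e_hs:
            begin -= 1
        end = p
        while end < len(fine) - 1 and fine[end + 1] >= e_hs:
            end += 1
        pulses.append((begin * len2, (end + 1) * len2))
        # Step six: resume the search after the end of the pulse just found.
        i = max(j, ((end + 1) * len2 + len1 - 1) // len1)
    return pulses

# Synthetic example: weak background with two strong bursts, sampled at 1 MHz.
fs = 1_000_000
signal = [0.01 if n % 2 == 0 else -0.01 for n in range(10_000)]
for start, stop in ((2000, 2400), (6000, 6300)):
    for n in range(start, stop):
        signal[n] = 1.0 if n % 2 == 0 else -1.0
pulses = extract_useful_pulses(signal, fs)
print(pulses)
```

With this synthetic signal the two bursts are recovered as two useful pulses whose boundaries coincide with the inserted bursts. Real redundancy signals would need the thresholds and frame durations given in the steps above, and possibly a different energy definition.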
4. Multi-domain feature extraction:
In machine learning, a classifier cannot take a raw signal as input directly; the signal must first be converted into interpretable data. Feature extraction is a common means of conversion in the field of signal processing. In the invention, feature extraction refers to calculating the values of multi-domain features on the useful pulses and constructing feature vectors, thereby obtaining feature data. Specifically, each useful pulse includes a plurality of sampling points. The values of the time domain features are calculated on these sampling points, and a frequency domain transformation is then performed so that the values of the frequency domain features of these sampling points can be calculated.
The invention extracts time domain features from the redundancy signals mainly from three aspects: time characteristics, energy characteristics and pulse zero-crossing rate. Redundancy signals that travel different distances in the medium take different times, so the time characteristics are important for characterizing the position of the redundancy. Typical time characteristics include time delay, pulse rise time and pulse symmetry. As the distance traveled by the redundancy signal in the medium increases, the signal energy decreases and the corresponding amplitude decreases. The loss of signal energy is greater when the signal encounters the interface between two media. Typical energy characteristics include amplitude and energy. During propagation, redundancy signals of different frequencies decay at different rates, which leads to differences in the zero-crossing rate of the redundancy signals acquired by different sensors. The zero-crossing rate is therefore also an important time domain feature. In addition, the invention mainly selects four frequency domain features: spectrum centroid, spectrum mean square error, root mean square probability and frequency standard deviation. As mentioned above, the redundancy signal is attenuated during propagation, and because of the viscoelastic effect of the medium the decay rate differs between frequencies. The frequency domain features therefore have important reference significance. A specific description of the above time domain and frequency domain features is shown in table 1.
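A few of the features named above can be sketched as follows. Because the definitions of table 1 are not reproduced in this section, the formulas below are common textbook definitions adopted as assumptions, and the names time_domain_features and spectrum_centroid are hypothetical. The sketch uses a naive discrete Fourier transform so that it depends only on the standard library.

```python
import cmath
import math

def time_domain_features(pulse, fs):
    """Amplitude, energy, rise time and zero-crossing rate of one useful pulse."""
    peak_index = max(range(len(pulse)), key=lambda i: abs(pulse[i]))
    crossings = sum(1 for a, b in zip(pulse, pulse[1:]) if a * b < 0)
    return {
        "amplitude": abs(pulse[peak_index]),                 # largest absolute sample
        "energy": sum(x * x for x in pulse),                 # sum of squared samples
        "rise_time": peak_index / fs,                        # pulse start to peak (assumed definition)
        "zero_crossing_rate": crossings / (len(pulse) - 1),  # sign changes per sample interval
    }

def spectrum_centroid(pulse, fs):
    """Power-weighted mean frequency over the one-sided spectrum (naive DFT)."""
    n = len(pulse)
    weighted = total = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(pulse))
        power = abs(coeff) ** 2
        weighted += (k * fs / n) * power
        total += power
    return weighted / total if total else 0.0

# Hypothetical useful pulse: ten cycles of a 50 kHz tone sampled at 1 MHz.
fs = 1_000_000
pulse = [math.sin(2 * math.pi * 50_000 * t / fs + 0.3) for t in range(200)]
features = time_domain_features(pulse, fs)
centroid = spectrum_centroid(pulse, fs)
print(features, centroid)
```

For this pure tone the spectrum centroid falls at the tone frequency, which gives a simple sanity check. For larger pulses a fast Fourier transform would be the practical choice.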
TABLE 1 detailed description of Multi-Domain features
5. Data set construction:
A. Traditional feature data set construction method (the method used in step four and step five):
As described above, the aerospace equipment model or the aerospace equipment to be tested is fixed to the hardware platform of the PIND method, and four sensors are placed on its surface to capture the redundancy signal generated in a single experiment. Four segments of redundancy signal, called the four-channel redundancy signal, are therefore saved in a single experiment. The traditional feature data set construction method of machine learning essentially regards the four segments of redundancy signal as independent individuals and performs feature extraction on each segment separately to construct feature data.
Specifically, the three-threshold pulse extraction algorithm is used to process each segment of the redundancy signal and extract the useful pulses in it. Segment by segment, the values of the eleven multi-domain features shown in table 1 are calculated on each useful pulse in turn. The eleven feature values calculated on each useful pulse are arranged in a row to construct a feature vector. A label with the corresponding number is added to the feature vector according to the numbered enclosed space in which the redundancy signal containing the current useful pulse was generated. In this way, the useful pulses in each segment of the redundancy signal are ultimately converted into labeled data. Fig. 4 shows the design flow of the traditional feature data set construction method. As shown in fig. 4, the segment of redundancy signal contains 14 useful pulses, from which 14 pieces of feature data can be constructed. By this method, the four-channel redundancy signals generated in multiple experiments can each be processed to obtain feature data with different labels, from which a feature data set is constructed.
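The row construction described above can be sketched as follows, assuming the eleven feature values of every useful pulse have already been calculated. The function name build_traditional_rows and the placeholder feature values are hypothetical; only the pulse count of 14 for one segment comes from fig. 4, and the counts for the other three segments are illustrative.

```python
def build_traditional_rows(channel_pulse_features, space_label):
    """Turn every useful pulse of every channel into one labeled row.

    channel_pulse_features: one list per channel, each holding one
    11-value feature list per useful pulse.
    space_label: number of the enclosed space holding the redundancy sample.
    """
    rows = []
    for pulses in channel_pulse_features:   # channels are treated independently
        for features in pulses:
            rows.append(list(features) + [space_label])
    return rows

# Placeholder feature values: four channels with 14, 13, 12 and 12 useful pulses.
pulse_counts = [14, 13, 12, 12]
channels = [[[0.0] * 11 for _ in range(count)] for count in pulse_counts]
rows = build_traditional_rows(channels, space_label=1)
print(len(rows), len(rows[0]))
```

Each row holds eleven feature values followed by the label, and no information about which channel produced the pulse is retained.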
B. Method for constructing characteristic data set of channel annotation
The invention provides a method for constructing a characteristic data set of channel labeling. In this method, the four segments of redundancy signal are still regarded as independent individuals, and feature extraction is performed on each segment separately to construct feature data. Compared with the feature data constructed in A, however, the feature data constructed by this method additionally carries a channel-label feature value.
Specifically, the three-threshold pulse extraction algorithm is used to process each segment of the redundancy signal and extract the useful pulses in it. First, the start time of the first useful pulse in each segment of the redundancy signal is obtained. The earlier the start time of the first useful pulse, the closer the capturing sensor is to the redundancy. The four redundancy signals are thereby ordered from the sensor nearest to the redundancy to the sensor farthest from it. The redundancy signal captured by the sensor closest to the redundancy is given the channel label "1", that captured by the second closest sensor the channel label "2", that captured by the third closest sensor the channel label "3", and that captured by the farthest sensor the channel label "4".
Segment by segment, the values of the eleven multi-domain features shown in table 1 are calculated on each useful pulse in turn. The eleven feature values calculated on each useful pulse are arranged in a row, and the channel label of the redundancy signal containing the current useful pulse is added as the twelfth feature value to construct a feature vector. A label with the corresponding number is added to the feature vector according to the numbered enclosed space in which the redundancy signal containing the current useful pulse was generated. In this way, the useful pulses in each segment of the redundancy signal are ultimately converted into labeled data. Fig. 5 shows the design flow of the method for constructing a characteristic data set of channel labeling. As shown in the figure, the redundancy signal captured by the sensor closest to the redundancy contains 14 useful pulses, from which 14 pieces of feature data can be constructed. The redundancy signal captured by the second closest sensor contains 13 useful pulses, giving 13 pieces of feature data. The redundancy signal captured by the third closest sensor contains 12 useful pulses, giving 12 pieces of feature data. The redundancy signal captured by the farthest sensor contains 12 useful pulses, giving 12 pieces of feature data. Thus, a total of 51 pieces of feature data can be constructed from the four-channel redundancy signal acquired in a single experiment. By this method, the four-channel redundancy signals generated in multiple experiments can each be processed to obtain feature data with different labels, from which a feature data set is constructed.
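The channel labeling described above can be sketched as follows. The function names channel_labels and build_channel_labeled_rows, the start times and the placeholder feature values are hypothetical; the pulse counts 14, 13, 12 and 12 follow the example of fig. 5.

```python
def channel_labels(first_pulse_start_times):
    """Rank channels by the start time of their first useful pulse (1 = earliest)."""
    order = sorted(range(len(first_pulse_start_times)),
                   key=lambda c: first_pulse_start_times[c])
    labels = [0] * len(order)
    for rank, channel in enumerate(order, start=1):
        labels[channel] = rank
    return labels

def build_channel_labeled_rows(channel_pulse_features, first_pulse_start_times, space_label):
    """Append the channel label as a twelfth feature value, then the space label."""
    labels = channel_labels(first_pulse_start_times)
    rows = []
    for channel, pulses in enumerate(channel_pulse_features):
        for features in pulses:
            rows.append(list(features) + [labels[channel], space_label])
    return rows

# Hypothetical start times (seconds) of the first useful pulse on each sensor.
start_times = [0.0012, 0.0004, 0.0009, 0.0015]
labels = channel_labels(start_times)

# Placeholder features; the nearest channel has 14 pulses, then 13, 12 and 12.
pulse_counts = [12, 14, 13, 12]
channels = [[[0.0] * 11 for _ in range(count)] for count in pulse_counts]
rows = build_channel_labeled_rows(channels, start_times, space_label=2)
print(labels, len(rows))
```

Here the second sensor captures the signal first and receives the channel label "1", and the single experiment again yields 51 rows, each now carrying twelve feature values plus the label.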
Pulse matching algorithm based on short plate principle:
In A and B, the four segments of redundancy signal are regarded as independent individuals, and feature extraction is performed on each segment separately to construct feature data. This effectively cuts off the correlation between the four-channel redundancy signals collected in a single experiment. The time-difference positioning method, by contrast, solves a system of geometric equations built from the differences in the arrival times of the acoustic emission signals from the same source at different sensors and from the spatial layout of the sensors, and thereby obtains the spatial position of the acoustic emission source. In essence it performs a fused analysis of the acoustic emission signals received by several sensors, that is, it relies mainly on the correlation between them. The correlation between the four-channel redundancy signals therefore also has important reference value and needs to be emphasized in the construction of the feature data.
To this end, the four-channel redundancy signals can be considered together to construct one group of feature data that represents all four channels simultaneously. As described above, the feature data is obtained by calculating feature values on the useful pulses, so considering the four-channel redundancy signals together means considering the useful pulses in them together, and more specifically considering together the useful pulses that occur at the same position in sequence in each of the four channels. For example, the first useful pulses of the four channels are considered together as one group of four useful pulses. Referring to fig. 5, a problem becomes apparent. During the same acquisition time, the sensor closest to the redundancy captures the redundancy signal first and captures more of it, that is, its signal contains more useful pulses. In fig. 5 the first segment of the redundancy signal contains 14 useful pulses, the second segment contains 13 useful pulses, and the third and fourth segments each contain 12 useful pulses. The numbers of useful pulses in the four-channel redundancy signal therefore do not match. When the four channels are considered together, the useful pulses in the redundancy signal acquired by the sensor farthest from the redundancy are used up while some useful pulses in the redundancy signals acquired by the other three sensors remain unused. The number of useful pulses in the four-channel redundancy signal acquired in a single experiment therefore needs to be matched.
The short-plate principle provides a reference for designing this matching. In the four-channel redundancy signal, the redundancy signal acquired by the sensor farthest from the redundancy contains the fewest useful pulses. Taking this number as the reference, the other three segments of redundancy signal are processed so that the processed four-channel redundancy signal contains the same number of useful pulses in every channel. This lays the foundation for the subsequent construction and study of multi-channel feature data. Specifically, the invention provides a pulse matching algorithm based on the short-plate principle. Fig. 6 shows a schematic diagram of the algorithm, which is implemented as follows:
Step one: Process the four-channel redundancy signals acquired in a single experiment separately with the three-threshold pulse extraction algorithm to obtain the peak time of the first useful pulse in each of them. Sort the four peak times in ascending order, which identifies the four segments of redundancy signal as those acquired by the sensors from nearest to farthest from the redundancy. The peak times of the first useful pulse in the four redundancy signals are denoted $T_1$, $T_2$, $T_3$ and $T_4$, respectively.
Step two: Calculate $T_4 - T_1$, $T_4 - T_2$ and $T_4 - T_3$. They represent, respectively, the time difference between the redundancy signal arriving at the nearest sensor and at the farthest sensor, between arriving at the second nearest sensor and at the farthest sensor, and between arriving at the third nearest sensor and at the farthest sensor.
Step three: Before the start of the redundancy signals acquired by the sensors nearest, second nearest and third nearest to the redundancy, prepend zero pulses of duration $T_4 - T_1$, $T_4 - T_2$ and $T_4 - T_3$, respectively.
Step four: Align the starting moments of the four new segments of redundancy signal. Taking the length of the redundancy signal acquired by the sensor farthest from the redundancy as the reference, keep the signal component of that length counted from the starting moment in each of the other three segments and discard the excess. In this way, both the number and the starting times of the useful pulses in the four-channel redundancy signal are matched.
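The four steps above can be sketched as follows, assuming the peak time of the first useful pulse in each channel has already been obtained. The function name match_pulses, the sampling rate and the toy signals are hypothetical.

```python
def match_pulses(signals, first_peak_times, fs):
    """Zero-pad the nearer channels and truncate all channels to a common length.

    signals: one list of samples per channel.
    first_peak_times: peak time (seconds) of the first useful pulse per channel.
    fs: sampling rate in Hz.
    """
    t_far = max(first_peak_times)                 # T_4: latest first peak
    far_channel = first_peak_times.index(t_far)
    target_len = len(signals[far_channel])        # length of the farthest channel
    matched = []
    for samples, t in zip(signals, first_peak_times):
        pad = round((t_far - t) * fs)             # zero pulse of duration T_4 - T_i
        shifted = [0.0] * pad + list(samples)
        matched.append(shifted[:target_len])      # discard the excess at the end
    return matched

# Toy example: each channel is 20 samples long with a single peak of 1.0.
fs = 1000
peak_indices = [2, 5, 3, 8]
signals = [[1.0 if n == p else 0.0 for n in range(20)] for p in peak_indices]
peak_times = [p / fs for p in peak_indices]
matched = match_pulses(signals, peak_times, fs)
print([channel.index(1.0) for channel in matched])
```

After matching, the first peak of every channel sits at the same sample index as in the farthest channel, and all four channels have the same length.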
C. Method for constructing the characteristic data set of multi-channel fusion:
For the four-channel redundancy signal collected in a single experiment, the number of useful pulses in the four channels is matched by the pulse matching algorithm based on the short-plate principle. On this basis, the invention provides a method for constructing the characteristic data set of multi-channel fusion. The four segments of redundancy signal are taken as a whole, and feature extraction is performed on each group of four useful pulses occurring at the same position in sequence in the four segments to construct feature data.
Specifically, the three-threshold pulse extraction algorithm is used to process each segment of the redundancy signal and extract the useful pulses in it. The pulse matching algorithm based on the short-plate principle is then used to process the four segments so that the number and the starting times of the useful pulses in the four segments are matched. Taking each group of four sequentially occurring useful pulses as a unit, the values of the eleven multi-domain features shown in table 1 are calculated on each useful pulse in turn, giving forty-four feature values in total. The twenty-eight time domain feature values are placed together in order, followed by the sixteen frequency domain feature values placed together in order. The forty-four feature values calculated for each group of useful pulses are thus arranged in a row to construct a feature vector. Specifically, the first through seventh feature values are the time domain features calculated on the useful pulse in the first segment of the redundancy signal, ..., the twenty-second through twenty-eighth feature values are the time domain features calculated on the useful pulse in the fourth segment, the twenty-ninth through thirty-second feature values are the frequency domain features calculated on the useful pulse in the first segment, ..., and the forty-first through forty-fourth feature values are the frequency domain features calculated on the useful pulse in the fourth segment. A label with the corresponding number is added to the feature vector according to the numbered enclosed space in which the current four-channel redundancy signal was generated. In this way, the groups of useful pulses in the four segments of redundancy signal are ultimately converted into labeled data. Fig. 7 shows the design flow of the method for constructing the characteristic data set of multi-channel fusion.
As shown in the figure, each of the four segments of redundancy signal contains 12 useful pulses, which form 12 groups of useful pulses, from which 12 pieces of feature data are constructed. By this method, the four-channel redundancy signals generated in multiple experiments can be processed to obtain feature data with different labels, from which a feature data set is constructed.
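The ordering of the forty-four feature values can be sketched as follows. The sketch assumes that each per-pulse feature list stores the seven time domain values first and the four frequency domain values last; this internal ordering and the name fuse_pulse_group are assumptions made for illustration.

```python
N_TIME = 7   # time domain features per useful pulse (28 / 4 in the text)
N_FREQ = 4   # frequency domain features per useful pulse (16 / 4 in the text)

def fuse_pulse_group(group_features, space_label):
    """Build one 44-value feature vector plus label from a group of four pulses.

    group_features: four 11-value lists, one per channel in channel order,
    each assumed to hold time domain values first, then frequency domain values.
    """
    time_part = [v for feats in group_features for v in feats[:N_TIME]]
    freq_part = [v for feats in group_features for v in feats[N_TIME:N_TIME + N_FREQ]]
    return time_part + freq_part + [space_label]

# Placeholder values that encode channel (hundreds) and feature index (units).
group = [[100 * channel + i for i in range(11)] for channel in range(4)]
row = fuse_pulse_group(group, space_label=3)
print(len(row), row[:8], row[28:33])
```

The placeholder values make the layout visible: positions 1 to 7 come from the first channel, positions 8 to 14 from the second, and the frequency domain values of the first channel begin at position 29.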
In the method for constructing the characteristic data set of multi-channel fusion, all feature values calculated on a group of four useful pulses are arranged in a row to construct a feature vector. Compared with the feature data construction methods of A and B, this amounts to combining the four feature vectors constructed from the four useful pulses in the four segments of redundancy signal into one new feature vector, whose dimension is four times that of each original feature vector. The feature data is constructed by directly concatenating the feature values calculated on the four associated useful pulses of the four-channel redundancy signal; the method does not analyze in depth what relationship exists between the four useful pulses and the corresponding four segments of redundancy signal, and so does not quantify the correlation. The task of finding such correlations is ultimately handed over to the classifier trained on the feature data set, which attempts to exploit, as far as possible, the correlation between the feature data constructed from the four-channel redundancy signals on the basis of their numerical distribution. From one point of view, this feature data construction method retains the correlation among the four-channel redundancy signals as completely as possible. It can be seen as a virgin forest that preserves its original rich resources, namely the complete correlation. Further processing on this basis can yield a new feature data set, which corresponds to mining the most needed resources from the virgin forest. From another point of view, however, this feature data construction method increases the dimension of a single feature vector, which brings a larger time cost in the subsequent classifier training and in the prediction of the data to be tested.
Therefore, it is desirable to further quantify the correlation between the four-channel redundancy signals and to reduce the dimension of the feature vector as much as possible.
D. Method for constructing characteristic data set of multi-channel characteristic code
The invention carries out further analysis based on the principle of the time-difference positioning method. This method obtains the spatial position of an acoustic emission source by setting up and solving a system of geometric equations based on the time differences between the arrival of the acoustic emission signals at different sensors and on the spatial layout of the sensors. The core of the method is therefore these time differences and the spatial layout of the sensors. Further, the premise for obtaining the time differences is to obtain the times at which the acoustic emission signals arrive at the different sensors, or the order in which they arrive. The core of the time-difference positioning method can thus ultimately be summarized as two points: the time or order of arrival of the acoustic emission signals at the different sensors, and the spatial layout of the sensors. The correlation between the four-channel redundancy signals is therefore further quantified by combining these two core points. Meanwhile, in the method for constructing the characteristic data set of multi-channel fusion, the forty-four feature values calculated on a group of four useful pulses sequentially appearing in the four-channel redundancy signal are combined in a spatial sense, which causes the problem of a large dimension. Considering the two core points of the time-difference positioning method, the forty-four feature values are instead combined in a mathematical sense, so that the correlation between the four-channel redundancy signals is quantified and the dimension of a single feature vector is reduced.
In the computer field, coding refers to converting characters, values, signals or other objects into numbers by a predetermined method. Codes have the advantages of being simple, strongly logical and high in information capacity. It has been found that coding is well suited to processing the forty-four feature values calculated on a group of four useful pulses sequentially appearing in the four-channel redundancy signal. The invention therefore further provides a method for constructing the characteristic data set of multi-channel characteristic coding. As shown in fig. 2, the spatial layout of the four sensors used in the present invention has been determined, so one of the two core points above has already been used. The four sensors are regarded as a four-bit code: the sensor numbered 1 is the first bit, the sensor numbered 2 is the second bit, the sensor numbered 3 is the third bit, and the sensor numbered 4 is the fourth bit. Each bit takes one of the values 1, 2, 3 and 4. The redundancy signal generated in a single experiment reaches the four sensors at different times, that is, in a certain order. The closer a sensor is to the redundancy, the earlier it captures the redundancy signal, that is, the smaller its order number.
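The four-bit code can be illustrated with a minimal sketch. The function name arrival_order_code and the arrival times are hypothetical; position i of the returned code corresponds to the sensor numbered i + 1, and its value is that sensor's place in the order of arrival.

```python
def arrival_order_code(arrival_times):
    """Return the order code: one value per sensor, 1 for the earliest arrival."""
    order = sorted(range(len(arrival_times)), key=lambda s: arrival_times[s])
    code = [0] * len(arrival_times)
    for rank, sensor in enumerate(order, start=1):
        code[sensor] = rank
    return code

# Hypothetical arrival times (milliseconds) at the sensors numbered 1 to 4.
arrival_times_ms = [0.4, 0.9, 0.1, 0.6]
code = arrival_order_code(arrival_times_ms)
nearest_sensor = code.index(1) + 1
print(code, "nearest sensor:", nearest_sensor)
```

With these illustrative times the code reads 2, 4, 1, 3: the sensor numbered 3 captures the signal first, and the sensor numbered 2 captures it last.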
In step six, positioning the redundant object, redundant-object samples are placed in turn into different closed spaces to generate redundant-object signals. For a fixed spatial layout of the sensors, the order in which signals generated in different closed spaces reach the sensors changes from one closed space to another. The invention therefore uses the four-bit code to represent this change in order. For example, "2 4 1 3" means that the sensor numbered 3 is closest to the redundant object, the sensor numbered 1 is second closest, the sensor numbered 4 is third closest, and the sensor numbered 2 is farthest. As further examples, (a) in fig. 8 shows the four-bit code 1, 2, 3, 4, and (b) in fig. 8 shows the four-bit code 4, 3, 1, 2. In this way, the four-bit code fully represents the order in which the redundant-object signal reaches the different sensors in a single experiment, so the other of the two core points above is also exploited. Next, the forty-four feature values are combined in a mathematical sense. In this embodiment, the feature values of the eleven multi-domain features calculated on a group of four useful pulses are aligned according to the serial numbers of the 11 multi-domain features shown in table 1, and the mean of the four feature values of each multi-domain feature is taken as the new feature value of that feature. For example, the mean of the first feature value calculated on the first, second, third and fourth useful pulses is taken as the new feature value for "time delay". This completes the reduction from forty-four feature values to eleven and greatly reduces the dimension of the feature vector. Based on this analysis, the specific feature data construction process is given as follows:
For the four-channel redundant-object signals acquired in a single experiment, a three-threshold pulse extraction algorithm is used to process each signal segment and extract its useful pulses. At the same time, the near-to-far order of each signal segment with respect to the redundant object is obtained. A pulse matching algorithm based on the short-plate principle is then applied to the four signal segments so that the number and the starting times of the useful pulses in the four segments are matched. Taking as a unit each group of four useful pulses appearing in sequence in the four signal segments, the values of the eleven multi-domain features shown in table 1 are calculated on each of the four pulses, giving forty-four feature values in total. The mean of the four feature values of each multi-domain feature is taken as the new feature value of that feature, giving eleven new feature values. A four-bit code is appended after the eleven feature values, and the four bits of the code are regarded as four further feature values. The specific value of the four-bit code, that is, the four feature values, is obtained from the near-to-far order of the currently collected four-channel signals with respect to the redundant object and from the numbers of the sensors that collected them. A fifteen-dimensional feature vector is thus constructed. A label with the corresponding number is added to the feature vector according to the numbered closed space in which the current four-channel signal was generated. In this way, the useful pulse groups in the four signal segments are converted into labeled data.
Fig. 8 shows a schematic diagram of the feature data set construction based on multi-channel characteristic coding, where (a) and (b) in fig. 8 correspond to the construction of feature data for two coding orders. In each case the four signal segments each contain 12 useful pulses, forming 12 useful pulse groups, from which 12 pieces of feature data are constructed. By this method, the four-channel redundant-object signals generated in multiple experiments can be processed to obtain feature data with different labels, and the feature data set is constructed.
Compared with the multi-channel fusion feature data set construction method, the feature data set constructed by the present method has a much lower dimension, which reduces the time loss. At the same time, none of the forty-four feature values calculated on the useful pulse groups is ignored: taking their means is also a way of reflecting the four-channel characteristics of the redundant-object signal, and every feature value contributes. More importantly, building on the core of the time-difference positioning method, the present method uses the four-bit code to quantify the correlation between the four-channel signals accurately and completely. It replaces the setting up of a system of equations used in the time-difference positioning method with the setting of a four-bit code, and leaves the subsequent solving to the classifier. In addition, unlike the channel-labeled feature data set construction method, the present method does not cut the link between the four-channel signals and constructs the feature data as a combination. Because the channel-labeled feature data set construction method makes a first attempt to describe the order in which the redundant-object signal reaches the different sensors, it may appear to be a prototype of the present method. This is not the case: the channel-labeled method considers only the order in which the signal reaches the different sensors and not the spatial layout of the sensors, and it describes that order for a single channel at a time rather than for four channels together. The present method, in contrast, considers both the order in which the four-channel signal reaches the different sensors and the spatial layout of the sensors. Compared with the two feature data set construction methods proposed earlier, the multi-channel characteristic coding feature data set construction method therefore has clear advantages.
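The construction of one fifteen-dimensional feature vector can be sketched as follows. This is an illustrative sketch only: the function name `build_coded_vector`, the placeholder feature values and the label are hypothetical, and the eleven multi-domain features of table 1 are assumed to have been computed already for each of the four matched pulses.

```python
from statistics import mean

NUM_FEATURES = 11  # multi-domain features per useful pulse (table 1)
NUM_CHANNELS = 4   # one useful pulse per sensor channel


def build_coded_vector(pulse_features, order_code, label):
    """Combine one matched pulse group into a labeled 15-dimensional vector.

    pulse_features: four lists (one per channel), each with 11 feature values.
    order_code: four-bit arrival-order code, e.g. [2, 4, 1, 3].
    label: number of the closed space in which the signal was generated.
    """
    if len(pulse_features) != NUM_CHANNELS or len(order_code) != NUM_CHANNELS:
        raise ValueError("expected four channels and a four-bit code")
    if any(len(channel) != NUM_FEATURES for channel in pulse_features):
        raise ValueError("expected eleven feature values per pulse")
    # Mean over the four channels for each multi-domain feature: 44 -> 11 values.
    averaged = [
        mean(channel[k] for channel in pulse_features)
        for k in range(NUM_FEATURES)
    ]
    # Append the four code bits as four further feature values: 11 + 4 = 15.
    return averaged + list(order_code), label


# Placeholder feature values: channel c, feature k has value c + k.
group = [[float(c + k) for k in range(NUM_FEATURES)] for c in range(NUM_CHANNELS)]
vector, label = build_coded_vector(group, [2, 4, 1, 3], label=5)
print(len(vector), vector[:3], vector[-4:], label)
```

With real data, each experiment would yield one such vector per matched useful pulse group, all sharing the same code and label.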
In other embodiments, where the data dimension is not a limiting factor, the forty-four feature values can be combined by splicing the time-domain and frequency-domain features in the manner of the multi-channel fusion feature data set construction method, and the four feature values of the four-bit code representing the receiving order are then appended after the forty-four features. The useful pulse groups in the four signal segments are thereby converted into labeled data. By this method, the four-channel redundant-object signals generated in multiple experiments can likewise be processed to obtain feature data with different labels, and a feature data set is constructed.
The invention is mainly characterized in that:
(1) Existing research on multi-channel feature data construction is concentrated on deep learning and relies on features acquired adaptively by a feature extractor, whereas the invention focuses for the first time on a construction method for the feature data set in a multi-channel signal acquisition scenario.
(2) The correlation between multi-channel (four-channel in the embodiment) redundant-object signals is analyzed for the first time. Based on the multi-channel characteristics, when the feature data set is constructed, a group of multi-channel signals is taken as the unit for feature calculation and label setting, and the feature values calculated in the individual channels are combined to construct the feature data.
(3) A pulse matching algorithm based on the short-plate principle is newly proposed. Taking the signal received by the sensor farthest from the redundant object as the reference, the superfluous signal components in the other three signal segments are discarded, so that the number and the starting times of the useful pulses in the multi-channel (four-channel) signal are matched, laying the foundation for constructing the feature data set.
(4) A channel-labeled feature data set construction method and a multi-channel fusion feature data set construction method are newly proposed. Starting from the conventional feature data set construction method, these two methods make a first attempt to add feature data that quantify the multi-channel (four-channel) characteristics of the redundant-object signal.
(5) A feature data set construction method based on multi-channel characteristic coding is newly proposed, introducing the concept of coding into the study of feature data set construction for the first time. It quantifies the correlation between multi-channel redundant-object signals, gives a specific way of combining multi-channel feature data, and reduces the computational loss while guaranteeing the quality of the feature data set.
(6) The method provided by the invention ensures that the feature data set contains complete and clear positioning information for the redundant object, and constructing each feature as the mean of the four feature values of each multi-domain feature also reduces the dimension of the feature data. The method thus secures the advantages of the feature data set in both quality and volume, and in turn the classification performance and efficiency of the trained classifier. It provides an important reference for signal processing and feature engineering research in similar fields.
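Feature (3) above states only the goal of pulse matching: use the signal of the farthest sensor as the reference and discard unmatched components in the other three channels. The following sketch shows one simple way such matching could be done. It is an assumption made for illustration and not the algorithm of the invention: the function name `match_pulses`, the tolerance window and the start times are all hypothetical.

```python
def match_pulses(start_times, reference_channel, tolerance):
    """Group useful pulses across channels using a reference channel.

    start_times: dict mapping channel number -> sorted list of pulse start times.
    reference_channel: channel of the sensor farthest from the redundant object.
    tolerance: largest lead (in seconds) a pulse may have over the reference pulse.
    Returns a list of groups, each a dict channel -> matched start time.
    Pulses that cannot be matched in every channel are discarded.
    """
    groups = []
    for ref_start in start_times[reference_channel]:
        group = {reference_channel: ref_start}
        for channel, starts in start_times.items():
            if channel == reference_channel:
                continue
            # Closer sensors capture the pulse earlier, so look backwards
            # from the reference start time within the tolerance window.
            candidates = [s for s in starts if 0 <= ref_start - s <= tolerance]
            if not candidates:
                break  # no counterpart in this channel: drop the whole group
            group[channel] = max(candidates)  # nearest preceding pulse
        else:
            groups.append(group)
    return groups


# Hypothetical start times; channels 1 and 3 contain one extra pulse (~0.9 s).
starts = {
    1: [0.10, 0.50, 0.90, 1.30],
    2: [0.12, 0.52, 1.32],
    3: [0.11, 0.51, 0.91, 1.31],
    4: [0.13, 0.53, 1.33],  # farthest sensor, used as reference
}
groups = match_pulses(starts, reference_channel=4, tolerance=0.05)
print(len(groups))  # 3
```

After matching, every channel contributes the same number of pulses, and each group can be passed on to feature calculation as one unit.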
Examples
1. Preparation work
A model of the aerospace equipment is built, taking a certain type of spaceborne electronic unit as the aerospace equipment under test. According to the internal structure of the equipment under test, three mutually orthogonal thin plates are used to divide the interior of the model into eight equal closed spaces, which are numbered as shown in fig. 9. The spatial layout of the sensors corresponds to fig. 2; it is omitted from fig. 9 because the two figures illustrate different content.
On this basis, the redundant-object samples are placed into the eight closed spaces of the aerospace equipment model in turn. In each closed space, multiple experiments are carried out and multiple groups of four-channel redundant-object signals are collected, so that a large number of four-channel signals representing a redundant object in different closed spaces are obtained and stored.
2. Feature dataset construction
The collected groups of four-channel redundant-object signals are first processed with the three-threshold pulse extraction algorithm to obtain a large number of useful pulses representing a redundant object located in different closed spaces. On this basis, two feature data sets are constructed with the traditional feature data set construction method and the channel-labeled feature data set construction method, and are named the traditional positioning data set and the labeled positioning data set, respectively. The collected signals are then processed with the pulse matching algorithm based on the short-plate principle to obtain a large number of useful pulse groups representing a redundant object located in different closed spaces. On this basis, two further feature data sets are constructed with the multi-channel fusion feature data set construction method and the multi-channel characteristic coding feature data set construction method, and are named the fused positioning data set and the coded positioning data set, respectively. A specific description of the four positioning data sets is given in table 2.
Table 2 detailed description of the positioning dataset
It can be seen from table 2 that the traditional positioning data set and the labeled positioning data set contain the same number of feature data, as do the fused positioning data set and the coded positioning data set. The number of feature data in the first two data sets is approximately four times that in the latter two, which is consistent with the design of the feature data set construction methods. Within each positioning data set, the number of feature data per label is approximately equal: about forty thousand per label in the traditional and labeled positioning data sets, and about ten thousand per label in the fused and coded positioning data sets. This effectively avoids interference caused by data imbalance.
In earlier studies, k-nearest neighbors, naive Bayes, support vector machines, a single decision tree, boosted decision trees, random forests, BP neural networks and recurrent neural networks were trained on the traditional positioning data set. The random forest performed best in that comparison and was adopted as the positioning model in final use. Since the aim of the invention is to improve the reliability of the existing redundant-object positioning method, the invention trains the same classifier, the random forest, for two reasons. First, the quality of a feature data set needs a concrete quantitative index, and in machine learning the classification accuracy obtained by a classifier is the evaluation index that most directly reflects it. Under the controlled-variable method, only one variable is changed while the others are held fixed, so that the observed result can be reliably attributed to that variable. Accordingly, when the feature data sets differ, the same classifier should be used and its classification accuracy taken as the evaluation index. Second, the earlier studies show that the random forest is the positioning model currently in practical use. Improving directly on this basis not only shows by contrast the superiority of the proposed feature data set construction method, but also allows the latest results to be deployed quickly to existing equipment, realizing an iterative upgrade of the positioning model.
Each of the four positioning data sets shown in table 2 is divided into a training set and a test set at a ratio of 7:3. The training set is used to train the random forest, and the test set is used to evaluate its classification performance. The invention trains a random forest with the default parameter configuration shown in table 3 on each of the four training sets; the classification accuracies obtained on the corresponding test sets are 51.95%, 58.58%, 83.97% and 84.72%, respectively. Fig. 10 shows the Precision, Recall and F1-score obtained on the corresponding test sets by the random forests trained on the four training sets, where (a) to (d) in fig. 10 correspond to the traditional positioning data set, the labeled positioning data set, the fused positioning data set and the coded positioning data set, respectively. For convenience, the following description refers simply to the Precision, Recall and F1-score obtained by the random forests trained on the four positioning data sets; this wording summarizes training on the training set and evaluation on the corresponding test set.
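The evaluation protocol described above (a 7:3 split followed by accuracy and per-label Precision, Recall and F1-score) can be sketched with the standard library alone. The classifier itself is left out, and the function names, the random seed and the example labels are hypothetical.

```python
import random


def split_7_3(samples, seed=0):
    """Shuffle the samples and split them into 70% training and 30% test data."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_label_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that occurs."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


train, test = split_7_3(range(10))
print(len(train), len(test))  # 7 3

# Hypothetical true and predicted closed-space labels on a tiny test set.
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]
print(accuracy(y_true, y_pred))
print(per_label_scores(y_true, y_pred))
```

Averaging the per-label values over the eight closed-space labels would give the average Precision, Recall and F1-score referred to in the comparison.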
TABLE 3 default parameter configuration for random forest
As can be seen from table 3 and fig. 10, the classification accuracy and the average Precision, Recall and F1-score obtained by the random forests trained on the fused and coded positioning data sets are clearly higher than those obtained by the random forests trained on the traditional and labeled positioning data sets. This indicates that feature data sets constructed on the basis of the correlation between the four-channel redundant-object signals are of better quality. Among them, the random forest trained on the coded positioning data set achieves the best classification accuracy and average Precision, Recall and F1-score. Meanwhile, (a) and (b) in fig. 10 show that the Precision, Recall and F1-score for labels 4 and 6 are all lower than those for the other labels. With reference to table 2, the number of feature data for labels 4 and 6 in the traditional and labeled positioning data sets is smaller than for the other labels. This directly reflects the effect of imbalanced data on classifier performance, even though the degree of imbalance is relatively small. In (c) and (d) of fig. 10, however, the Precision, Recall and F1-score for labels 4 and 6 differ only slightly from those for the other labels. This suggests that feature data sets constructed on the basis of the correlation between the four-channel signals contain more information, and that the classification performance of the random forests trained on them is more balanced and stable. In particular, in (d) of fig. 10, the Precision, Recall and F1-score for labels 4 and 6 approach or even exceed those for the other labels. 
This shows that the information contained in the coded positioning data set that reflects the differences between labels is more complete and clear, and indirectly verifies the feasibility and superiority of the multi-channel characteristic coding feature data set construction method. It is also worth noting that the classification accuracy and the average Precision, Recall and F1-score obtained by the random forest trained on the labeled positioning data set are significantly higher than those obtained on the traditional positioning data set. This also indirectly indicates the importance of the correlation between multi-channel redundant-object signals, which, although not fully exploited by that scheme, plays a very important role.
The time loss of the random forests trained on the four positioning data sets was also recorded, as shown in table 4. Random forest training was performed on the Siteng cloud platform provided by Yi Hui information technologies, inc. The invention uses three GPU graphics cards, model NVIDIA Tesla V100 SXM 2GB (CPU: 9 cores, memory: 60 GB), and a CPU, model Intel Xeon E5-2698 v4 (system disk: 60 GB, cache disk: 120 GB).
Table 4 time loss of random forest trained on four positioning data sets
It can be seen from table 4 that the random forest trained on the coded positioning data set has the smallest time loss; the time loss of the random forests trained on the other three positioning data sets is almost three times as large. This clearly shows the volume advantage of the coded positioning data set, where volume covers both the number of feature data and the dimension of the feature data. Compared with the traditional and labeled positioning data sets, the coded positioning data set has one quarter as many feature data, and its feature data has only four and three more dimensions, respectively. Compared with the fused positioning data set, it has the same number of feature data, but its feature dimension is about one third as large. The coded positioning data set thus contains more, and clearer, information reflecting the differences between labels, with fewer feature data and a smaller dimension. This again demonstrates the quality advantage of the coded positioning data set and the feasibility, superiority, stability, balance and efficiency of the multi-channel characteristic coding feature data set construction method. Overall, the random forest trained on the coded positioning data set achieves the highest classification accuracy and the smallest time loss, and hence the best overall performance. It is also worth noting that the fused positioning data set has one quarter as many feature data as the traditional and labeled positioning data sets while its feature data has about four times their dimension, which at first sight looks like a balance in volume. 
However, the time loss of the random forest trained on the fused positioning data set is significantly greater than that of the random forests trained on the traditional and labeled positioning data sets, which means that, for the classifier, the dimension of the feature data has a greater impact on the computation than the number of feature data.
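The volume comparison can be made concrete with a short calculation. The sketch below uses a relative unit `n` for the number of useful pulse groups (a hypothetical placeholder, not a figure from table 2) together with the feature dimensions 11, 12, 44 and 15 of the four data sets, and counts the total number of stored feature values as rows times dimensions.

```python
n = 1  # number of useful pulse groups, in relative units (placeholder)

# (number of feature data, dimension of one feature vector)
datasets = {
    "traditional": (4 * n, 11),  # one vector per pulse, 11 features
    "labeled": (4 * n, 12),      # 11 features plus one channel label
    "fused": (n, 44),            # one vector per pulse group, 4 x 11 features
    "coded": (n, 15),            # 11 averaged features plus 4 code bits
}

total_values = {name: rows * dims for name, (rows, dims) in datasets.items()}
for name, total in total_values.items():
    print(f"{name:12s} rows={datasets[name][0]} dims={datasets[name][1]} total={total}")

# Ratio of coded to fused feature dimension, roughly one third.
print(round(datasets["coded"][1] / datasets["fused"][1], 2))  # 0.34
```

The traditional and fused data sets hold the same total number of values (44 per pulse group), so the longer training time reported for the fused data set is attributed to its larger dimension rather than to its size.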
The comparison of the quality of the four positioning data sets is in fact complete at this point, and it demonstrates the feasibility, superiority and efficiency of the multi-channel characteristic coding feature data set construction method provided by the invention. On this basis, feature optimization and parameter optimization are carried out, and their experimental results further verify the superiority and stability of the proposed method.
3. Feature optimization:
The numerical distributions of the different feature data in the positioning data sets differ greatly. For example, the values of the spectrum centroid lie in the interval 0.9 to 1, whereas the values of the rise time are below 0.0001, a difference of 9000 times or more. The positioning data sets therefore need to be normalized so that the values of the different feature data are brought to a uniform scale and are treated equally during classifier training. The invention processes the four positioning data sets shown in table 2 with the min-max standardization method, the z-score standardization method and the row normalization method, respectively, trains a random forest with the default parameter configuration on each processed positioning data set, and calculates the classification accuracy obtained, as shown in table 5.
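The three normalization methods can be sketched as follows. The function names and example values are hypothetical, and the row normalization is assumed here to scale each feature vector to unit Euclidean length, which is one common definition; the source does not specify the norm used.

```python
import math
from statistics import mean, pstdev


def min_max(column):
    """Rescale one feature column linearly to the interval [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]


def z_score(column):
    """Shift one feature column to zero mean and scale to unit standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma if sigma else 0.0 for v in column]


def row_normalize(row):
    """Scale one feature vector to unit Euclidean length (assumed L2 norm)."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm if norm else 0.0 for v in row]


# Hypothetical columns on very different scales.
spectrum_centroid = [0.90, 0.95, 1.00]
rise_time = [0.00002, 0.00006, 0.00010]
print(min_max(spectrum_centroid))
print(min_max(rise_time))

# An integer code bit becomes a decimal after min-max scaling.
print(min_max([1, 2, 3, 4]))
print(row_normalize([3.0, 4.0]))
```

After min-max scaling, both example columns span the same interval, and the integer values 1 to 4 of a code bit are mapped to decimals between 0 and 1.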
TABLE 5 classification accuracy of random forest on positioning data set processed by normalization method
It can be seen from table 5 that, whichever normalization method is used, the random forest trained on the processed coded positioning data set achieves the highest classification accuracy. Overall, the classification accuracy obtained by the random forests trained on the processed fused and coded positioning data sets is significantly higher than that obtained on the other two processed positioning data sets. This again illustrates the importance of considering the correlation between the multi-channel redundant-object signals and highlights the advantages of the multi-channel characteristic coding feature data set construction method proposed by the invention. Comparing these results with the classification accuracies of 51.95%, 58.58%, 83.97% and 84.72% obtained before normalization, the invention decides to process the traditional positioning data set, the labeled positioning data set and the fused positioning data set with the min-max standardization method, and to leave the coded positioning data set unprocessed. The likely reason is that the four feature values corresponding to the four-bit code in the coded positioning data set can only take the integer values 1, 2, 3 and 4, and each of the three normalization methods is likely to turn them into decimals within some interval. For the traditional positioning data set, the classification accuracy obtained before normalization is 51.9462%, which rounds to 51.95%. 
The classification accuracy obtained after min-max standardization is 51.9543%, which also rounds to 51.95%. The latter is slightly higher than the former, and the invention therefore decides to use the min-max standardization method to process the traditional positioning data set.
In the invention, a feature selection method based on multi-index combined ranking is used to process the traditional positioning data set, the labeled positioning data set and the coded positioning data set, and a feature selection method based on a multi-channel weighted threshold is used to process the fused positioning data set. Fig. 11 shows the classification accuracy obtained by random forests trained on positioning data sets that retain different numbers of feature data or feature data groups. It should be noted that in the traditional, labeled and coded positioning data sets each feature is relatively independent, whereas in the fused positioning data set every four adjacent features correspond to the same multi-domain feature and are highly correlated with one another.
As can be seen from fig. 11, the accuracy obtained by the trained random forest decreases gradually as the number of feature data or feature data groups retained in the positioning data set decreases. This shows that none of the eleven multi-domain features shown in table 1 is insignificant and that none should be discarded: all of them quantify well the differences in signal characteristics between the signals generated by a redundant object at different positions. Specifically, as the number of retained feature data or feature data groups decreases, the drop in positioning accuracy of the random forests trained on the fused and coded positioning data sets is significantly smaller than that of the random forests trained on the traditional and labeled positioning data sets, and their decline is steadier. Moreover, however many feature data or feature data groups are retained, the positioning accuracy obtained on the labeled, fused and coded positioning data sets is higher than that obtained on the traditional positioning data set. These results again demonstrate the importance of considering the correlation between the four-channel redundant-object signals. In the initial phase, for the same number of retained feature data or feature data groups, the positioning accuracy obtained on the fused positioning data set appears to be higher than that obtained on the coded positioning data set. This impression is misleading: as described above, feature selection in the fused positioning data set is performed in units of feature data groups, so the 11, 10 and so on feature data groups shown in fig. 11 actually correspond to 44, 40 and so on feature data. 
The slightly higher classification accuracy obtained on the fused positioning data set is therefore achieved with a feature dimension four times that of the coded positioning data set. Furthermore, when the number of retained feature data groups falls below 7, the classification accuracy obtained on the fused positioning data set becomes lower than that obtained on the coded positioning data set. Overall, the random forest trained on the coded positioning data set shows the smallest loss of positioning accuracy and the steadiest trend. All of this demonstrates the high quality of the coded positioning data set and the superiority of the multi-channel characteristic coding feature data set construction method proposed by the invention.
In summary, the traditional positioning data set, the labeled positioning data set and the fused positioning data set are processed only with the min-max standardization method, without feature selection, and the coded positioning data set is not processed at all. This yields a new traditional positioning data set, a new labeled positioning data set and a new fused positioning data set, together with the original coded positioning data set. The classification accuracies obtained by the random forests trained on them are 51.95%, 58.65%, 84.12% and 84.72%, respectively.
4. Parameter optimization:
In machine learning, a classifier trained with its default parameter configuration can often achieve good classification performance on a generic feature data set. For a feature data set with a special structure or from a special field, the parameters of the classifier need to be optimized to further improve its classification performance. The five parameters of the random forest shown in table 3 are optimized with the grid search method to obtain their respective optimal values. The optimization ranges and step sizes of the five parameters are set according to the attributes of the four positioning data sets, as shown in table 6. For criterion, each step selects one of mse, mae, gini and entropy. For max_features, the settings differ between the random forests trained on the different positioning data sets: the optimization range is 1 to 11 with a step size of 1 for the traditional positioning data set, 1 to 12 with a step size of 1 for the labeled positioning data set, 1 to 44 with a step size of 4 for the fused positioning data set, and 1 to 15 with a step size of 1 for the coded positioning data set.
TABLE 6 parameter optimization configuration of random forest
The grid search method can bring the classification performance of the classifier to the global optimum, but it suffers from a high time loss. To accelerate parameter optimization, the parameter most important to the random forest is determined first, and the five parameters shown in table 6 are then optimized by searching globally first and locally afterwards. According to the attributes of the positioning data sets, n_estimators is optimized first while the other parameters keep their default configuration. Fig. 12 shows the classification accuracy obtained by the random forests trained on the four positioning data sets for different values of n_estimators, where (a) to (d) in fig. 12 correspond to the traditional positioning data set, the labeled positioning data set, the fused positioning data set and the coded positioning data set, respectively.
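The staged search described above can be sketched generically. In this sketch the function `staged_search`, the scoring function `toy_evaluate`, the default values and the n_estimators grid are hypothetical stand-ins; in practice `evaluate` would train a random forest with the given configuration and return its test accuracy (for example with scikit-learn's `RandomForestClassifier`, which exposes parameters named `n_estimators`, `criterion` and `max_features`). Only the max_features ranges are taken from the text, and the criterion grid is restricted to gini and entropy as a simplifying assumption.

```python
from itertools import product

# max_features search ranges per positioning data set, as stated in the text.
MAX_FEATURES_GRIDS = {
    "traditional": range(1, 12),  # 1 to 11, step 1
    "labeled": range(1, 13),      # 1 to 12, step 1
    "fused": range(1, 45, 4),     # 1 to 44, step 4
    "coded": range(1, 16),        # 1 to 15, step 1
}


def staged_search(evaluate, defaults, first_param, grids):
    """Optimize first_param alone, then grid-search the remaining parameters."""
    best_config = dict(defaults)
    # Stage 1: vary only the first parameter, others stay at their defaults.
    best_config[first_param] = max(
        grids[first_param],
        key=lambda value: evaluate({**best_config, first_param: value}),
    )
    best_score = evaluate(best_config)
    # Stage 2: exhaustive grid over the remaining parameters.
    rest = [name for name in grids if name != first_param]
    fixed = dict(best_config)
    for values in product(*(grids[name] for name in rest)):
        candidate = {**fixed, **dict(zip(rest, values))}
        score = evaluate(candidate)
        if score > best_score:
            best_score, best_config = score, candidate
    return best_config, best_score


def toy_evaluate(config):
    """Hypothetical score surface standing in for real training and testing."""
    bonus = 0.01 if config["criterion"] == "entropy" else 0.0
    return (-abs(config["n_estimators"] - 90) / 100
            - abs(config["max_features"] - 4) / 10 + bonus)


grids = {
    "n_estimators": range(10, 201, 10),  # hypothetical range and step
    "criterion": ["gini", "entropy"],
    "max_features": MAX_FEATURES_GRIDS["coded"],
}
defaults = {"n_estimators": 100, "criterion": "gini", "max_features": 3}
best_config, best_score = staged_search(toy_evaluate, defaults, "n_estimators", grids)
print(best_config)
```

Fixing n_estimators first reduces the number of configurations evaluated from the full product of all grids to the size of the first grid plus the product of the remaining ones.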
As shown in fig. 12, as the value of n_detectors increases, the classification accuracy achieved by random forests trained on four positioning data sets increases gradually. When the value of n_evators is in the interval of 80 to 100, the classification accuracy obtained by the random forest trained on the four positioning data sets is highest, and the saturated state is reached. After this, the classification accuracy achieved by random forests is in a tendency to oscillate unchanged or to fade gradually. Specifically, when n_estimators take 100, 90 and 90 respectively, the classification accuracy obtained from random forests trained on the conventional positioning dataset, the labeling positioning dataset, the fusion positioning dataset and the encoding positioning dataset is highest. On this basis, the other four parameters of the four random forests are optimized to obtain the optimal values of the parameters, as shown in table 7.
Table 7 parameter optimization results for random forest
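The two-stage strategy described above (tune n_estimators alone, then search the remaining parameters with n_estimators fixed) can be sketched as follows. This is an illustrative sketch only: `two_stage_search`, `toy_score`, the default value and the candidate ranges are all hypothetical, and `toy_score` merely stands in for training a random forest and measuring its classification accuracy.

```python
from itertools import product


def two_stage_search(score, n_estimators_values, other_grid, defaults):
    """Tune n_estimators first, then grid-search the remaining parameters.

    score: callable taking a dict of parameters and returning an accuracy.
    other_grid: dict mapping each remaining parameter name to candidate values.
    defaults: dict of default values for the remaining parameters.
    """
    # Stage 1: vary only n_estimators, keep everything else at its default.
    best_n = max(
        n_estimators_values,
        key=lambda n: score({**defaults, "n_estimators": n}),
    )

    # Stage 2: fix n_estimators and search the remaining parameters exhaustively.
    names = list(other_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(other_grid[name] for name in names)):
        params = dict(zip(names, combo), n_estimators=best_n)
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Toy stand-in for "train a random forest and measure its accuracy".
def toy_score(params):
    return (
        0.90
        - 0.00001 * (params["n_estimators"] - 90) ** 2
        - 0.001 * abs(params["max_features"] - 6)
    )


best, acc = two_stage_search(
    toy_score,
    n_estimators_values=range(10, 201, 10),
    other_grid={"max_features": range(1, 16)},
    defaults={"max_features": 4},
)
print(best, round(acc, 4))
```

In this toy setting the two stages evaluate 20 + 15 = 35 configurations instead of the 20 × 15 = 300 of a full grid. The saving comes at a price: unlike a full grid search, the staged search is not guaranteed to find the best combination when the parameters interact.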
Table 8 gives the classification accuracy achieved by the four random forests before and after parameter optimization. The table shows that the classification accuracy of all four random forests improves considerably after parameter optimization. The highest classification accuracy, 94.03%, is obtained after parameter optimization by the random forest trained on the coded positioning data set. This is also the highest classification accuracy achieved so far by a positioning model trained in the field of redundancy positioning, which again demonstrates the superiority of the multi-channel feature-coded feature data set construction method. Compared with the 84.08% reported by Sun Zhigang et al. in "Technology of Locating Loose Particles Inside Sealed Electronic Equipment Based on Parameter-Optimized Random Forest", this is an improvement of 9.95 percentage points. The classification performance of existing positioning models is thus markedly improved, and the reliability of the positioning model in application is improved to a certain extent. Overall, the random forests trained on the fusion positioning data set and the coded positioning data set still have higher classification accuracy and a clear advantage after parameter optimization. This fully illustrates the importance of constructing a feature data set based on the correlations between the four-channel redundancy signals.
Table 8 classification accuracy for random forest acquisition before and after parameter optimization
Likewise, the time cost of parameter optimization for the random forests trained on the four positioning data sets was recorded, as shown in table 9. It should be noted that the parameter optimization was also performed on the Siteng cloud platform, with the same hardware settings as in section 2.
Table 9 time loss of four random forests during parameter optimization
It can be seen from the table that the random forest trained on the coded positioning data set again has the smallest time cost during parameter optimization. The random forests trained on the other three positioning data sets take almost twice as long to optimize as the one trained on the coded positioning data set. This again reveals the volume advantage of the coded positioning data set. Furthermore, the time cost of parameter optimization for the random forest trained on the fusion positioning data set is still greater than for those trained on the conventional and labeled positioning data sets, which shows that, for a classifier, the dimensionality of the feature data set has a greater impact on its computation than the number of pieces of feature data.
The random forest trained on the coded positioning data set achieves the highest classification accuracy in the original feature data set construction stage, the feature optimization stage and the parameter optimization stage, and its classification performance is also the most stable and balanced. More importantly, because of its volume advantage, its time cost is also the smallest. All of this is evidence of the high quality of the coded positioning data set, and of the feasibility, superiority, stability, balance and efficiency of the multi-channel feature-coded feature data set construction method proposed by the invention.
5. Generalization performance verification
The generalization performance of the trained random forest is then tested. Specifically, multiple experiments are conducted again with the fabricated aerospace equipment model, yielding a large number of four-channel redundancy signals representing redundancy in different closed spaces. As before, the groups of four-channel redundancy signals are each processed with the three-threshold pulse extraction algorithm to convert them into useful pulses. Then the groups of four-channel redundancy signals are each processed with the pulse matching algorithm based on the short-plate principle to obtain multiple useful pulse groups. On this basis, the feature data set construction method based on multi-channel feature coding is used to construct a coded positioning data set for verification, named the coded verification data set. A specific description of the coded verification data set is given in table 10.
Table 10 specific description of encoded validation data set
Ten-fold cross-validation is performed on the coded verification data set using the parameter-optimized random forest obtained in section 4, giving its classification accuracy on the training set and the test set of each fold, as shown in fig. 13. It can be seen that, throughout the ten-fold cross-validation, the classification accuracies obtained on the training set and the test set are close, with no obvious gap, which demonstrates the stability of the random forest. By calculation, the average classification accuracy of the random forest on the training set and the test set is 93.37% and 91.70%, respectively. These values are slightly lower than the 94.03% obtained in section 4, and the drop is within the allowable range. This illustrates the strong generalization performance of the random forest, and verifies the stability and strong generalization performance of the multi-channel feature-coded feature data set construction method proposed by the invention. Notably, the average classification accuracy on the test set is slightly lower than that on the training set because, during cross-validation, nine parts of the verification data set are used to construct the training set and only one part is used to construct the test set, so the training set contains nine times as much feature data as the test set.
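The nine-to-one split used in ten-fold cross-validation can be illustrated with a short sketch. It assumes contiguous, unshuffled folds and a hypothetical data set of 1000 pieces of feature data; the actual size of the coded verification data set and the splitting procedure used in the experiments are not reproduced here.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Hypothetical data set size, chosen only to make the ratio easy to read.
folds = list(k_fold_indices(1000, k=10))
for train, test in folds:
    print(len(train), len(test))
```

Each of the ten rounds trains on nine parts and tests on the remaining part, and every piece of feature data appears in a test set exactly once.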
6. Application process analysis
Through the previous seven steps of redundancy positioning, the parameter-optimized random forest, i.e. the optimal positioning model, has been obtained and can be used for positioning tests of aerospace equipment under test. The optimal positioning model trained by the invention is applied below to complete a redundancy positioning test, thereby illustrating the specific application process.
In the eighth step of redundancy positioning, the aerospace equipment under test is fixed to the hardware platform of the PIND method, and the vibration table is driven to apply mechanical excitation to it, so that redundancy at an unknown position inside the equipment is activated into random motion and generates a redundancy signal. The four sensors capture the redundancy signal, which is saved as a group of four-channel redundancy signals. The current four-channel redundancy signals are each processed with the three-threshold pulse extraction algorithm to convert them into useful pulses. Then the current four-channel redundancy signals are processed with the pulse matching algorithm based on the short-plate principle to obtain multiple useful pulse groups. On this basis, the feature data set construction method based on multi-channel feature coding is used to construct multiple pieces of feature data to be tested. It should be noted that, because the position of the redundancy is unknown at this point, labels cannot be added to the feature data; a feature data set therefore does not need to be constructed, and the pieces of feature data to be tested are directly treated as a set, called the set to be tested. In fact, these feature data do not need labels, since their labels are predicted by the positioning model.
Specifically, for a satellite-borne electronic unit of a certain model, a set to be tested containing 77 pieces of feature data is obtained through the above steps. The optimal positioning model is applied to predict their labels. The results show that 71 pieces of feature data are predicted as "2", 3 as "1", 2 as "4", 1 as "3" and 1 as "6". Through majority voting, the common label of the 77 pieces of feature data is "2". The positioning test result therefore indicates that the redundancy is located in the closed space numbered 2 inside the satellite-borne electronic unit. The inspectors then clean out the redundancy in the corresponding space inside the unit according to the closed-space division rule of the aerospace equipment model.
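The majority voting step can be sketched in a few lines. The function name `majority_label` and the ten predictions below are hypothetical and are not the 77 predictions of the test described above; ties are resolved in favor of the label encountered first, which is a choice of this sketch rather than something the text specifies.

```python
from collections import Counter


def majority_label(predicted_labels):
    """Return the most frequent label and its share of all predictions."""
    counts = Counter(predicted_labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predicted_labels)


# Hypothetical predictions for ten pieces of feature data (illustration only).
predictions = ["2", "2", "1", "2", "2", "4", "2", "2", "3", "2"]
label, share = majority_label(predictions)
print(label, share)
```

The returned share is the fraction of feature data that agree with the common label, which gives a rough indication of how decisive the vote was.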
The application process is now analyzed. If the redundancy is actually located in the closed space numbered 2 inside the satellite-borne electronic unit, then according to the implementation steps of section 1 the labels of all the feature data constructed from the current redundancy signals should be "2"; however, only 71 of the 77 pieces of feature data are predicted as "2". On the one hand, this illustrates the limitation of the classification performance of the optimal positioning model: as a classifier, its accuracy cannot reach 100%. On the other hand, it illustrates the relative superiority and stability of the optimal positioning model obtained by the invention, which still achieves a classification accuracy of 71 ÷ 77 = 92.21%, not far from the highest classification accuracy of 94.03%. In fact, a positioning model can give a relatively accurate redundancy positioning test result as long as it stably achieves a classification accuracy above 50%. Specifically, for multiple pieces of feature data to be tested, if the positioning model stably achieves a classification accuracy above 50%, the labels of more than half of the feature data are predicted correctly, and the correct label is unique. In the subsequent majority vote, the feature data holding more than half of the votes then determine the correct common label. Therefore, if a classification accuracy of 50% is regarded as the threshold, the aim of continuously improving the classification accuracy of the positioning model is to enlarge the margin above this threshold, so that even when the positioning model encounters feature data of extremely poor quality it can still stably predict more than half of the labels correctly and give the correct common label.
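The 50% threshold argument can be stated compactly. As a sketch, let n be the number of pieces of feature data in the set to be tested and k the number whose labels are predicted correctly (both symbols are introduced here for illustration and do not appear in the original text):

```latex
\frac{k}{n} > \frac{1}{2}
\quad\Longrightarrow\quad
n - k < k
```

All incorrect labels together receive only n - k votes, so no single incorrect label can collect as many votes as the correct one, and majority voting returns the correct common label. With the figures quoted above, k = 71 and n = 77 give 71/77 ≈ 0.9221, which leaves a wide margin above the 0.5 threshold.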
Claims (9)
1. A method for constructing a multi-channel feature-coded redundancy positioning feature data set, characterized by comprising the following steps:
S1, for an aerospace equipment model used for particle impact noise detection, acquiring redundancy signals with N sensors arranged on the aerospace equipment model, and regarding the N sensors, serving as N channels, as N encoders, to obtain the redundancy signals corresponding to the N channels respectively;
S2, extracting useful pulses from the redundancy signals corresponding to the N encoders respectively, and completing pulse matching based on the number and the start times of the useful pulses of the N encoders; during useful pulse extraction, processing the N channel redundancy signals acquired in a single acquisition separately to obtain the peak time of the first useful pulse in each of the N channel redundancy signals; sorting the N peak times in ascending order, thereby identifying the order, from near to far, in which the N sensors collected the corresponding redundancy signals, and taking the rank in the sorted order as a feature value, namely the feature value representing the near-to-far order, relative to the redundancy, in which the N channel redundancy signals were collected, so as to obtain N sequence feature values corresponding to the N channels;
for the pulse-matched useful pulses, taking each group of N useful pulses that appear in sequence in the N matched redundancy signals as a unit, and calculating the values of the multi-domain features on each group of N useful pulses in turn, wherein the multi-domain features comprise time domain features and frequency domain features; the multi-domain features corresponding to each signal segment of the redundancy signal of each channel comprise M multi-domain features, and the value of each of the M multi-domain features is recorded as one signal feature value;
constructing a multidimensional feature vector based on the M signal feature values and the N sequence feature values;
adding labels with the corresponding numbers to the feature vectors according to the numbered closed space in which the current N channel redundancy signals were generated;
S3, processing the N channel redundancy signals generated by multiple experiments according to steps S1 and S2 to obtain multiple pieces of feature data with different labels, and constructing the feature data set.
2. The method for constructing a multi-channel feature-coded redundancy positioning feature data set according to claim 1, wherein the interior of the aerospace equipment model used for particle impact noise detection in S1 is divided into a plurality of closed spaces, each closed space is numbered, and, when the data set is constructed, redundancy samples are placed in different closed spaces in turn and particle impact noise detection is performed to obtain the redundancy signals.
3. The method for constructing a multi-channel feature-coded redundancy positioning feature data set according to claim 2, wherein, in order to acquire the signals, when the N sensors are arranged on the aerospace equipment model, the centroid of the aerospace equipment model is taken as the reference, the N sensors are arranged as far from the centroid as possible, and the distance between each sensor and the centroid is kept within the sensitive detection range of the sensor.
4. The method of claim 2, wherein the time domain features include time delay, pulse rise time, pulse symmetry, pulse amplitude, pulse energy, root mean square voltage, and zero crossing rate.
5. The method for constructing a multi-channel feature encoded redundancy positioning feature data set according to claim 4, wherein the frequency domain features include spectrum centroid, spectrum mean square error, root mean square probability, and frequency standard deviation.
6. The method for constructing a multi-channel feature-coded redundancy positioning feature data set according to any one of claims 1 to 5, wherein the useful pulses in step S2 are extracted using a three-threshold pulse extraction algorithm.
7. The method of claim 6, wherein completing pulse matching based on the number and the start times of the useful pulses of the N encoders comprises the following steps:
S201: in the process of obtaining the peak time of the first useful pulse in each of the N channel redundancy signals and sorting the N peak times in ascending order, denoting the peak times of the first useful pulses in the N redundancy signals, ordered from nearest to farthest from the redundancy, as $T_1, T_2, T_3, \ldots, T_N$;
S202: calculating $T_N - T_1, T_N - T_2, \ldots, T_N - T_{N-1}$, which represent, respectively, the time differences between the arrival of the redundancy signal at each nearer sensor and its arrival at the farthest sensor;
S203: supplementing zero pulses of duration $T_N - T_1, T_N - T_2, \ldots, T_N - T_{N-1}$, respectively, before the start times of the redundancy signals acquired by the sensors nearest, second nearest, third nearest, ..., to the redundancy, so that they are aligned with the redundancy signal acquired by the farthest sensor;
S204: aligning the start times of the N new redundancy signals, taking the length of the redundancy signal acquired by the sensor farthest from the redundancy as the reference, retaining from each of the other N-1 redundancy signals the signal component of the same length counted from the start time, and discarding the excess signal components.
8. The method of claim 7, wherein constructing the multidimensional feature vector based on the M signal feature values and the N sequence feature values comprises the following steps:
for each signal feature m among the M signal features, calculating the mean value of feature m over the matched group of N useful pulses and taking this mean value as the new signal feature value of signal feature m, the M new signal feature values forming a basic feature vector;
adding an N-bit code after the basic feature vector to construct an (M+N)-dimensional feature vector, namely the multidimensional feature vector;
the N-bit code consists of the N sequence feature values, each of which is a feature value obtained in the pulse matching process that represents the near-to-far order, relative to the redundancy, in which the N channel redundancy signals were collected.
9. The method of claim 7, wherein constructing the multidimensional feature vector based on the M signal feature values and the N sequence feature values comprises the following steps:
for the pulse-matched useful pulses, the M signal features comprising M1 time domain features and M2 frequency domain features, and for the M signal features corresponding to each of the N redundancy signals, placing the N×M1 time domain feature values together in sequence and placing the N×M2 frequency domain feature values together in sequence; in this way, the N×M feature values calculated from each useful pulse group in the N redundancy signals are arranged in a row to construct a feature vector as the basic feature vector;
adding an N-bit code after the basic feature vector to construct an (N×M+N)-dimensional feature vector, namely the multidimensional feature vector;
the N-bit code consists of the N sequence feature values, each of which is a feature value obtained in the pulse matching process that represents the near-to-far order, relative to the redundancy, in which the N channel redundancy signals were collected.
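The sequence encoding of claim 1, the zero-pulse alignment of claim 7 and the (M+N)-dimensional feature vector of claim 8 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: all function names are hypothetical, signals are plain lists sampled at a common rate, peak positions are given in samples, sequence codes run from 1 (nearest) to N (farthest), ties in peak time are broken by channel index, and the feature values in the example are invented.

```python
def order_codes(peak_times):
    """Rank each channel by the peak time of its first useful pulse (1 = nearest)."""
    ranked = sorted(range(len(peak_times)), key=lambda ch: peak_times[ch])
    codes = [0] * len(peak_times)
    for rank, channel in enumerate(ranked, start=1):
        codes[channel] = rank
    return codes


def align_signals(signals, peak_samples):
    """Prepend zeros so the first pulses line up, then trim to the reference length.

    The reference length is that of the signal whose first pulse peaks latest,
    i.e. the signal of the sensor assumed to be farthest from the redundancy.
    """
    latest = max(peak_samples)
    farthest = peak_samples.index(latest)
    reference_length = len(signals[farthest])
    aligned = []
    for signal, peak in zip(signals, peak_samples):
        padded = [0.0] * (latest - peak) + list(signal)
        aligned.append(padded[:reference_length])
    return aligned


def coded_feature_vector(per_channel_features, codes):
    """Average each of the M features over the N channels, then append the N codes."""
    n_channels = len(per_channel_features)
    n_features = len(per_channel_features[0])
    means = [
        sum(channel[m] for channel in per_channel_features) / n_channels
        for m in range(n_features)
    ]
    return means + list(codes)


# Four channels; the first pulse peaks at samples 2, 4, 3 and 1 respectively.
peak_samples = [2, 4, 3, 1]
signals = [
    [0.0, 0.0, 5.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 6.0, 1.0],
    [0.0, 0.0, 0.0, 4.0, 1.0, 0.0],
    [0.0, 7.0, 1.0, 0.0, 0.0, 0.0],
]
codes = order_codes(peak_samples)
aligned = align_signals(signals, peak_samples)

# Invented values of M = 2 features for one matched group of N = 4 pulses.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
vector = coded_feature_vector(features, codes)
print(codes, vector)
```

After alignment, the first pulse of every channel peaks at the same sample, and the resulting vector has M + N = 6 entries: two averaged feature values followed by the four sequence codes.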
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311186227.5A CN117216565B (en) | 2023-09-14 | 2023-09-14 | Multi-channel characteristic coding redundant positioning characteristic data set construction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117216565A (en) | 2023-12-12 |
CN117216565B (en) | 2024-05-24 |
Family
ID=89038351
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311186227.5A Active CN117216565B (en) | 2023-09-14 | 2023-09-14 | Multi-channel characteristic coding redundant positioning characteristic data set construction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117216565B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102590359A (en) * | 2012-02-08 | 2012-07-18 | 航天科工防御技术研究试验中心 | Method and system for identifying redundancy signals |
CN102788709A (en) * | 2012-08-09 | 2012-11-21 | 哈尔滨工业大学 | Automatic remainder detection device and method for spaceborne electronic equipment |
CN102830421A (en) * | 2012-08-09 | 2012-12-19 | 哈尔滨工业大学 | Method for identifying redundancies and assembly of satellite-borne electronic device |
CN115343676A (en) * | 2022-08-19 | 2022-11-15 | 黑龙江大学 | Feature optimization method for technology for positioning excess inside sealed electronic equipment |
CN115345071A (en) * | 2022-08-11 | 2022-11-15 | 黑龙江大学 | Method and device for positioning redundancy inside space equipment based on instance migration |
CN115685072A (en) * | 2022-09-28 | 2023-02-03 | 哈尔滨工业大学 | Method for positioning unstable acoustic emission source in sealed cavity based on multi-classification model |
CN116257777A (en) * | 2023-02-13 | 2023-06-13 | 哈尔滨工业大学 | Classification model fusion type sealed relay redundant detection and material identification method |
Non-Patent Citations (5)
Title |
---|
GUOFU ZHAI; PENGFEI LI; GUOTAO WANG; ZHIGANG SUN; XIAO HAN; QIANG WANG: "Periodic Signal Recognition Technology Based on Framing Window Adaptive Scaling Algorithm and Trajectory Tracking Algorithm A Case Study of Aerospace Loose Particle Detection Signal", IEEE SENSORS JOURNAL, 5 June 2023 (2023-06-05) * |
孙志刚等: "参数优化支持向量机的密封电子设备多余物定位方法研究", 电子测量与仪器学报, 30 July 2021 (2021-07-30) * |
孙永玲, 王兰涛, 王育红, 许树芳: "战术导弹活动多余物检测技术", 航天工艺, no. 04, 28 August 2000 (2000-08-28) * |
梁晓雯;蒋爱平;王国涛;李响;薛永越;: "参数优化决策树算法的密封继电器多余物信号识别技术", 电子测量与仪器学报, no. 01, 15 January 2020 (2020-01-15) * |
翟国富;陈金豹;邢通;王世成;王淑娟;: "基于聚类分析的航天继电器多余物检测方法研究", 振动与冲击, no. 02, 28 January 2013 (2013-01-28) * |
Also Published As
Publication number | Publication date |
---|---|
CN117216565B (en) | 2024-05-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||