CN115525786B

CN115525786B - Method for constructing an ionospheric ionogram classification sample library

Info

Publication number: CN115525786B
Application number: CN202211238852.5A
Authority: CN
Inventors: 高鹏东; 裘初; 齐全; 王铮
Original assignee: National Space Science Center of CAS; Communication University of China
Current assignee: National Space Science Center of CAS; Communication University of China
Priority date: 2022-10-11
Filing date: 2022-10-11
Publication date: 2024-02-20
Anticipated expiration: 2042-10-11
Also published as: CN115525786A

Abstract

The invention discloses a method for constructing an ionospheric ionogram (frequency-height graph) classification sample library, comprising the following steps: S1, gathering into one total folder all ionogram images corresponding to a manually interpreted TXT identification file; S2, parsing the TXT identification file to obtain the category of every ionogram in the total folder; S3, moving each ionogram whose category has been determined into the folder for its class; S4, constructing a sample library in which the classes are relatively balanced, by means of an up-sampling and down-sampling mechanism. The method simulates the influence of natural phenomena on ionograms by introducing different types of random noise, thereby expanding the sample data for each type of spread F and finally establishing a relatively balanced ionogram sample library that serves subsequent supervised learning.

Description

Method for constructing an ionospheric ionogram classification sample library

Technical Field

The invention relates to ionogram processing technology, and in particular to a method for constructing an ionospheric ionogram classification sample library.

Background

An ionosonde (ionospheric altimeter) is a remote sensing device that probes the ionospheric space environment by detecting radar echoes. The image data it generates is called an ionogram (frequency-height graph), whose clear traces reflect the variation of electron density with altitude. The ionospheric F layer (about 130 km to 1000 km, the region in which the vast majority of spacecraft fly) is not a stable layer; it contains fine plasma structures (density non-uniformities, or irregularities) that scatter the incident wave diffusely, so that the echo appears as a diffuse patch rather than a sharp trace. Spread F is this natural phenomenon of ionospheric plasma irregularities: it affects radio wave propagation and produces characteristic diffuse patterns in the ionogram, and its different forms correspond to different physical laws.

The spread F classification with the widest international acceptance at present is that of the URSI Handbook of Ionogram Interpretation and Reduction, revised in 1978 by the International Union of Radio Science. It divides the graphic features in the ionogram into a frequency type (Frequency Spread F, FSF for short), a range type (Range Spread F, RSF for short), a mixed type (Mixed Spread F, MSF for short) and a branch type (BSF for short), and, in view of the large differences in spread F features between stations around the world, recommends that each station refine the classification to suit its own observations. The four categories correspond respectively to: (1) the frequency type, a disturbance structure near the F-layer peak height, in which the F-layer trace is clear in the low-frequency section and spread in the high-frequency section; (2) the range type, a non-uniform plasma density structure near the bottom of the F layer, in which the F-layer trace is clear in the high-frequency section and spread in the low-frequency section; (3) the mixed type, which has the characteristics of both the frequency type and the range type and whose mechanism is relatively complex; (4) the branch type, in which, in addition to spreading near the F-layer peak frequency, a spread F branch distinct from the F-layer trace appears, corresponding to horizontally distributed plasma structures of different densities, and which may be associated with ion precipitation at high latitudes.

Because ionograms are both scientific and complex, internationally they have so far been interpreted only by eye, on the basis of experience. A major drawback of this approach in scientific research is that it involves subjective judgment: different researchers apply different standards when judging the spread F category, and even the same scientist, over a long working period, applies shifting standards to spread F features that themselves vary with year, season, local time and so on.

With the development of China's aerospace technology, and in particular the continued construction of Phase II of the Meridian Project led by the National Space Science Center of the Chinese Academy of Sciences, more than ten additional digital ionosondes, distributed throughout China and operating 24 hours a day, are to be added in 2023, with a time resolution of about 5 to 15 minutes; engineers are even developing a high-precision detection network with 1-minute resolution. Under these circumstances, the traditional reliance on manual interpretation of ionograms hinders real-time monitoring of the space environment, and it is impossible to identify the ionograms of all stations manually around the clock under a unified standard. From an application point of view, it is therefore very necessary to develop an artificial-intelligence method for identifying the spread F phenomenon in ionograms. Developing such an intelligent interpretation model with supervised learning requires a sample library whose data are reasonably distributed across categories.

Disclosure of Invention

The invention aims to provide a method for constructing an ionospheric ionogram classification sample library. The method simulates the influence of natural phenomena on ionograms by introducing different types of random noise, thereby expanding the sample data for each type of spread F and finally establishing a relatively balanced ionogram sample library that serves subsequent supervised learning.

To achieve the above purpose, the invention provides a method for constructing an ionospheric ionogram classification sample library, comprising the following steps:

S1, gathering into one total folder all ionogram images corresponding to a manually interpreted TXT identification file;

S2, parsing the TXT identification file to obtain the category of every ionogram in the total folder;

S3, moving each ionogram whose category has been determined into the folder for its class;

S4, constructing a sample library in which the classes are relatively balanced, by means of an up-sampling and down-sampling mechanism.

Preferably, step S2 specifically comprises the following steps:

S21, traversing all ionogram images in the total folder and obtaining the drawing time of each image;

S22, reading the manually interpreted TXT identification file row by row and obtaining the ionogram category information marked in it;

S23, unifying the format of the ionogram drawing times and of the ionogram times recorded in the TXT identification file, and determining the category of each ionogram according to the records in the TXT file.

Preferably, in step S21, the drawing time of each ionogram image is expressed in the format year, month, day, hour, minute, second, and its initial class is set to 0.
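
As a minimal sketch of step S21, the drawing time can be read from each image file name. The sketch assumes, hypothetically, that every file is named with the 14-digit stamp used for the figures in this document (for example 20050104154500) followed by an extension; the function names are illustrative and are not taken from the patent.

```python
from datetime import datetime
from pathlib import Path

def drawing_time(path):
    """Parse a yyyymmddHHMMSS stamp from the file stem (assumed naming scheme)."""
    return datetime.strptime(Path(path).stem, "%Y%m%d%H%M%S")

def initial_labels(paths):
    """Map every drawing time to the initial class 0, as in step S21."""
    return {drawing_time(p): 0 for p in paths}
```

Keeping the labels keyed by drawing time makes the later comparison with the times in the TXT file a plain dictionary lookup.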

Preferably, in the TXT identification file of step S22, the first column records the date on which the spread F phenomenon occurs as a day of year, the second column is the start time of the spread F phenomenon in 24-hour notation, the third column is its end time in 24-hour notation, and the fourth column is the duration of the spread F, in which the first two digits are hours and the last two digits are minutes;

and when the time resolutions involved in step S22 differ, the category is determined as follows:

S221, counting the ionogram images in the current folder; if the number of images is less than or equal to a set value, the calculation uses the first time resolution, otherwise it uses the second time resolution;

S222, converting the first-column data of the TXT identification file from day-of-year notation to year, month, day, hour, minute, second notation;

S223, obtaining, from the second-column and third-column data of the TXT identification file, the start and end times of a given spread F event in year, month, day, hour, minute, second notation;

S224, obtaining the duration of the spread F event from the fourth column of the TXT identification file, and then estimating, from the time resolution determined above, the number of ionograms of that event that may exist in the total folder;

S225, marking the categories of the ionograms according to the TXT identification file;

S226, repeating steps S221 to S225 until all manually interpreted records have been traversed.
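
Steps S222 and S223 amount to a date conversion, sketched below under stated assumptions: one record is taken to be a whitespace-separated row whose first four columns are day of year, start time, end time and duration, followed by a category code; the year is supplied by the caller; and the event is assumed to start and end on the same day. The helper names and the sample row in the usage note are hypothetical.

```python
from datetime import datetime, timedelta

def doy_to_datetime(year, day_of_year, hhmm):
    """Convert year + day of year + HHMM (24-hour) to a datetime."""
    midnight = datetime(year, 1, 1) + timedelta(days=day_of_year - 1)
    return midnight + timedelta(hours=hhmm // 100, minutes=hhmm % 100)

def parse_record(year, line):
    """Split one record into (start, end, duration_hhmm, category)."""
    doy, start, end, duration, category = line.split()[:5]
    return (
        doy_to_datetime(year, int(doy), int(start)),
        doy_to_datetime(year, int(doy), int(end)),
        int(duration),   # e.g. "0215" becomes 215: 2 hours 15 minutes
        category,        # kept as text; the coding scheme is not assumed
    )
```

For instance, `parse_record(2011, "049 1545 1800 0215 1")` yields a start time of 18 February 2011, 15:45, since day 49 of 2011 is 18 February.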

Preferably, the set value in step S221 is 35040 = 24 × 4 × 365, the number of ionograms in a 365-day year at one every 15 minutes;

the first time resolution is 15 minutes and the second time resolution is 5 minutes.

Preferably, in step S224:

the estimated number of ionograms of a spread F event at the first resolution is calculated as: numberofFlag = round(floor(duration(i)/100) × 60/15 + mod(duration(i), 100)/15);

the estimated number of ionograms of a spread F event at the second resolution is calculated as: numberofFlag = round(floor(duration(i)/100) × 60/5 + mod(duration(i), 100)/5);

where duration(i) is the duration of the spread F event in the i-th row of the TXT identification file.
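
The resolution choice of step S221 and the estimate of step S224 can be sketched together as follows. The sketch assumes that `round` in the formula means rounding halves upward, which is implemented here as `floor(x + 0.5)` because Python's built-in `round` rounds halves to the nearest even integer; the function names are illustrative.

```python
import math

SET_VALUE = 24 * 4 * 365  # 35040 images: one year sampled every 15 minutes

def time_resolution_minutes(image_count):
    """Step S221: at most SET_VALUE images implies 15-minute sampling, else 5."""
    return 15 if image_count <= SET_VALUE else 5

def estimated_image_count(duration_hhmm, resolution_minutes):
    """Step S224: numberofFlag for a duration written as HHMM."""
    hours = duration_hhmm // 100      # floor(duration / 100)
    minutes = duration_hhmm % 100     # mod(duration, 100)
    value = hours * 60 / resolution_minutes + minutes / resolution_minutes
    return math.floor(value + 0.5)    # round half up (assumed rounding rule)
```

A duration of 0215 (2 hours 15 minutes) gives 9 ionograms at 15-minute resolution and 27 at 5-minute resolution.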

Preferably, in step S225, when ionogram files are missing:

when traversing the TXT identification file row by row, if an ionogram exists at the start position of a row but the ionogram at the end position is missing, the marking range for the category is confined to the numberofFlag ionograms following the start position; within that range, an ionogram whose file-name time is earlier than the time of the end position is marked with the category of that row, and once a file-name time is later than the end time no mark is made and the loop exits;

if the ionogram at the start position of the row is missing but an ionogram exists at the end position, the marking range for the category is confined to the numberofFlag ionograms preceding the end position; within that range, an ionogram whose file-name time is later than the time of the start position is marked with the category of that row, and once a file-name time is earlier than the start time no mark is made and the loop exits;

and when neither the start position nor the end position has a corresponding ionogram, the loop ends and processing moves directly to the next entry of the TXT identification file.
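
One possible reading of this fault-tolerant marking is sketched below. It assumes the available drawing times are held in a sorted list, that the marking window is the start (or end) image plus the numberofFlag images after (or before) it, and that a record whose start and end images both exist is handled by the same forward scan; none of these details is fixed by the text, and the function name is hypothetical.

```python
from bisect import bisect_left
from datetime import datetime  # element type expected in `times`

def label_event(times, labels, start, end, n_flag, category):
    """Mark the images of one spread F record, tolerating missing files.

    times  : sorted list of datetimes for the images that exist
    labels : dict mapping each datetime to its class (0 initially)
    """
    i = bisect_left(times, start)
    j = bisect_left(times, end)
    has_start = i < len(times) and times[i] == start
    has_end = j < len(times) and times[j] == end
    if has_start:
        # Scan forward from the start image; stop once past the end time.
        for t in times[i:i + n_flag + 1]:
            if t > end:
                break
            labels[t] = category
    elif has_end:
        # Scan backward from the end image; stop once before the start time.
        for t in reversed(times[max(0, j - n_flag):j + 1]):
            if t < start:
                break
            labels[t] = category
    # Neither image exists: leave the labels untouched and move on.
```

Bounding the scan by both the window size and the recorded times keeps a gap in the image sequence from spilling labels onto images outside the event.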

Preferably, in step S4, the order-of-magnitude differences between the numbers of ionograms in the classes are taken into account and the classes are equalized: classes whose number of ionograms exceeds the set sample size are down-sampled, and classes whose number is below the set sample size undergo data enhancement by introducing different types of noise.
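
The balancing decision of step S4 can be sketched as a per-class plan. The text does not say how the down-sampled images are chosen, so uniform random sampling without replacement is an assumption here, as are the function name and the shape of the returned plan.

```python
import random

def balance_plan(class_files, target, seed=0):
    """Decide, per class, which files to keep and how many noisy copies to add."""
    rng = random.Random(seed)
    plan = {}
    for label, files in class_files.items():
        if len(files) > target:
            # Over-represented class: down-sample to the target size.
            plan[label] = {"keep": rng.sample(files, target), "augment": 0}
        else:
            # Under-represented class: keep every original, add the shortfall.
            plan[label] = {"keep": list(files), "augment": target - len(files)}
    return plan
```

Every original image of a small class stays in the library, and only the shortfall is filled with noise-augmented copies.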

Preferably, the data enhancement specifically comprises the following steps:

first, copying the original ionograms to the sample library and defining n noise methods;

then, generating a random number in the range [1, n] to determine at random the type of noise added to the selected sample;

then, generating random numbers in the range [1, 5] to determine at random the number of blocks, rows and columns to which noise is applied, and determining at random the positions in the image at which the block, row and column noise is added;

then, adding the corresponding random noise to the selected image blocks, rows and columns;

finally, adding a noise tag to the file name of the noisy image and placing it in the sample library of the corresponding category, and repeating these operations until the number of ionogram samples in that category meets the requirement.

The number 5 is an empirical value: the noise in existing ionograms usually occupies no more than 5 blocks, 5 rows or 5 columns.
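
The random choices in these steps can be sketched as one draw per augmented image. The dictionary layout, the block positions given as top-left corners, and the file-name tag format are all hypothetical; the image is assumed to be at least 5 pixels in each dimension.

```python
import random

def draw_noise_plan(n_methods, height, width, rng):
    """Pick a noise method in [1, n] and 1 to 5 blocks, rows and columns."""
    return {
        "method": rng.randint(1, n_methods),
        "blocks": [(rng.randrange(height), rng.randrange(width))
                   for _ in range(rng.randint(1, 5))],
        "rows": rng.sample(range(height), rng.randint(1, 5)),
        "cols": rng.sample(range(width), rng.randint(1, 5)),
    }

def noisy_name(stem, method, index):
    """Build a file name carrying a noise tag (hypothetical tag format)."""
    return f"{stem}_noise{method}_{index}.png"
```

Passing an explicit `random.Random` instance makes an augmentation run reproducible from its seed.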

Preferably, the noise methods comprise at least Gaussian methods and salt-and-pepper methods;

the Gaussian methods comprise a random Gaussian method, a row Gaussian method and a column Gaussian method;

the salt-and-pepper methods comprise a random salt-and-pepper method, a row salt-and-pepper method and a column salt-and-pepper method;

as for hyperparameters, the noise density of the Gaussian methods and the salt-and-pepper methods is adjustable; the noise width of the row and column Gaussian methods and of the row and column salt-and-pepper methods is adjustable; and the noise coverage area of the random Gaussian method and the random salt-and-pepper method is adjustable.
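
Two of the six methods are sketched below on a grayscale image stored as a list of rows with values 0 to 255. The parameter names (`width` for the noise width, `sigma` for the Gaussian strength, `density` for the fraction of pixels replaced) are assumptions chosen to mirror the adjustable hyperparameters; the patent does not give formulas for them.

```python
import random

def add_row_gaussian(img, row, width, sigma, rng):
    """Add zero-mean Gaussian noise to `width` rows starting at `row`."""
    out = [r[:] for r in img]  # work on a copy; the original stays intact
    for y in range(row, min(row + width, len(out))):
        out[y] = [min(255, max(0, round(v + rng.gauss(0, sigma))))
                  for v in out[y]]
    return out

def add_col_salt_pepper(img, col, width, density, rng):
    """Set a `density` fraction of pixels in `width` columns to 0 or 255."""
    out = [r[:] for r in img]
    for r in out:
        for x in range(col, min(col + width, len(r))):
            if rng.random() < density:
                r[x] = rng.choice((0, 255))
    return out
```

Because the noise only overwrites pixel values in place, the frequency and height axes of the ionogram are left unchanged, unlike rotation or stretching.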

Therefore, the invention has the following beneficial effects:

1. The manual identification file is parsed with full allowance for incomplete original observations and incomplete manual identification data: the method accounts for the coexistence of 5-minute and 15-minute sampling rates within the data of a single year and applies a fault-tolerant mechanism where some ionograms may be missing, so the correspondence between manual identifications and ionograms is established robustly and the category of each ionogram is assigned correctly;

2. The original ionosonde ionograms are not altered and remain part of the sample library;

3. The marked imbalance in the distribution of ionograms across categories is addressed, with sample balancing achieved by down-sampling and by superimposing different types of random noise; the enhancement does not destroy the basic shape of the traces in the ionogram but simulates natural phenomena and noise that the instrument may produce, so the category of each ionogram remains unchanged;

4. With this sample enhancement method, the ionograms of the 14 years from 2002 to 2015 were classified according to the manually identified TXT files, and the four classes frequency type FSF, range type RSF, mixed type MSF and strong range type SSF were each expanded to 20000 samples, giving 100,000 samples in total together with the class without spread F. The samples were divided into a training set and a validation set in the ratio 8:2, and classification accuracy above 93% was obtained on three models: ResNet34_20_5_100 (Resnet 34 Net) (93.20%), res Net34-modified-20-2-200 (93.50%) and residual_provided_net_old_old_25_5 (last_model_92_sgd_25_5.pkl) (93.53%). This shows that the sample enhancement algorithm used by the method meets the training requirements of the classification model and lays a foundation for good classification accuracy.

The technical scheme of the invention is further described in detail through the drawings and the embodiments.

Drawings

Fig. 1 is the ionogram at time 20050104154500;

Fig. 2 is the ionogram at time 20050116124500;

Fig. 3 is the original ionogram at time 20060101021500, taken as an example;

Fig. 4 is the ionogram of Fig. 3 after processing by the random Gaussian method;

Fig. 5 is the ionogram of Fig. 3 after processing by the random salt-and-pepper method;

Fig. 6 is the ionogram of Fig. 3 after processing by a salt-and-pepper method;

Fig. 7 is the ionogram of Fig. 3 after processing by a salt-and-pepper method;

Fig. 8 is the ionogram of Fig. 3 after processing by a Gaussian method;

Fig. 9 is the ionogram of Fig. 3 after processing by the column Gaussian method.

Detailed Description

The present invention is further described below with reference to the accompanying drawings. It should be noted that, although this embodiment gives a detailed implementation and a specific operating procedure on the basis of the present technical solution, the protection scope of the invention is not limited to this embodiment.

The invention comprises the following steps:

preferably, the step S2 specifically includes the following steps:

In this embodiment, the ionograms generated by the Hainan ionosonde from 2002 to 2015 were manually interpreted and corrected several times, and the interpretation results were recorded in the internationally conventional manner. The manual interpretation and classification results for 2011 are taken as an example:

Table 1. Manual interpretation and classification of the Hainan ionosonde ionograms for 2011

The first column of the TXT file shown in Table 1 records the date on which the spread F phenomenon occurs as a day of year; for example, 049 denotes the 49th day of 2011, that is, 18 February 2011. The following three columns are read in the same way as described above, and the fifth column gives the type of spread F during that period.

Preferably, the set value in step S221 is 35040 = 24 × 4 × 365;

the first time resolution is 15 minutes and the second time resolution is 5 minutes. (In Phase I of the Meridian Project, led by the National Space Science Center of the Chinese Academy of Sciences, the digital ionosonde deployed at the Hainan station kept its sampling resolution essentially at 15 minutes; later, to meet the requirements of operational departments, 5-minute and 15-minute sampling resolutions could coexist during certain periods of certain years.)

Preferably, in step S224:

Preferably, in step S225, when ionogram files are missing:

when the TXT identification file is traversed row by row, if an ionogram exists at the start position of a row but the ionogram at the end position is missing, the marking range for the category is confined to the numberofFlag ionograms following the start position, and the file-name time of each ionogram in that range is compared with the time of the end position. If it is earlier than or equal to the end time, the ionogram is marked with the category of that row; if it is later, no mark is made and the loop exits;

Preferably, the data enhancement specifically includes the following steps:

The statistics obtained in this embodiment by classifying the ionograms generated by the Hainan ionosonde from 2002 to 2015 are shown in Table 2:

Table 2. Classification statistics of the ionograms generated by the Hainan ionosonde, 2002-2015

As the table shows, the five categories account for 91.64%, 2.41%, 1.12%, 3.24% and 1.60% of all images respectively, so the sample data are markedly imbalanced between classes. For a supervised learning method that recognizes the spread F phenomenon in ionograms, a sample library in which the classes are evenly distributed and contain essentially the same number of samples is an important guarantee of rapid convergence when training a deep neural network. This marked imbalance in the ionogram statistics must therefore be resolved before training a deep-learning-based automatic ionogram recognition model.

Fig. 1 is the ionogram at time 20050104154500 and Fig. 2 is the ionogram at time 20050116124500; in both, the noise is most conspicuous inside the boxed regions. By parsing the manual identification files as described in the first step, the ionograms from 2002 to 2015 can be classified year by year; the classification statistics are shown in Table 2. Over these 14 years, the numbers of ionograms in the five categories are 426555, 11226, 5201, 15083 and 7428 respectively. In deep neural network training, the order-of-magnitude differences between classes must be taken into account as far as possible. For example, the samples without spread F are down-sampled, which is acceptable because images without spread F are relatively similar to one another, whereas the range type RSF and the strong range type SSF must be expanded to about 400% and 300% of their original data volume respectively. When expanding samples, the number of samples should be increased as much as possible while the repetition of similar or related samples is kept as low as possible, so as to safeguard the final classification accuracy of the model. For these reasons, 20000 was finally chosen as the number of samples per category in the final sample library.
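
The class shares and expansion factors quoted in this embodiment can be checked with a few lines of arithmetic. The counts and the target of 20000 come from the text; reading "400%" and "300%" as the smallest whole multiple of the original count that reaches the target is an assumption of this sketch.

```python
import math

COUNTS = [426555, 11226, 5201, 15083, 7428]  # five classes, in the order given
TARGET = 20000                               # samples per class in the library

def class_shares(counts):
    """Percentage of all images held by each class, to two decimals."""
    total = sum(counts)
    return [round(100 * c / total, 2) for c in counts]

def expansion_factor(count, target=TARGET):
    """Smallest whole multiple of `count` that reaches `target`."""
    return math.ceil(target / count)
```

The shares reproduce the stated 91.64%, 2.41%, 1.12%, 3.24% and 1.60%, and the classes with 5201 and 7428 images give factors of 4 and 3, in line with the stated 400% and 300%.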

Meanwhile, the way an ionogram is drawn shows that it displays the ionosphere's response to radar waves of different frequencies in the range 0-17 MHz, and that different forms correspond to different physical laws. Sample enhancement of an ionogram therefore must not destroy the correspondence between frequency (abscissa) and height (ordinate); otherwise the features that carry the physical phenomenon in the image are destroyed and later interpretation by researchers is affected. Conventional image data enhancement methods, including but not limited to rotation, shearing and stretching, damage the traces that reflect the variation of electron density with height to an unacceptable degree, so that specialists can no longer recognize the enhanced image at all.

Study of 14 years of ionograms from the Hainan station shows that diffuse reflection of the incident radio wave by fine plasma structures appears as pronounced spreading in the ionogram. In addition, the ionosonde itself is sometimes unstable, which produces abnormal vertical or horizontal noise points in the ionogram. To meet the need for sample enhancement when the ionogram categories are markedly imbalanced, conventional image sample enhancement methods therefore cannot be used. Instead, the diffuse reflection of the incident radio wave by fine plasma structures, or instrument instability, is simulated by artificially adding diffuse patches or noise points to the ionogram, which increases the number of samples of a given type without changing the ionogram's category.

Fig. 3 is the original ionogram at time 20060101021500, taken as an example; Fig. 4 is the ionogram of Fig. 3 after processing by the random Gaussian method; Fig. 5 is the ionogram of Fig. 3 after processing by the random salt-and-pepper method; Figs. 6 and 7 are the ionogram of Fig. 3 after processing by salt-and-pepper methods; Fig. 8 is the ionogram of Fig. 3 after processing by a Gaussian method; Fig. 9 is the ionogram of Fig. 3 after processing by the column Gaussian method. As shown in Figs. 3 to 9, the noise methods preferably comprise at least Gaussian methods and salt-and-pepper methods. The Gaussian methods comprise a random Gaussian method, a row Gaussian method and a column Gaussian method; the salt-and-pepper methods comprise a random salt-and-pepper method, a row salt-and-pepper method and a column salt-and-pepper method. As for hyperparameters, the noise density of the Gaussian methods and the salt-and-pepper methods is adjustable; the noise width of the row and column Gaussian methods and of the row and column salt-and-pepper methods is adjustable; and the noise coverage area of the random Gaussian method and the random salt-and-pepper method is adjustable.

In summary, the method for constructing an ionospheric ionogram classification sample library first establishes the matching relationship between the manual identifications and the ionograms: the manual records, kept as year, day of year, hour and minute, are converted to year, month, day, hour, minute, second notation, and the association between each identification and the related ionograms is then established by a robust comparison method. Second, analysis of many years of ionogram statistics shows that data enhancement of the existing classes is required to build an image library with relatively balanced classes. The physical significance of the ionogram means that crude general-purpose expansion methods such as rotation and cropping cannot be used; data enhancement must respect the physical meaning that the ionogram carries. Image analysis and discussion with researchers in the field established that diffuse reflection of the incident radio wave by fine plasma structures, as well as factors such as the operating voltage of the ionosonde and short-lived local weather, can produce different types of noise in the ionogram. The scheme therefore simulates the influence of natural phenomena on ionograms by introducing different types of random noise, thereby expanding the sample data for each type of spread F and finally establishing a relatively balanced ionogram library. The effectiveness of the proposed sample enhancement algorithm is also confirmed by the subsequent model training results.

Finally, it should be noted that the above embodiments merely illustrate the technical solution of the present invention and do not limit it. Although the invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that the technical solution may still be modified or replaced by equivalents, and that such modifications or equivalent replacements do not cause the modified solution to depart from the spirit and scope of the technical solution of the invention.

Claims

1. A method for constructing an ionospheric ionogram classification sample library, characterized by comprising the following steps:

step S2 specifically comprises the following steps:

S23, unifying the format of the ionogram drawing times and of the ionogram times recorded in the manually interpreted TXT identification file, and determining the category of each ionogram according to the records in the TXT file;

S4, constructing a sample library in which the classes are relatively balanced, by means of an up-sampling and down-sampling mechanism;

in step S4, the order-of-magnitude differences between the numbers of ionograms in the classes are taken into account and the classes are equalized: classes whose number of ionograms exceeds the set sample size are down-sampled, and classes whose number is below the set sample size undergo data enhancement by introducing different types of noise;

the data enhancement specifically comprises the following steps:

2. The method for constructing an ionospheric ionogram classification sample library according to claim 1, wherein: in step S21, the drawing time of each ionogram image is expressed in the format year, month, day, hour, minute, second, and its initial class is set to 0.

3. The method for constructing an ionospheric ionogram classification sample library according to claim 2, wherein: in the TXT identification file of step S22, the first column records the date on which the spread F phenomenon occurs as a day of year, the second column is the start time of the spread F phenomenon in 24-hour notation, the third column is its end time in 24-hour notation, and the fourth column is the duration of the spread F, in which the first two digits are hours and the last two digits are minutes;

4. The method for constructing an ionospheric ionogram classification sample library according to claim 3, wherein: the set value in step S221 is 35040 = 24 × 4 × 365;

5. The method for constructing an ionospheric ionogram classification sample library according to claim 4, wherein: in step S224:

6. The method for constructing an ionospheric ionogram classification sample library according to claim 3, wherein: in step S225, when ionogram files are missing:

if the ionogram at the start position of the row is missing but an ionogram exists at the end position, the marking range for the category is confined to the numberofFlag ionograms preceding the end position, and the year, month, day, hour, minute time in each ionogram's file name is compared with that of the start position; if it is later than or equal to the start time, the ionogram is marked with the category of that row; if it is earlier, no mark is made and the loop exits;

7. The method for constructing an ionospheric ionogram classification sample library according to claim 1, wherein: the noise methods comprise at least Gaussian methods and salt-and-pepper methods;

the Gaussian methods comprise a random Gaussian method, a row Gaussian method and a column Gaussian method;