CN117809161A - Occultation refractive index profile data quality evaluation system based on multi-source data - Google Patents

Info

Publication number
CN117809161A
Authority
CN
China
Prior art keywords
data
point
refractive index
target
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311745625.6A
Other languages
Chinese (zh)
Inventor
唐琪
柳聪亮
唐跃川
张尧
陈林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Tianmu Chongqing Satellite Technology Co ltd
Original Assignee
Aerospace Tianmu Chongqing Satellite Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Tianmu Chongqing Satellite Technology Co ltd filed Critical Aerospace Tianmu Chongqing Satellite Technology Co ltd
Priority to CN202311745625.6A priority Critical patent/CN117809161A/en
Publication of CN117809161A publication Critical patent/CN117809161A/en
Pending legal-status Critical Current

Abstract

The invention relates to the technical field of data processing, in particular to an occultation refractive index profile data quality evaluation system based on multi-source data. The invention accurately detects trusted data points by adjusting the distance between data points and adaptively selecting the neighborhood size, and computes the quality evaluation value from the trusted points, thereby improving the credibility of the quality evaluation result for occultation refractive index profile data.

Description

Occultation refractive index profile data quality evaluation system based on multi-source data
Technical Field
The invention relates to the technical field of data processing, in particular to an occultation refractive index profile data quality evaluation system based on multi-source data.
Background
Occultation observation with the global navigation satellite system is a principal means of sounding the Earth's atmosphere and has been widely applied in weather forecasting and global climate change analysis. Refractive index profile data describe the refractive index at different heights in the atmosphere and typically comprise three dimensions: time, height and refractive index. Such data are important for meteorological research and atmospheric science and can be used to forecast weather and to study atmospheric waves, turbulence and other phenomena. Because occultation refractive index profile data acquired from multi-source data may contain erroneous, inconsistent or missing values that affect subsequent analysis, the data must be cleaned and outliers removed. At present, abnormal data are detected with the local outlier factor algorithm to complete the data cleaning.
The existing problem is as follows: erroneous, inconsistent or missing values occur randomly in the acquired occultation refractive index profile data, so when the distance between data points and the neighborhood size in the local outlier factor algorithm are chosen improperly, the accuracy of abnormal data detection decreases. Abnormal data then remain after cleaning, which reduces the credibility of the quality evaluation result for the occultation refractive index profile data.
Disclosure of Invention
The invention provides an occultation refractive index profile data quality evaluation system based on multi-source data, which aims to solve the existing problem.
The occultation refractive index profile data quality evaluation system based on multi-source data of the invention adopts the following technical scheme:
One embodiment of the invention provides an occultation refractive index profile data quality evaluation system based on multi-source data, which comprises the following modules:
a data acquisition module, used for acquiring refractive index profile information and atmospheric observation data of the atmosphere to obtain an occultation refractive index profile data set and an atmospheric observation data set, respectively; and for performing refractive index profile simulation prediction on the atmospheric observation data set to obtain a comparison refractive index profile data set; the occultation refractive index profile data set and the comparison refractive index profile data set each comprise a plurality of data points in one-to-one correspondence, each data point corresponds to a three-dimensional coordinate and a refractive index, and the three-dimensional coordinate consists of altitude, longitude and latitude;
a neighborhood size selection module, used for obtaining, in the occultation refractive index profile data set, the corrected distance between data points according to the distance between the data points and the refractive indexes of the data points; denoting the plane formed by all data points at the same height in the occultation refractive index profile data set as a reference plane; obtaining, on adjacent reference planes, the neighborhood weight of each data point according to the corrected distances between data points; and obtaining the neighborhood size sequence of each data point according to the neighborhood weight of each data point on each reference plane;
a trusted point screening module, used for obtaining the local outlier factor value of each data point according to the neighborhood size sequence of each data point on each reference plane; and for screening out, on each reference plane, a plurality of trusted points according to the local outlier factor value of each data point;
a data quality evaluation module, used for obtaining the quality evaluation value of the occultation refractive index profile data set according to the refractive index difference between each trusted point and the data point at the same three-dimensional coordinate in the comparison refractive index profile data set.
Further, obtaining, in the occultation refractive index profile data set, the corrected distance between data points according to the distance between the data points and the refractive indexes of the data points comprises:
in the occultation refractive index profile data set, denoting any two data points as a main data point and a sub data point, respectively;
obtaining the corrected distance between the main data point and the sub data point according to the distance and the refractive index difference between the main data point and the sub data point.
Further, the specific calculation formula for obtaining the corrected distance between the main data point and the sub data point according to the distance and the refractive index difference between them is as follows:
wherein $L$ is the corrected distance between the main data point and the sub data point, $A$ is the distance between the main data point and the sub data point, $B_1$ is the refractive index of the main data point, and $B_2$ is the refractive index of the sub data point.
Further, obtaining, on adjacent reference planes, the neighborhood weight of each data point according to the corrected distances between data points comprises:
among all reference planes, denoting any one reference plane as the target plane, and denoting the two reference planes adjacent to the target plane as the main plane and the sub-plane, respectively;
denoting any data point on the target plane as the target point;
on the main plane, denoting the data point with the smallest distance from the target point as the main control point of the target point;
on the sub-plane, denoting the data point with the smallest distance from the target point as the sub-control point of the target point;
denoting all data points adjacent to the target point on the target plane, together with the target point itself, as reference points;
obtaining the neighborhood weight of the target point according to the corrected distances between the target point, and each reference point, and their respective main control points and sub-control points.
Further, the specific calculation formula for obtaining the neighborhood weight of the target point according to the corrected distances between the target point, and each reference point, and their respective main control points and sub-control points is as follows:
wherein $W$ is the neighborhood weight of the target point, $L'_1$ is the corrected distance between the target point and its main control point, $L'_2$ is the corrected distance between the target point and its sub-control point, $C_x$ is the distance ratio of the x-th reference point, $C_{avg}$ is the average of the distance ratios of all reference points, and $N$ is the number of reference points; the formula also uses the corrected distance between the x-th reference point and its main control point and the corrected distance between the x-th reference point and its sub-control point; Norm() is a linear normalization function, and | | is the absolute value function.
Further, obtaining the neighborhood size sequence of each data point according to the neighborhood weight of each data point on each reference plane comprises:
calculating the product of the neighborhood weight of the target point and the preset maximum neighborhood size, and denoting the product rounded up as the maximum neighborhood size of the target point;
starting from the preset minimum neighborhood size, incrementing the neighborhood size by 1 in each iteration and recording the neighborhood size after each iteration in order until the maximum neighborhood size of the target point is reached, thereby obtaining the neighborhood size sequence of the target point.
Further, obtaining the local outlier factor value of each data point according to the neighborhood size sequence of each data point on each reference plane comprises:
denoting any one neighborhood size in the neighborhood size sequence of the target point as the reference neighborhood size;
applying the local outlier factor algorithm to the target point on the target plane with the reference neighborhood size to obtain the local reachability density of the target point at the reference neighborhood size;
recording, in the order of the neighborhood size sequence of the target point, the local reachability density of the target point at each neighborhood size to obtain the local reachability density sequence of the target point;
in the local reachability density sequence of the target point, denoting the (i+1)-th datum minus the i-th datum as the slope of the i-th datum;
obtaining the credibility of each datum in the local reachability density sequence of the target point according to the slope differences between adjacent data in the sequence;
denoting the datum in the local reachability density sequence of the target point whose credibility is larger than a preset judgment threshold as the optimal local reachability density of the target point;
applying the local outlier factor algorithm to the target point on the target plane with the optimal local reachability density of the target point to obtain the local outlier factor value of the target point.
Further, the specific calculation formula for obtaining the credibility of each datum in the local reachability density sequence of the target point according to the slope differences between adjacent data in the sequence is as follows:
$$D_i = \mathrm{Norm}\left(|F_i - F_{i-1}| + |F_i - F_{i+1}|\right)$$
wherein $D_i$ is the credibility of the i-th datum in the local reachability density sequence of the target point, $F_{i-1}$ is the slope of the (i-1)-th datum in the sequence, $F_i$ is the slope of the i-th datum in the sequence, $F_{i+1}$ is the slope of the (i+1)-th datum in the sequence, Norm() is a linear normalization function, and | | is the absolute value function.
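The credibility formula can be illustrated with a minimal Python sketch. It assumes that Norm() is min-max normalization over the target point's own sequence and that only data with both neighbouring slopes receive a credibility; the text does not state how the sequence ends are handled, so that boundary rule and the function names are hypothetical.

```python
def min_max(values):
    """Linearly rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def credibility(lrd_sequence):
    """Return {i: D_i} (0-based i) for data that have F_{i-1}, F_i and F_{i+1}."""
    # F_i is the (i+1)-th datum minus the i-th datum; the last datum has no slope.
    slopes = [b - a for a, b in zip(lrd_sequence, lrd_sequence[1:])]
    # Assumption: endpoints lacking a neighbouring slope are skipped.
    idx = list(range(1, len(slopes) - 1))
    raw = [abs(slopes[i] - slopes[i - 1]) + abs(slopes[i] - slopes[i + 1])
           for i in idx]
    return dict(zip(idx, min_max(raw))) if raw else {}
```

Calling `credibility(seq)` on a local reachability density sequence returns one normalized value per interior index, which can then be compared with the preset judgment threshold.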
Further, screening out, on each reference plane, a plurality of trusted points according to the local outlier factor value of each data point comprises:
normalizing the local outlier factor values of all data points on the target plane with min-max normalization to obtain the normalized local outlier factor value of each data point on the target plane;
denoting the data points on the target plane whose normalized local outlier factor value is smaller than a preset abnormal threshold as trusted points.
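The screening step can be sketched in Python as follows. The function name and the default threshold of 0.5 are hypothetical; the text only says that the abnormal threshold is preset.

```python
def screen_trusted(lof_values, abnormal_threshold=0.5):
    """Return the indices of data points kept as trusted points on one plane."""
    lo, hi = min(lof_values), max(lof_values)
    span = hi - lo
    # Min-max normalization of the local outlier factor values on this plane.
    normalized = [0.0 if span == 0 else (v - lo) / span for v in lof_values]
    # A point is trusted when its normalized value is below the threshold.
    return [i for i, v in enumerate(normalized) if v < abnormal_threshold]
```

Because the normalization is done per plane, the point with the largest local outlier factor value on a plane always maps to 1 and is never kept, whatever threshold below 1 is chosen.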
Further, obtaining the quality evaluation value of the occultation refractive index profile data set according to the refractive index difference between each trusted point and the data point at the same three-dimensional coordinate in the comparison refractive index profile data set comprises:
denoting any one trusted point on the target plane as the main trusted point;
in the comparison refractive index profile data set, denoting the data point at the three-dimensional coordinate of the main trusted point as the main comparison point;
denoting the refractive index of the main trusted point minus the refractive index of the main comparison point as the error value of the main trusted point;
denoting the variance of the error values of all trusted points on all reference planes as the quality evaluation value of the occultation refractive index profile data set.
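A short Python sketch of this evaluation step follows. It assumes the population variance (the text says only "variance") and a hypothetical data layout in which the comparison data set is a dictionary keyed by the (altitude, longitude, latitude) coordinate.

```python
from statistics import pvariance


def quality_value(trusted_points, comparison):
    """trusted_points: list of (coordinate, refractive_index) pairs.
    comparison: dict mapping the same coordinates to simulated values."""
    # Error value = occultation refractive index minus comparison value.
    errors = [n - comparison[coord] for coord, n in trusted_points]
    # Assumption: population variance over all trusted points on all planes.
    return pvariance(errors)
```

A smaller value indicates that the errors of the trusted points are more uniform relative to the comparison profile.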
The technical scheme of the invention has the beneficial effects that:
In the embodiment of the invention, an occultation refractive index profile data set and a comparison refractive index profile data set are obtained. In the occultation refractive index profile data set, the corrected distance between data points is obtained; adjusting the distance between data points improves the accuracy of abnormal data point detection. The neighborhood weight of each data point is obtained to derive the neighborhood size sequence of each data point, and the local outlier factor value of each data point is obtained to screen out a plurality of trusted points. The quality evaluation value of the occultation refractive index profile data set is then obtained according to the refractive index difference between each trusted point and the data point at the same three-dimensional coordinate in the comparison refractive index profile data set. By adjusting the distance between data points and adaptively selecting the neighborhood size, the invention accurately detects trusted data points and computes the quality evaluation value from them, thereby improving the credibility of the quality evaluation result for occultation refractive index profile data.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a block flow diagram of an occultation refractive index profile data quality evaluation system based on multi-source data according to the present invention.
Detailed Description
In order to further describe the technical means adopted by the invention to achieve its intended purpose and their effects, the specific implementation, structure, characteristics and effects of the occultation refractive index profile data quality evaluation system based on multi-source data according to the invention are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different occurrences of "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The specific scheme of the occultation refractive index profile data quality evaluation system based on multi-source data is described below.
Referring to FIG. 1, a block flow diagram of an occultation refractive index profile data quality evaluation system based on multi-source data according to an embodiment of the present invention is shown. The system includes the following modules:
Module 101: data acquisition module.
This module is used to obtain the occultation refractive index profile data set and the comparison refractive index profile data set. One possible implementation is as follows:
At any moment, the occultation technique is used to collect refractive index profile information of the atmosphere over any area, yielding the occultation refractive index profile data set. Each data point in the occultation refractive index profile data set corresponds to a three-dimensional coordinate and a refractive index, and the three-dimensional coordinate consists of altitude, longitude and latitude.
It should be noted that the occultation technique observes the process in which a celestial body is occulted by the Earth's atmosphere, from which refractive index profile information of the atmosphere can be acquired. The occultation technique is well known, and the specific method is not described herein.
According to the acquisition time and place of the occultation refractive index profile data set, weather stations observe at the same time and collect atmospheric observation data for the atmosphere over the same area, yielding the atmospheric observation data set.
According to the atmospheric observation data set, refractive index profile simulation prediction is performed with a standard atmospheric model to obtain the comparison refractive index profile data set. Each data point in the comparison refractive index profile data set corresponds to a three-dimensional coordinate and a refractive index, and the three-dimensional coordinate consists of altitude, longitude and latitude.
It should be noted that weather station observation and the standard atmospheric model are well known, and the specific methods are not described herein. The atmospheric observation data include temperature, humidity, pressure and the like, from which the standard atmospheric model can calculate the refractive index profile. The atmospheric observation data set and the occultation refractive index profile data set share the same acquisition time and the same atmospheric region, so the data points of the occultation refractive index profile data set and of the comparison refractive index profile data set correspond one to one by their three-dimensional coordinates.
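The text does not say which relation the standard atmospheric model uses. As an illustration only, the Python sketch below assumes the widely used Smith-Weintraub approximation for radio refractivity, N = (n - 1) x 10^6, computed from pressure, temperature and water vapour pressure; the function name and units are choices made for this example.

```python
def refractivity(pressure_hpa, temperature_k, vapor_pressure_hpa):
    """Radio refractivity N = (n - 1) * 1e6 (Smith-Weintraub approximation).

    Assumed inputs: total pressure and water vapour pressure in hPa,
    temperature in kelvin. Not taken from the patent text.
    """
    dry = 77.6 * pressure_hpa / temperature_k
    wet = 3.73e5 * vapor_pressure_hpa / temperature_k ** 2
    return dry + wet
```

Evaluating such a relation at the altitude, longitude and latitude of each occultation data point would give comparison values that match the occultation data set one to one by coordinate.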
This embodiment therefore takes the simulated comparison refractive index profile data set as the standard for evaluating whether the data collected by the occultation technique are accurate. However, the collected data may contain erroneous or inconsistent values that distort the accuracy evaluation, so the data must be cleaned: the accurate data collected by the occultation technique are retained, and the difference between these data and the simulated data is analyzed to judge whether the data collected by the occultation technique are accurate.
Module 102: neighborhood size selection module.
This module is used to obtain, in the occultation refractive index profile data set, the corrected distance between data points according to the distance between the data points and the refractive indexes of the data points. The plane formed by all data points at the same height in the occultation refractive index profile data set is denoted as a reference plane. On adjacent reference planes, the neighborhood weight of each data point is obtained according to the corrected distances between data points. The neighborhood size sequence of each data point is obtained according to the neighborhood weight of each data point on each reference plane.
A neighborhood size sequence is obtained from the corrected distances between data points and the neighborhood weights of the data points. The neighborhood size is the main parameter of the local outlier factor algorithm; the sequence adapts the range from which the neighborhood size is chosen, so that the most suitable neighborhood size can subsequently be selected from it. One possible implementation for obtaining the neighborhood size sequence is as follows:
Occultation refractive index profile data arise because a signal is refracted as it passes through the atmosphere; temperature, humidity, dust and other properties of the atmosphere can be inferred from changes in the atmospheric refractive index. The refractive index depends on the environment in which the atmosphere is analyzed: the composition of the atmosphere differs with height, and the refractive index differs accordingly. When atmospheric constituents are analyzed through the refractive index, one area is examined, and occultation refractive index profile data are collected at different positions within that area.
In the occultation refractive index profile data set, the refractive index differs considerably between heights, so this embodiment analyzes the data at each height in turn to identify abnormal data. Under normal conditions the atmosphere changes slowly and its composition varies little within a local range, so the refractive index varies little there; a data point with a high local outlier factor value within that range is therefore more likely to be abnormal.
Refractive index data acquired at the same height are therefore analyzed together. When data collected in the same area are judged, the collected samples must first be confirmed to be similar, so that dispersion caused by changes in position does not make anomaly detection inaccurate. The sample set at one height can be partitioned with the local outlier factor algorithm: by the reasoning above, refractive index values collected at nearby positions at the same height fluctuate similarly, so regions are divided according to the similarity of the refractive index within the range, which yields accurate local outlier factor values.
In the occultation refractive index profile data set the collected data points are uniformly distributed, that is, the distances between data points are similar, whereas the refractive index can fluctuate considerably between data points because of changes in the atmosphere. Using the raw distances between data points directly in the local outlier factor algorithm would impair its accuracy and efficiency, so this embodiment adjusts the distance between data points using the refractive index difference between them.
In the occultation refractive index profile data set, any two data points are denoted as a main data point and a sub data point, respectively.
The calculation formula of the corrected distance $L$ between the main data point and the sub data point is as follows:
wherein $L$ is the corrected distance between the main data point and the sub data point, $A$ is the distance between the main data point and the sub data point, $B_1$ is the refractive index of the main data point, and $B_2$ is the refractive index of the sub data point.
It should be noted that the distance $A$ between the main data point and the sub data point is obtained from the three-dimensional coordinates of the two points. The larger $A$ is and the larger the refractive index difference between the main data point and the sub data point is, the larger the corrected distance $L$ between them should be.
In this manner, the corrected distance between any two data points in the occultation refractive index profile data set is obtained.
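The formula itself is not reproduced in this text, so the Python sketch below only illustrates the stated behaviour: the corrected distance grows with the distance A and with the refractive index difference |B1 - B2|. The default `combine` rule, A x (1 + |B1 - B2|), is a hypothetical placeholder and not the patented formula; treating (altitude, longitude, latitude) as plain Euclidean coordinates is likewise an assumption.

```python
import math


def corrected_distance(p, q, combine=lambda a, diff: a * (1.0 + diff)):
    """p and q are (coordinate, refractive_index) pairs.

    `combine` is a stand-in for the missing formula; any rule that
    increases with both arguments matches the description.
    """
    (coord_p, b1), (coord_q, b2) = p, q
    a = math.dist(coord_p, coord_q)  # distance A from the 3-D coordinates
    return combine(a, abs(b1 - b2))
```

Passing a different `combine` callable lets the same routine be reused once the actual formula is known.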
The neighborhood size is an important parameter of the local outlier factor algorithm and affects its output. If the neighborhood is too small, the algorithm may fail to reflect the density of the area around a data point and may misjudge some normal points as outliers. Conversely, if the neighborhood is too large, the density estimate for a data point becomes too smooth and ignores local density variations, so some outliers may be misjudged as normal points.
Therefore, this embodiment is described with a preset minimum neighborhood size $K_{min}$ of 20 and a preset maximum neighborhood size $K_{max}$ of 500 as an example; other values may be set in other embodiments, and this embodiment is not limited thereto. Within this range, the computed result for each data point at different neighborhood sizes is analyzed to determine the optimal neighborhood size for each data point.
Because the refractive index differs considerably between heights, this embodiment analyzes the data at each height in turn, and the plane formed by all data points at the same height in the occultation refractive index profile data set is denoted as a reference plane. A plurality of reference planes is thus obtained, and the refractive indexes of the data points on each reference plane were acquired at the same atmospheric height.
Atmospheric fluctuation differs between reference planes, so the appropriate range of neighborhood sizes differs as well; adjusting the range according to the fluctuation on adjacent reference planes makes it better match the actual regional division.
Among all reference planes, any one is denoted as the target plane. The two reference planes adjacent to the target plane are denoted as the main plane and the sub-plane, respectively.
It should be noted that all reference planes are ordered along the height direction of the three-dimensional coordinates of the data points, so the main plane and the sub-plane are the reference planes immediately above and below the target plane. When the target plane has only one adjacent reference plane, the target plane itself and that adjacent reference plane are taken as the main plane and the sub-plane, respectively.
Any data point on the target plane is denoted as the target point.
On the main plane, the data point with the smallest distance from the target point is denoted as the main control point of the target point.
On the sub-plane, the data point with the smallest distance from the target point is denoted as the sub-control point of the target point.
It should be noted that the main control point and the sub-control point should each be unique; when several data points share the smallest distance to the target point, any one of them may be chosen as the main control point or the sub-control point, since their distances are the same.
In this manner, the main control point and the sub-control point of each data point on the target plane are obtained.
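The plane construction and control point lookup described above can be sketched in Python. The sketch assumes that each data point is an (altitude, longitude, latitude, refractive_index) tuple, that points on one plane share exactly the same altitude value, and that the main plane is the one above the target plane; the text does not say which neighbour is "main", so that choice and the function names are hypothetical.

```python
import math
from collections import defaultdict


def reference_planes(points):
    """Group points by altitude and return the planes ordered by height."""
    planes = defaultdict(list)
    for p in points:
        planes[p[0]].append(p)
    return [planes[h] for h in sorted(planes)]


def adjacent_planes(planes, i):
    """Return (main_plane, sub_plane) for the target plane at index i."""
    if len(planes) < 2:
        raise ValueError("at least two reference planes are required")
    if i == 0:                      # only one neighbour: target plane is main
        return planes[0], planes[1]
    if i == len(planes) - 1:
        return planes[i], planes[i - 1]
    return planes[i + 1], planes[i - 1]


def nearest(plane, target):
    """Control point: the point on `plane` closest to `target` (ties: first)."""
    return min(plane, key=lambda p: math.dist(p[:3], target[:3]))
```

For a target point on plane `i`, `nearest(main, target)` and `nearest(sub, target)` give its main control point and sub-control point.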
All data points adjacent to the target point on the target plane, together with the target point itself, are denoted as reference points.
The calculation formula of the neighborhood weight $W$ of the target point is as follows:
wherein $W$ is the neighborhood weight of the target point, $L'_1$ is the corrected distance between the target point and its main control point, $L'_2$ is the corrected distance between the target point and its sub-control point, $C_x$ is the distance ratio of the x-th reference point, $C_{avg}$ is the average of the distance ratios of all reference points, and $N$ is the number of reference points; the formula also uses the corrected distance between the x-th reference point and its main control point and the corrected distance between the x-th reference point and its sub-control point. Norm() is a linear normalization function that normalizes data values to the interval [0,1], and | | is the absolute value function.
It should be noted that data points on the same reference plane share the same height, so the distance ratios $C_x$ of all reference points should be similar. The more the distance ratios deviate from their average $C_{avg}$, the less reliable the data variation in the local area around the target point, and the smaller the neighborhood size needed to keep abnormal data point detection accurate. Likewise, the larger $|L'_1 - L'_2|$ is, the more strongly the height difference affects the refractive index at the data point, that is, the target point lies in a region of more severe atmospheric fluctuation, and the smaller the neighborhood size needed to keep the subsequent anomaly analysis accurate. The neighborhood weight of the target point is defined to reflect both effects.
The product of the neighborhood weight $W$ of the target point and the preset maximum neighborhood size $K_{max}$ is calculated, and the product rounded up is denoted as the maximum neighborhood size $K$ of the target point.
It should be noted that if the maximum neighborhood size $K$ of the target point is smaller than or equal to the preset minimum neighborhood size $K_{min}$, the preset maximum neighborhood size $K_{max}$ is adjusted so that $K$ becomes greater than $K_{min}$.
Starting from the preset minimum neighborhood size $K_{min}$, the neighborhood size is incremented by 1 in each iteration and recorded in order until the maximum neighborhood size of the target point is reached, giving the neighborhood size sequence $\{K_{min}, K_{min}+1, K_{min}+2, \ldots, K\}$ of the target point.
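The construction of the neighborhood size sequence can be sketched in Python, using the example values $K_{min}$ = 20 and $K_{max}$ = 500 from this embodiment. Raising an error when $K \le K_{min}$ is an assumption made for this sketch; the text instead adjusts $K_{max}$, without saying by how much.

```python
import math


def neighborhood_sizes(weight, k_min=20, k_max=500):
    """Return [K_min, K_min + 1, ..., K] with K = ceil(W * K_max)."""
    k = math.ceil(weight * k_max)
    if k <= k_min:
        # The text adjusts K_max here; the caller is asked to do so.
        raise ValueError("K <= K_min: increase k_max for this target point")
    return list(range(k_min, k + 1))
```

For example, a neighborhood weight of 0.25 gives K = 125 and therefore the 106 sizes from 20 to 125.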
Module 103: and a trusted point screening module.
The module is used for obtaining a local outlier factor value of each data point according to the neighborhood size sequence of each data point on each reference plane. On each reference plane, a plurality of credible points are screened out according to the local outlier factor value of each data point.
Therefore, data points with higher credibility are screened out from the occultation refractive index profile data set to serve as credible points, and the credible points are used for carrying out subsequent data quality assessment, so that accuracy of the data quality assessment is guaranteed. One possible implementation of screening trusted points is as follows:
Record any one neighborhood size in the neighborhood size sequence of the target point as the reference neighborhood size.
Apply the local outlier factor algorithm to the target point on the target plane with the reference neighborhood size to obtain the local reachable density of the target point under the reference neighborhood size.
It should be noted that: the local outlier factor (LOF) algorithm is a well-known technique, so its details are not repeated here. The neighborhood size is the main parameter of the algorithm, and the local reachable density (usually called the local reachability density in the LOF literature) is an intermediate quantity computed while the algorithm runs.
In this manner, the local reachable density of the target point is obtained for each neighborhood size in the neighborhood size sequence of the target point.
Arrange the local reachable densities of the target point in the order of the neighborhood size sequence to obtain the local reachable density sequence of the target point.
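The following sketch shows one way the local reachable density sequence could be computed for a single point. It rests on several assumptions: plain Euclidean distance stands in for the corrected distance of the embodiment, each neighborhood contains exactly k points (ties are broken by index rather than kept), and all function names are hypothetical.

```python
import math

def k_nearest(points, i, k):
    """Indices of the k points closest to points[i] (ties broken by index)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (math.dist(points[i], points[j]), j))
    return others[:k]

def k_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbor."""
    return math.dist(points[i], points[k_nearest(points, i, k)[-1]])

def local_reachable_density(points, i, k):
    """Inverse of the mean reachability distance from points[i] to its neighbors.

    Duplicate points would make the sum zero; this sketch does not handle that.
    """
    neighbors = k_nearest(points, i, k)
    reach = [max(k_distance(points, o, k), math.dist(points[i], points[o]))
             for o in neighbors]
    return len(neighbors) / sum(reach)

def density_sequence(points, i, sizes):
    """Local reachable density of points[i] for each neighborhood size."""
    return [local_reachable_density(points, i, k) for k in sizes]

pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(density_sequence(pts, 0, [1, 2]))  # about [1.0, 0.667]
```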
In the local reachable density sequence of the target point, the difference value of the (i+1) th data minus the (i) th data is recorded as the slope of the (i) th data.
In the above manner, the slope of each data in the local reachable density sequence of the target point is obtained.
It should be noted that: in the local reachable density sequence of the target point, the last data has no slope.
Taking the i-th data in the local reachable density sequence of the target point as an example, its credibility D_i is calculated as follows:
D_i = Norm(|F_i - F_{i-1}| + |F_i - F_{i+1}|)
where D_i is the credibility of the i-th data in the local reachable density sequence of the target point, F_{i-1}, F_i and F_{i+1} are the slopes of the (i-1)-th, i-th and (i+1)-th data in that sequence, Norm() is a linear normalization function that maps the values into the interval [0, 1], and | | is the absolute value function.
It should be noted that: the larger |F_i - F_{i-1}| is, the larger the fluctuation between the i-th data and the preceding adjacent data; the larger |F_i - F_{i+1}| is, the larger the fluctuation between the i-th data and the following adjacent data. When the refractive index differences between data points are small, the density of the data points changes uniformly. When the substances in the atmosphere differ greatly, the refractive index differences are larger and the density changes more markedly, and such a location can serve as a boundary when dividing regions. Therefore, the larger |F_i - F_{i-1}| + |F_i - F_{i+1}| is, the greater the credibility of the i-th data in the local reachable density sequence of the target point, and the more likely it is that the regional density changes there.
In the above manner, the credibility of each data in the local reachable density sequence of the target point is obtained.
It should be noted that: in the local reachable density sequence of the target point, the last data has no slope and no following data, the second-to-last data has no following slope, and the first data has no preceding data. This embodiment therefore does not calculate the credibility of the first, second-to-last and last data, and these three data are excluded from the subsequent analysis.
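A minimal sketch of the slope and credibility calculation is given below. It assumes that Norm() is a min-max normalization taken over the raw values of one target point's sequence, and that identical raw values are mapped to 0; neither detail is fixed by the text, and the function names are hypothetical.

```python
def slopes(density_seq):
    """F_i = (i+1)-th value minus i-th value; the last value has no slope."""
    return [b - a for a, b in zip(density_seq, density_seq[1:])]

def credibility(density_seq):
    """Return {position: D_i} for positions that have F_{i-1}, F_i and F_{i+1}.

    Positions are 0-based indices into density_seq. The first, second-to-last
    and last values are skipped, as described in the text.
    """
    f = slopes(density_seq)
    raw = {i: abs(f[i] - f[i - 1]) + abs(f[i] - f[i + 1])
           for i in range(1, len(f) - 1)}
    if not raw:
        return {}
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        # Assumption: with no spread, every credibility is set to 0.
        return {i: 0.0 for i in raw}
    return {i: (v - lo) / (hi - lo) for i, v in raw.items()}

# The density drops sharply between positions 2 and 3.
print(credibility([10, 9, 8, 4, 3, 2, 1]))  # {1: 0.5, 2: 1.0, 3: 0.5, 4: 0.0}
```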
In this embodiment the preset judgment threshold is 0.7 and the preset abnormality threshold is 0.75. These values are given as examples; other embodiments may use other values, and this embodiment is not limited in this respect.
In the local reachable density sequence of the target point, record the data whose credibility is greater than the preset judgment threshold as the optimal local reachable density of the target point.
It should be noted that: when several data have a credibility greater than the preset judgment threshold, the one that appears first in the local reachable density sequence of the target point is taken as the optimal local reachable density of the target point.
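The selection rule can be sketched as follows. The sketch assumes that the optimal local reachable density is used to identify the neighborhood size at the same position in the neighborhood size sequence, which is then passed to the local outlier factor algorithm; the text does not state this mapping explicitly. The function name and the None return value for the case where no credibility exceeds the threshold are also assumptions.

```python
def pick_neighborhood_size(sizes, cred, threshold=0.7):
    """Return the neighborhood size at the first position whose credibility
    exceeds the threshold.

    sizes: neighborhood size sequence of the target point.
    cred:  {position: credibility}, positions being 0-based indices into sizes.
    """
    for pos in sorted(cred):
        if cred[pos] > threshold:
            return sizes[pos]
    return None  # case not covered by the text

sizes = [3, 4, 5, 6, 7, 8, 9]
cred = {1: 0.5, 2: 1.0, 3: 0.5, 4: 0.0}
print(pick_neighborhood_size(sizes, cred))  # 5
```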
Apply the local outlier factor algorithm to the target point on the target plane according to the optimal local reachable density of the target point to obtain the local outlier factor value of the target point.
It should be noted that: the local outlier factor algorithm first obtains the local reachable density of a data point and then derives the local outlier factor value of the data point from it; the local outlier factor value is the final result of the algorithm. The larger the local outlier factor value, the more isolated the data point is from its neighbors and the more its density differs from that of the surrounding data points, so the more abnormal the data point can be considered.
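For reference, the standard definitions from the LOF literature can be written as below, where d(p, o) is the distance between points p and o, N_k(p) is the set of k nearest neighbors of p, and k-distance(o) is the distance from o to its k-th nearest neighbor. This is the textbook formulation rather than a formula taken from this document; using the corrected distance of the embodiment for d is an assumption.

```latex
\begin{aligned}
\text{reach-dist}_k(p, o) &= \max\{\, k\text{-distance}(o),\; d(p, o) \,\} \\
\mathrm{lrd}_k(p) &= \left( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \text{reach-dist}_k(p, o) \right)^{-1} \\
\mathrm{LOF}_k(p) &= \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)}
\end{aligned}
```

A value of LOF_k(p) close to 1 means that p is about as dense as its neighbors, while values well above 1 indicate that p lies in a sparser region than its neighbors.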
In the manner described above, local outlier values for each data point are obtained on the target plane.
Normalize the local outlier factor values of all data points on the target plane into the interval [0, 1] using min-max normalization to obtain the normalized local outlier factor value of each data point on the target plane.
On the target plane, record the data points whose normalized local outlier factor value is smaller than the preset abnormality threshold as trusted points. This gives several trusted points on the target plane.
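The screening step on one plane might look like the sketch below. The function name is hypothetical, and treating every point as trusted when all local outlier factor values are equal is an assumption, since min-max normalization is undefined in that case.

```python
def trusted_points(lof_values, abnormal_threshold=0.75):
    """Return the indices of points whose min-max normalized LOF value is
    below the abnormality threshold."""
    lo, hi = min(lof_values), max(lof_values)
    if hi == lo:
        # Assumption: no spread in the LOF values, so nothing is flagged.
        return list(range(len(lof_values)))
    normalized = [(v - lo) / (hi - lo) for v in lof_values]
    return [i for i, v in enumerate(normalized) if v < abnormal_threshold]

# The fourth point has a much larger LOF value and is screened out.
print(trusted_points([1.0, 1.1, 0.9, 3.0, 1.2]))  # [0, 1, 2, 4]
```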
In the above manner, several trusted points are obtained on each reference plane.
Module 104: data quality evaluation module.
The module is used for obtaining a quality evaluation value of the occultation refractive index profile data set according to the refractive index difference between the credible point and the data point on the same three-dimensional coordinate in the comparison refractive index profile data set.
Evaluating data quality with the trusted points yields a more accurate quality evaluation value. One possible implementation of obtaining the quality evaluation value is as follows:
Record any trusted point on the target plane as the main trusted point.
In the comparison refractive index profile data set, record the data point at the three-dimensional coordinates of the main trusted point as the main comparison point.
Record the refractive index of the main trusted point minus the refractive index of the main comparison point as the error value of the main trusted point.
In the above manner, the error value of each trusted point on the target plane and the error value of each trusted point on each reference plane are obtained.
Calculate the variance of the error values of all trusted points on all reference planes and record it as the quality evaluation value of the occultation refractive index profile data set.
It should be noted that: the larger the variance, the more strongly the difference between the refractive index of the trusted points in the occultation refractive index profile data set and the simulated predicted refractive index varies, that is, the more disordered these differences are across the trusted points. A larger quality evaluation value therefore indicates poorer quality of the occultation refractive index profile data set acquired by the occultation technique.
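A minimal sketch of the quality evaluation value follows. It assumes that both data sets are stored as dictionaries keyed by (altitude, longitude, latitude) and that the population variance is intended; the text only says "variance". The function name and the sample numbers are hypothetical.

```python
from statistics import pvariance

def quality_evaluation_value(occultation, comparison, trusted):
    """Variance of (occultation - comparison) refractive index over trusted points.

    occultation, comparison: {(altitude, longitude, latitude): refractive index}
    trusted: coordinates of the trusted points on all reference planes.
    """
    errors = [occultation[c] - comparison[c] for c in trusted]
    return pvariance(errors)  # assumption: population variance

occ = {(10.0, 100.0, 30.0): 300.0, (10.0, 101.0, 30.0): 250.0, (10.0, 102.0, 30.0): 200.0}
comp = {(10.0, 100.0, 30.0): 299.0, (10.0, 101.0, 30.0): 251.0, (10.0, 102.0, 30.0): 200.0}
print(quality_evaluation_value(occ, comp, list(occ)))  # about 0.667
```

A smaller value means the errors of the trusted points are more consistent with each other, which the text interprets as better data quality.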
This completes the present embodiment.
In summary, in the embodiment of the present invention, refractive index profile information and atmospheric observation data of an atmospheric layer are collected to obtain an occultation refractive index profile data set and a comparison refractive index profile data set. In the occultation refractive index profile data set, the corrected distances between data points are obtained and used to derive the neighborhood weight of each data point, from which the neighborhood size sequence of each data point is obtained. The local outlier factor value of each data point is then computed and a number of trusted points are screened out. A quality evaluation value of the occultation refractive index profile data set is obtained from the refractive index difference between each trusted point and the data point at the same three-dimensional coordinates in the comparison refractive index profile data set. By adjusting the distances between data points and adapting the neighborhood size, the invention detects trusted data points accurately and uses them to calculate the quality evaluation value, which improves the reliability of the quality evaluation result for occultation refractive index profile data.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the invention, but any modifications, equivalent substitutions, improvements, etc. within the principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A multi-source data-based occultation refractive index profile data quality assessment system, comprising the following modules:
a data acquisition module: used for acquiring refractive index profile information and atmospheric observation data of an atmospheric layer to obtain an occultation refractive index profile data set and an atmospheric observation data set respectively; performing refractive index profile simulation prediction on the atmospheric observation data set to obtain a comparison refractive index profile data set; the occultation refractive index profile data set and the comparison refractive index profile data set each comprise a plurality of data points in one-to-one correspondence, each data point corresponds to a three-dimensional coordinate and a refractive index, and the three-dimensional coordinate consists of altitude, longitude and latitude;
a neighborhood size selection module: used for obtaining corrected distances between data points according to the distances between the data points and the refractive indexes of the data points in the occultation refractive index profile data set; recording a plane formed by all data points at the same height in the occultation refractive index profile data set as a reference plane; on adjacent reference planes, obtaining the neighborhood weight of each data point according to the corrected distances between the data points; obtaining a neighborhood size sequence of each data point according to the neighborhood weight of each data point on each reference plane;
a trusted point screening module: used for obtaining the local outlier factor value of each data point according to the neighborhood size sequence of each data point on each reference plane; on each reference plane, screening out a plurality of trusted points according to the local outlier factor value of each data point;
a data quality evaluation module: used for obtaining a quality evaluation value of the occultation refractive index profile data set according to the refractive index difference between each trusted point and the data point at the same three-dimensional coordinates in the comparison refractive index profile data set.
2. The multi-source data-based occultation index profile data quality assessment system of claim 1, wherein in the occultation index profile data set, obtaining a corrected distance between data points based on a distance between data points and a refractive index of the data points comprises:
in the occultation refractive index profile data set, recording any two data points as a main data point and a sub data point respectively;
and obtaining the corrected distance between the main data point and the sub data point according to the distance between the main data point and the sub data point and the refractive index difference.
3. The multi-source data-based occultation refractive index profile data quality evaluation system according to claim 2, wherein the specific calculation formula for obtaining the corrected distance between the main data point and the sub data point according to the distance between the main data point and the sub data point and the refractive index difference is as follows:
wherein L is the corrected distance between the main data point and the sub data point, A is the distance between the main data point and the sub data point, B_1 is the refractive index of the main data point, and B_2 is the refractive index of the sub data point.
4. The multi-source data-based occultation index profile data quality assessment system of claim 1, wherein the obtaining neighborhood weights for each data point based on the corrected distances between the data points at adjacent reference planes comprises:
of all the reference planes, any one reference plane is recorded as a target plane; two reference planes adjacent to the target plane are respectively marked as a main plane and a sub-plane;
marking any data point in the target plane as a target point;
on the main plane, the data point with the smallest distance from the target point is recorded as a main contrast point of the target point;
on the sub-plane, the data point with the smallest distance from the target point is recorded as a sub-contrast point of the target point;
the target point and all data points adjacent to it on the target plane are recorded as reference points;
and obtaining the neighborhood weight of the target point according to the corrected distances between the target point and the reference point and the main contrast point and the sub-contrast point of the target point.
5. The system for evaluating the quality of the occultation refractive index profile data based on multi-source data according to claim 4, wherein the specific calculation formula corresponding to the neighborhood weight of the target point is obtained according to the corrected distances between the target point and the reference point and between the main contrast point and the sub contrast point thereof, and is as follows:
wherein W is the neighborhood weight of the target point, L'_1 is the corrected distance between the target point and its main contrast point, L'_2 is the corrected distance between the target point and its sub-contrast point, C_x is the distance ratio of the x-th reference point, C_avg is the average of the distance ratios of all reference points, N is the number of reference points, the two remaining terms are the corrected distance between the x-th reference point and its main contrast point and the corrected distance between the x-th reference point and its sub-contrast point, Norm() is a linear normalization function, and | | is the absolute value function.
6. The multi-source data-based occultation index profile data quality assessment system of claim 4, wherein the deriving a neighborhood size sequence for each data point based on neighborhood weights for each data point on each reference plane comprises:
calculating the product of the neighborhood weight of the target point and the preset maximum neighborhood size, and recording the upward rounding value of the product as the maximum neighborhood size of the target point;
starting from the preset minimum neighborhood size, increasing the size by 1 in each iteration and recording the neighborhood size after each iteration until the maximum neighborhood size of the target point is reached, thereby obtaining the neighborhood size sequence of the target point.
7. The multi-source data-based occultation index profile data quality assessment system of claim 4, wherein obtaining local outlier factor values for each data point based on a neighborhood size sequence for each data point on each reference plane comprises:
recording any one neighborhood size in the neighborhood size sequence of the target point as a reference neighborhood size;
calculating the target point on the target plane by using a local outlier factor algorithm according to the reference neighborhood size to obtain the local reachable density of the target point under the reference neighborhood size;
in the neighborhood size sequence of the target point, counting the local reachable density of the target point under each neighborhood size in sequence to obtain a local reachable density sequence of the target point;
in the local reachable density sequence of the target point, the difference value of the (i+1) th data minus the (i) th data is recorded as the slope of the (i) th data;
obtaining the credibility of each data in the local reachable density sequence of the target point according to the slope difference of adjacent data in the local reachable density sequence of the target point;
recording data with the credibility larger than a preset judgment threshold value in the local reachable density sequence of the target point as the optimal local reachable density of the target point;
and calculating the target point by using a local outlier factor algorithm on the target plane according to the optimal local reachable density of the target point to obtain a local outlier factor value of the target point.
8. The system for evaluating the quality of the occultation refractive index profile data based on multi-source data according to claim 7, wherein the specific calculation formula for obtaining the credibility of each data in the local reachable density sequence of the target point according to the slope difference of adjacent data in the local reachable density sequence of the target point is:
D_i = Norm(|F_i - F_{i-1}| + |F_i - F_{i+1}|)
wherein D_i is the credibility of the i-th data in the local reachable density sequence of the target point, F_{i-1}, F_i and F_{i+1} are the slopes of the (i-1)-th, i-th and (i+1)-th data in that sequence, Norm() is a linear normalization function, and | | is the absolute value function.
9. The multi-source data-based occultation index profile data quality assessment system of claim 4, wherein said screening out a plurality of trusted points based on local outlier values for each data point at each reference plane comprises:
normalizing the local outlier factor values of all the data points on the target plane by using a minimum maximum normalization method to obtain a normalized value of the local outlier factor value of each data point on the target plane;
and recording the data points with the normalized values of the local outlier factor values on the target plane smaller than the preset abnormal threshold as trusted points.
10. The multi-source data-based occultation refractive index profile data quality assessment system of claim 4, wherein obtaining the quality evaluation value of the occultation refractive index profile data set according to the refractive index difference between the trusted point and the data point at the same three-dimensional coordinates in the comparison refractive index profile data set comprises:
any one trusted point on the target plane is recorded as a main trusted point;
in the comparison refractive index profile data set, the data point at the three-dimensional coordinates of the main trusted point is recorded as a main comparison point;
the refractive index of the main trusted point minus the refractive index of the main comparison point is recorded as the error value of the main trusted point;
the variance of the error values of all trusted points on all reference planes is recorded as the quality evaluation value of the occultation refractive index profile data set.
CN202311745625.6A 2023-12-19 2023-12-19 Star-masking refractive index profile data quality evaluation system based on multi-source data Pending CN117809161A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311745625.6A CN117809161A (en) 2023-12-19 2023-12-19 Star-masking refractive index profile data quality evaluation system based on multi-source data

Publications (1)

Publication Number Publication Date
CN117809161A true CN117809161A (en) 2024-04-02

Family

ID=90424392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311745625.6A Pending CN117809161A (en) 2023-12-19 2023-12-19 Star-masking refractive index profile data quality evaluation system based on multi-source data

Country Status (1)

Country Link
CN (1) CN117809161A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination