CN112580871B - Feature screening method and device - Google Patents

Publication number
CN112580871B
Authority
CN
China
Prior art keywords
landslide
grids
important feature
determining
preliminary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011501528.9A
Other languages
Chinese (zh)
Other versions
CN112580871A (en
Inventor
郑增容
吴展开
商琪
江子君
宋杰
胡辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Ruhr Technology Co Ltd
Original Assignee
Hangzhou Ruhr Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Ruhr Technology Co Ltd
Priority to CN202011501528.9A
Publication of CN112580871A
Application granted
Publication of CN112580871B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the invention discloses a feature screening method and device. Landslide influence factors of the landslide grids and non-landslide grids in the current area are determined, a preliminary important feature set is screened from the landslide influence factors, and important features to be supplemented are determined. A first skewness coefficient and a second skewness coefficient are then determined in turn, and whether to add the important features to be supplemented to the preliminary important feature set is decided according to the two coefficients. The target important features are thereby determined step by step, and landslide occurrence can be analyzed specifically on the basis of the target important features, which improves the efficiency and accuracy of landslide prediction.

Description

Feature screening method and device
Technical Field
The embodiment of the invention relates to the technical field of information processing, in particular to a feature screening method and a device thereof.
Background
Landslides are among the most common natural disasters. They are widely distributed, occur frequently and repeatedly, are regional in character, and can be severe. Because landslides cause heavy casualties and serious damage to the environment and infrastructure every year, evaluating landslide susceptibility is of great significance. The landslide influence factors in each grid of a susceptible area are an important basis for evaluating landslides.
In the prior art, when landslides are evaluated using the landslide information of each grid in a susceptible area, landslide prediction is performed directly on the landslide influence factors of each grid. However, landslide influence factors are multidimensional, and some of them have very little effect on the landslide susceptibility assessment. Prediction based on all landslide influence factors therefore requires a large amount of computation, which reduces prediction efficiency and leaves room for improvement.
Disclosure of Invention
The invention provides a feature screening method and device that extract the important features among the landslide influence factors, so that landslide prediction can be performed specifically on those features and landslide prediction efficiency is improved.
In a first aspect, an embodiment of the present invention provides a feature screening method, including:
acquiring landslide grids of a current area and non-landslide grids except the landslide grids, and determining landslide influence factors of the landslide grids and the non-landslide grids;
screening a preliminary important feature set from the landslide influence factors, and determining a first skewness coefficient of the landslide grids and the non-landslide grids under the preliminary important feature set;
determining important features to be supplemented among the landslide influence factors, updating the preliminary important feature set based on the important features to be supplemented, and determining a second skewness coefficient of the landslide grids and the non-landslide grids under the updated preliminary important feature set;
and determining the target important features of the current area based on the first skewness coefficient, the second skewness coefficient and the preliminary important feature set.
In a second aspect, an embodiment of the present invention further provides a feature screening apparatus, where the apparatus includes:
the data acquisition module is used for acquiring landslide grids of the current area and non-landslide grids except the landslide grids;
the landslide influence factor determining module is used for determining landslide influence factors of the landslide grids and the non-landslide grids;
the first skewness coefficient determining module is used for screening a preliminary important feature set from the landslide influence factors and determining a first skewness coefficient of the landslide grids and the non-landslide grids under the preliminary important feature set;
the second skewness coefficient determining module is used for determining important features to be supplemented among the landslide influence factors, updating the preliminary important feature set based on the important features to be supplemented, and determining a second skewness coefficient of the landslide grids and the non-landslide grids under the updated preliminary important feature set;
the target important feature determining module is used for determining the target important features of the current area based on the first skewness coefficient, the second skewness coefficient and the preliminary important feature set.
According to the technical scheme of this embodiment, the landslide influence factors of the landslide grids and non-landslide grids in the current area are determined, a preliminary important feature set is screened from the landslide influence factors, and the important features to be supplemented are determined. The first skewness coefficient and the second skewness coefficient are then determined in turn, and whether to add the important features to be supplemented to the preliminary important feature set is decided according to the two coefficients. The target important features are thereby determined step by step, and landslide occurrence can be analyzed specifically on the basis of the target important features, which improves the efficiency and accuracy of landslide prediction.
Drawings
FIG. 1 is a flow chart of a feature screening method in accordance with a first embodiment of the present invention;
FIG. 2 is a flow chart of a feature screening method in a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a feature screening apparatus according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a feature screening apparatus in a fourth embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Example 1
FIG. 1 is a flowchart of a feature screening method according to the first embodiment of the present invention. The method is applicable to mining important features from the landslide influence factors of a current area and may be performed by a feature screening device. As shown in FIG. 1, the method includes the following steps:
S110, acquiring landslide grids of the current area and non-landslide grids except the landslide grids, and determining landslide influence factors of the landslide grids and the non-landslide grids.
The current area is usually an area where landslides have occurred, but may be any designated area; it includes landslide grids and non-landslide grids. Landslide grids and non-landslide grids are the basic units in which geological disasters such as landslides and collapses develop; each unit is assigned attribute values, forming a data representation of the entity. Each grid carries landslide influence factors, which may include dynamic factors and static factors. The dynamic factors include at least one of rainfall, vegetation coefficient and soil moisture. The static factors include at least one of elevation, slope direction, planar curvature, profile curvature, terrain moisture index, water flow intensity index, sediment transport index, terrain roughness index, fault distance, river distance, road distance, lithology, land utilization and vegetation coverage.
S120, screening a preliminary important feature set from the landslide influence factors, and determining a first skewness coefficient of the landslide grids and the non-landslide grids under the preliminary important feature set.
The preliminary important feature set may include one or more important features among the landslide influence factors. Important features can be understood as key features for landslide prediction, such as slope, slope direction and rainfall. Optionally, the method for determining the preliminary important feature set includes: calculating a first distribution distance between the landslide grids and the non-landslide grids based on the landslide influence factors; screening important features from the landslide influence factors according to the first distribution distance to construct an important feature set; and screening a preliminary important feature set from the important feature set according to a preset proportion. Screening important features from the landslide influence factors according to the first distribution distance to construct an important feature set includes: comparing the first distribution distance with a preset quantile, and taking the landslide influence factors whose first distribution distance exceeds the quantile as important features to construct the important feature set.
The first distribution distance measures how different the landslide grids and the non-landslide grids are: the larger the first distribution distance, the lower the similarity between the landslide grids and the non-landslide grids, and the greater the difference between them. Optionally, the first distribution distance is calculated based on the KL divergence (Kullback-Leibler divergence), information divergence, information gain, the JS divergence (Jensen-Shannon divergence), the Wasserstein distance (earth mover's distance), the Bhattacharyya distance, or the like. A quantile, also called a quantile point, is a value that divides the range of the probability distribution of a random variable into several equal parts; common quantiles include the median, quartiles and percentiles. The preset proportion may be a random proportion or a proportion that changes continuously according to a set rule.
Specifically, after the first distribution distance between the landslide grids and the non-landslide grids is calculated, it is compared with the preset quantile, and the landslide influence factors whose first distribution distance exceeds the quantile are taken as important features to construct the important feature set; a preliminary important feature set is then screened from the important feature set according to the preset proportion.
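The distance-and-quantile screening described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it picks the JS divergence from the listed options, estimates each distribution with an equal-width histogram, and takes the quantile over the distances of all factors. The function names, the bin count and the sample values are hypothetical.

```python
import math


def histogram(values, lo, hi, n_bins):
    """Normalized frequencies of `values` over n_bins equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    for v in values:
        # The maximum value is clamped into the last bin.
        idx = min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so the result lies in [0, 1])."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def distribution_distance(landslide_vals, non_landslide_vals, n_bins=10):
    """Distance between the two groups' value distributions for one factor."""
    lo = min(min(landslide_vals), min(non_landslide_vals))
    hi = max(max(landslide_vals), max(non_landslide_vals))
    if hi == lo:  # constant factor: the two distributions coincide
        return 0.0
    p = histogram(landslide_vals, lo, hi, n_bins)
    q = histogram(non_landslide_vals, lo, hi, n_bins)
    return js_divergence(p, q)


def quantile(sorted_vals, q):
    """Linearly interpolated q-quantile of an ascending list."""
    pos = q * (len(sorted_vals) - 1)
    i = int(pos)
    j = min(i + 1, len(sorted_vals) - 1)
    return sorted_vals[i] + (sorted_vals[j] - sorted_vals[i]) * (pos - i)


def screen_important_features(landslide, non_landslide, q=0.5):
    """Keep the factors whose distance exceeds the q-quantile of all distances."""
    distances = {f: distribution_distance(landslide[f], non_landslide[f])
                 for f in landslide}
    threshold = quantile(sorted(distances.values()), q)
    important = [f for f, d in distances.items() if d > threshold]
    return important, distances


# Hypothetical factor values per group (not taken from the patent).
landslide = {"slope": [30, 35, 40, 45], "rainfall": [50, 60, 70, 80],
             "road_distance": [100, 200, 300, 400]}
non_landslide = {"slope": [5, 10, 15, 20], "rainfall": [20, 30, 60, 70],
                 "road_distance": [100, 200, 300, 400]}
important, distances = screen_important_features(landslide, non_landslide)
print(important, distances)
```

In this toy data the slope distributions do not overlap at all, so slope gets the largest distance and is the only factor above the median; the road distance distributions are identical and get distance 0.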
Optionally, the method for determining the first skewness coefficient includes: clustering the landslide grids and the non-landslide grids based on the preliminary important feature set, and determining the distribution state of the grids in each cluster category; and if the distribution state of the grids in each cluster category does not conform to a normal distribution, calculating the first skewness coefficient of the landslide grids and the non-landslide grids under the preliminary important feature set.
Skewness means that the acquired data are not symmetrically distributed, so that the data deviate to one side; the deviation may be to the left (left skew) or to the right (right skew). A skewness coefficient can be calculated from the acquired data and used to determine whether the data are skewed. The skewness coefficient may be the Pearson skewness coefficient, the central-moment skewness coefficient, or the like. Optionally, the landslide grids and the non-landslide grids may be clustered according to one or more important features in the preliminary important feature set using the K-means clustering algorithm, the density-based clustering algorithm DBSCAN, Balanced Iterative Reducing and Clustering Using Hierarchies (BIRCH), or the like.
Specifically, the landslide grids and the non-landslide grids are clustered based on the preliminary important feature set, the distribution state of the grids in each cluster category is determined, and it is judged whether that distribution state conforms to a normal distribution. If it does not, the landslide grids and the non-landslide grids are asymmetrically distributed under the preliminary important feature set, that is, the data are skewed, and the first skewness coefficient of the landslide grids and the non-landslide grids under the preliminary important feature set is calculated. If the distribution state of the grids in each cluster category conforms to a normal distribution, the important features in the preliminary important feature set show no skew tendency. In that case the preset proportion is redetermined, the preliminary important feature set is rescreened, the landslide grids and the non-landslide grids are clustered again based on the newly determined preliminary important feature set, and the distribution state of the grids in each cluster category is determined again; this is repeated until the distribution state no longer conforms to a normal distribution, and the first skewness coefficient is then calculated.
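For reference, the two skewness coefficients named above can be computed as in the following sketch. The text does not specify which per-cluster quantity the coefficient is taken over, so the functions simply accept a list of numbers, and the sample lists are hypothetical. "Pearson skewness coefficient" is interpreted here as Pearson's second (median-based) coefficient, which is an assumption; the clustering and the normality check are left out.

```python
from statistics import mean, median, pstdev


def moment_skewness(values):
    """Central-moment skewness coefficient: m3 / m2**1.5."""
    mu = mean(values)
    n = len(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def pearson_median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / std."""
    sd = pstdev(values)
    return 3 * (mean(values) - median(values)) / sd if sd > 0 else 0.0


symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric sample
right_skewed = [1, 1, 1, 1, 10]    # hypothetical sample with a long right tail
print(moment_skewness(symmetric), moment_skewness(right_skewed))
print(pearson_median_skewness(right_skewed))
```

A coefficient of zero indicates a symmetric sample, a positive value a right skew and a negative value a left skew.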
S130, determining important features to be supplemented among the landslide influence factors, updating the preliminary important feature set based on the important features to be supplemented, and determining a second skewness coefficient of the landslide grids and the non-landslide grids under the updated preliminary important feature set.
The important features to be supplemented are landslide influence factors that are not in the preliminary important feature set but may also have a large influence on landslide prediction. For example, the preliminary important feature set includes the two features slope and slope direction, while vegetation coverage among the landslide influence factors also has a large influence on the landslide prediction result; vegetation coverage is then an important feature to be supplemented. Optionally, the method for determining the important features to be supplemented includes: calculating a second distribution distance between the landslide grids and the non-landslide grids based on the important features to be supplemented; and determining the important features to be supplemented among the landslide influence factors in descending order of the second distribution distance.
The second distribution distance measures how different the landslide grids and the non-landslide grids are: the larger the second distribution distance, the lower the similarity between the landslide grids and the non-landslide grids. Specifically, the second distribution distance may be calculated for the important features to be supplemented based on the KL divergence (Kullback-Leibler divergence), information divergence, information gain, the JS divergence (Jensen-Shannon divergence), the Wasserstein distance (earth mover's distance), the Bhattacharyya distance, or the like; the features to be supplemented are then arranged in descending order of the second distribution distance to determine the important features to be supplemented. Further, once the important features to be supplemented are determined, they are added to the preliminary important feature set one by one in descending order of the second distribution distance so as to update the preliminary important feature set, and the second skewness coefficient of the landslide grids and the non-landslide grids under the updated preliminary important feature set is determined.
S140, determining the target important features of the current area based on the first skewness coefficient, the second skewness coefficient and the preliminary important feature set.
Optionally, the method for determining the target important features includes: if the second skewness coefficient is smaller than or equal to the first skewness coefficient, taking the preliminary important features before updating as the target important features; and if the second skewness coefficient is larger than the first skewness coefficient, taking the updated preliminary important features as the target important features.
Specifically, the first skewness coefficient is taken as the standard skewness coefficient. If the second skewness coefficient is smaller than or equal to the first skewness coefficient, the degree of skew under the updated preliminary important feature set has not increased, that is, the important feature to be supplemented has little influence on the landslide prediction result; it is not added, and the preliminary important features before updating are taken as the target important features. If the second skewness coefficient is larger than the first skewness coefficient, the degree of skew under the updated preliminary important feature set has increased, that is, the important feature to be supplemented has a larger influence on the landslide prediction result, so the updated preliminary important features are taken as the target important features.
In this way, the first skewness coefficient and the second skewness coefficient are determined in turn, and whether to add the important features to be supplemented to the preliminary important feature set is decided according to the two coefficients. The target important features are thereby determined step by step, and landslide occurrence can be analyzed on the basis of the target important features, which helps improve the efficiency and accuracy of landslide prediction.
To further improve the screening accuracy of the target important features, if the second skewness coefficient is larger than the first skewness coefficient, the important feature to be supplemented is added to the preliminary important feature set, that is, the preliminary important feature set is updated, and the second skewness coefficient obtained under the updated preliminary important feature set is taken as the new standard skewness coefficient. When the updated preliminary important feature set is updated again, the new second skewness coefficient under the re-updated set is compared with the new standard skewness coefficient; if the new second skewness coefficient is larger than the new standard skewness coefficient, the re-updated preliminary important features are taken as the target important features, and otherwise the preliminary important features before the re-update are taken as the target important features.
Illustratively, suppose the landslide influence factors include 3 important features to be supplemented and the first skewness coefficient under the preliminary important feature set is 0.6, which is taken as the standard skewness coefficient. The 1st important feature to be supplemented is added to the preliminary important feature set, and the second skewness coefficient of the updated set is calculated to be 0.7. Since 0.7 is larger than the standard skewness coefficient 0.6, the 1st important feature to be supplemented is kept in the preliminary important feature set, and the updated preliminary important features are taken as the target important features. The 2nd and 3rd important features to be supplemented are then traversed. The second skewness coefficient 0.7 is taken as the new standard skewness coefficient, the 2nd important feature to be supplemented is added to the updated preliminary important feature set, and the new second skewness coefficient is calculated to be 0.75. Since 0.75 is larger than the new standard skewness coefficient 0.7, the 2nd important feature to be supplemented is kept as well, that is, the preliminary important feature set is updated again and taken as the target important features. This continues until all 3 important features to be supplemented have been traversed, with the target important features determined each time from the newly calculated second skewness coefficient and the current standard skewness coefficient. In this way, the standard skewness coefficient is updated dynamically, which improves the screening accuracy of the target important features.
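The traversal with a dynamically updated standard coefficient can be sketched as a greedy loop. This is an illustration under assumptions: `skewness_of` is a hypothetical callback that returns the skewness coefficient for a given feature list, candidates that do not raise the coefficient are skipped and the traversal continues, and the lookup table reuses the values 0.6, 0.7 and 0.75 from the example while the value 0.72 for the third candidate is invented.

```python
def select_target_features(preliminary, candidates, skewness_of):
    """Add candidates one by one, keeping those that raise the skewness coefficient.

    `candidates` is expected to be sorted by distribution distance, descending.
    """
    selected = list(preliminary)
    standard = skewness_of(selected)  # first skewness coefficient
    for feature in candidates:
        trial = selected + [feature]
        second = skewness_of(trial)   # second skewness coefficient
        if second > standard:
            # Keep the feature and use the new coefficient as the standard.
            selected, standard = trial, second
    return selected, standard


# Hypothetical coefficients per feature set; 0.72 is an invented value.
table = {
    ("slope", "slope_direction"): 0.6,
    ("slope", "slope_direction", "rainfall"): 0.7,
    ("slope", "slope_direction", "rainfall", "vegetation"): 0.75,
    ("slope", "slope_direction", "rainfall", "vegetation", "road_distance"): 0.72,
}
features, coefficient = select_target_features(
    ["slope", "slope_direction"],
    ["rainfall", "vegetation", "road_distance"],
    lambda fs: table[tuple(fs)],
)
print(features, coefficient)
```

With these values the first two candidates are kept and the third is rejected, because 0.72 does not exceed the current standard of 0.75.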
According to the technical scheme of this embodiment, the landslide influence factors of the landslide grids and non-landslide grids in the current area are determined, a preliminary important feature set is screened from the landslide influence factors, and the important features to be supplemented are determined. The first skewness coefficient and the second skewness coefficient are then determined in turn, and whether to add the important features to be supplemented to the preliminary important feature set is decided according to the two coefficients. The target important features are thereby determined step by step, and landslide occurrence can be analyzed specifically on the basis of the target important features, which improves the efficiency and accuracy of landslide prediction.
Example 2
Fig. 2 is a flowchart of a feature screening method according to a second embodiment of the present invention, in which new steps are added on the basis of the previous embodiment. Optionally, the method further comprises: calculating the similarity between the landslide grids and the non-landslide grids based on the target important features, and determining high-risk units in the non-landslide grids based on the similarity; or, discretizing the current region based on each target important feature to obtain a discretized interval corresponding to each target important feature, and determining a high risk interval in each discretized interval; and determining the high-risk units in the non-landslide grids according to the number of the target important features corresponding to each non-landslide grid in the high-risk interval. For parts which are not described in detail in this method embodiment, reference is made to the above-described embodiments. Referring specifically to fig. 2, the method may include the steps of:
S210, acquiring landslide grids of the current area and non-landslide grids except the landslide grids, and determining landslide influence factors of the landslide grids and the non-landslide grids.
S220, screening a preliminary important feature set from the landslide influence factors, and determining a first skewness coefficient of the landslide grids and the non-landslide grids under the preliminary important feature set.
S230, determining important features to be supplemented among the landslide influence factors, updating the preliminary important feature set based on the important features to be supplemented, and determining a second skewness coefficient of the landslide grids and the non-landslide grids under the updated preliminary important feature set.
S240, determining the target important features of the current area based on the first skewness coefficient, the second skewness coefficient and the preliminary important feature set.
S250, calculating the similarity between the landslide grids and the non-landslide grids based on the target important features, and determining high-risk units in the non-landslide grids based on the similarity.
The similarity can be understood as the degree to which the target important features of the landslide grids and the non-landslide grids resemble each other. The similarity may be the cosine similarity, the Euclidean distance, a correlation coefficient, or the like. Optionally, a similarity threshold is preset and the similarity is compared with it; if the similarity is greater than the similarity threshold, the target important features of the landslide grid and the non-landslide grid are very similar, and the non-landslide grid is determined to be a high risk unit.
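A minimal sketch of the similarity test follows, using cosine similarity. The text does not say which landslide grid a non-landslide grid is compared with, so this sketch assumes a comparison against the mean feature vector of all landslide grids; the function names, the threshold value and the feature vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def high_risk_units(landslide_vectors, non_landslide_vectors, threshold=0.95):
    """Return ids of non-landslide grids similar to the landslide centroid.

    Assumption: similarity is measured against the mean landslide vector.
    """
    n = len(landslide_vectors)
    centroid = [sum(col) / n for col in zip(*landslide_vectors)]
    return [grid_id for grid_id, vec in non_landslide_vectors.items()
            if cosine_similarity(vec, centroid) > threshold]


# Hypothetical target-important-feature vectors.
landslide_vectors = [[1.0, 0.0], [1.0, 0.2]]
non_landslide_vectors = {"g1": [2.0, 0.2], "g2": [0.0, 1.0]}
print(high_risk_units(landslide_vectors, non_landslide_vectors))
```

Because cosine similarity ignores vector length, features on very different scales would normally be standardized before this comparison.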
S260, discretizing the current area based on each target important feature to obtain the discretized intervals corresponding to each target important feature, and determining the high risk intervals among the discretized intervals.
Discretization refers to dividing the grids in the current area into at least one discretized interval; each discretized interval may contain only non-landslide grids, or both landslide grids and non-landslide grids. Optionally, the method for obtaining the discretized intervals corresponding to each target important feature includes: clustering the grids of the current area based on each target important feature to obtain at least one cluster category corresponding to each target important feature; and taking each cluster category as a discretized interval corresponding to that target important feature. Optionally, equidistant (equal-width) binning or equal-frequency binning is performed on the grids of the current area based on each target important feature to obtain the discretized intervals corresponding to each target important feature.
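The two binning options can be sketched as below; each function maps the values of one target important feature to interval indices. The function names and the sample values are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    # The maximum value is clamped into the last interval.
    return [min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins intervals holding roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


slopes = [0, 1, 2, 3, 4, 100]  # hypothetical feature values with one outlier
print(equal_width_bins(slopes, 3))      # [0, 0, 0, 0, 0, 2]
print(equal_frequency_bins(slopes, 3))  # [0, 0, 1, 1, 2, 2]
```

The outlier shows the trade-off: equal-width binning leaves the middle interval empty, while equal-frequency binning keeps the intervals evenly populated.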
Optionally, the determining the high risk interval in each discretized interval includes: calculating the landslide proportion of the landslide grids in the discretization interval based on the quantity of the landslide grids and the non-landslide grids in each discretization interval respectively; calculating the average value of sliding windows of the landslide grids and the non-landslide grids according to the preset sliding window number and the landslide proportion; and determining a high risk interval in each discretization interval based on the sliding window average value and the landslide proportion.
Optionally, determining a high risk interval in each discretized interval based on the sliding window average and the landslide proportion includes: if the ratio of the sliding window average to at least one landslide proportion in the sliding window is smaller than a first set threshold, and that landslide proportion is larger than a second set threshold, taking the discretized interval corresponding to that landslide proportion as a high risk interval.
The sliding window number refers to the number of landslide proportions covered by the window during each slide, and the sliding window average refers to the mean of the landslide proportions within the window. Illustratively, 3 discretized intervals are determined based on the slope among the target important features, and the sliding window number is 2, that is, each window covers 2 landslide proportions. The landslide proportion of the first discretized interval is 0.06, that of the second is 0.0002, and that of the third is 0.02; the first set threshold is 10 and the second set threshold is 0.005. The sliding window average of the first and second discretized intervals is $x_1 = (0.06 + 0.0002)/2 = 0.0301$, the sliding window average of the second and third discretized intervals is $x_2 = (0.0002 + 0.02)/2 = 0.0101$, and the sliding window average of the third and first discretized intervals is $x_3 = (0.02 + 0.06)/2 = 0.04$. The ratio of $x_1$ to the landslide proportion of the first discretized interval ($0.0301/0.06 \approx 0.50$), the ratio of $x_3$ to the landslide proportion of the first discretized interval ($0.04/0.06 \approx 0.67$), and the ratio of $x_3$ to the landslide proportion of the third discretized interval ($0.04/0.02 = 2$) are all smaller than 10, and the landslide proportions of the first and third discretized intervals are both larger than 0.005, so the first and third discretized intervals are taken as high risk intervals. Using the sliding window average together with the landslide proportion allows the high risk intervals to be determined accurately.
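The sliding-window rule can be sketched as follows, using the landslide proportions 0.06, 0.0002 and 0.02 and the thresholds 10 and 0.005 from the example. Two points are assumptions rather than statements of the text: the landslide proportion is computed as landslide grids divided by all grids in the interval, and the window wraps around from the last interval to the first. The function names and the grid counts are hypothetical.

```python
def landslide_proportions(landslide_counts, non_landslide_counts):
    """Share of landslide grids per interval (assumed: landslide / total)."""
    return [l / (l + n) for l, n in zip(landslide_counts, non_landslide_counts)]


def high_risk_intervals(proportions, window=2, max_mean_ratio=10.0,
                        min_proportion=0.005):
    """Indices of intervals flagged by the sliding-window rule.

    An interval is flagged when its proportion exceeds `min_proportion` and
    the window mean divided by its proportion is below `max_mean_ratio`.
    """
    n = len(proportions)
    risky = set()
    for start in range(n):
        # Wrap-around window: an assumption, see the note above.
        idx = [(start + k) % n for k in range(window)]
        window_mean = sum(proportions[i] for i in idx) / window
        for i in idx:
            p = proportions[i]
            if p > min_proportion and window_mean / p < max_mean_ratio:
                risky.add(i)
    return sorted(risky)


# Hypothetical counts chosen to give the proportions 0.06, 0.0002 and 0.02.
proportions = landslide_proportions([60, 1, 20], [940, 4999, 980])
print(proportions)
print(high_risk_intervals(proportions))  # [0, 2]
```

Indices 0 and 2 correspond to the first and third discretized intervals. For these particular numbers the same two intervals are flagged even without the wrap-around window.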
S270, determining high-risk units in the non-landslide grids according to the number of target important features corresponding to each non-landslide grid in the high-risk interval.
Optionally, for each non-landslide grid, the number of target important features whose values fall within a high risk interval is compared with a third set threshold. If the number is greater than the third set threshold, a large number of the target important features of that non-landslide grid lie in high risk intervals, and the non-landslide grid is taken as a high risk unit.
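The counting step of S270 might look like the sketch below. The data layout (dictionaries keyed by grid and feature name), the half-open interval convention, the function name `high_risk_units`, and the feature names `slope` and `rainfall` are all assumptions made for illustration; the text fixes none of them.

```python
def high_risk_units(grids, high_risk_ranges, third_threshold):
    """Return the ids of non-landslide grids treated as high risk units.

    grids: {grid_id: {feature_name: value}} for non-landslide grids.
    high_risk_ranges: {feature_name: [(low, high), ...]}, half-open [low, high).
    """
    flagged = []
    for grid_id, features in grids.items():
        count = 0
        for name, value in features.items():
            ranges = high_risk_ranges.get(name, [])
            if any(low <= value < high for low, high in ranges):
                count += 1  # this feature lies in one of its high risk intervals
        if count > third_threshold:
            flagged.append(grid_id)
    return flagged


# Hypothetical data: two target important features, third threshold of 1.
example_grids = {
    "g1": {"slope": 35.0, "rainfall": 120.0},
    "g2": {"slope": 5.0, "rainfall": 130.0},
    "g3": {"slope": 8.0, "rainfall": 20.0},
}
example_ranges = {"slope": [(30.0, 90.0)], "rainfall": [(100.0, 200.0)]}
example_units = high_risk_units(example_grids, example_ranges, 1)
print(example_units)
```

Only the grid with both features in a high risk interval exceeds the threshold of 1 here; a grid with exactly one such feature does not, because the comparison is strict.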
Further, the area enclosed by the high risk units determined in S250 may be taken as the high risk area; alternatively, the area enclosed by the high risk units determined in S260 to S270 may be taken as the high risk area; or the area common to the area enclosed by the high risk units determined in S250 and the area enclosed by the high risk units determined in S260 to S270 may be taken as the high risk area.
According to the technical scheme provided by this embodiment, the similarity between the landslide grids and the non-landslide grids is calculated based on the target important features, and the high risk units in the non-landslide grids are determined based on the similarity; alternatively, the current area is discretized based on each target important feature to obtain a discretized interval corresponding to each target important feature, the high risk interval in each discretized interval is determined, and the high risk units in the non-landslide grids are determined according to the number of target important features of each non-landslide grid that fall within the high risk intervals. By analyzing the target important features in a targeted manner, the high risk units, and hence the high risk area, can be determined accurately, so that landslide occurrence can be analyzed specifically for the high risk area, which improves the efficiency and accuracy of landslide prediction.
Example III
Fig. 3 is a schematic structural diagram of a feature screening apparatus according to a third embodiment of the present invention. As shown in Fig. 3, the feature screening apparatus includes: a data acquisition module 310, a landslide impact factor determination module 320, a first bias coefficient determining module 330, a second bias coefficient determining module 340, and a target important feature determining module 350.
The data acquisition module 310 is configured to acquire a landslide grid of a current area and a non-landslide grid other than the landslide grid;
a landslide impact factor determination module 320 for determining landslide impact factors of the landslide grid and the non-landslide grid;
a first bias coefficient determining module 330, configured to screen a preliminary important feature set from the landslide impact factors, and determine first bias coefficients of the landslide grid and the non-landslide grid under the preliminary important feature set;
a second bias coefficient determining module 340, configured to determine important features to be supplemented in the landslide impact factor, update the preliminary important feature set based on the important features to be supplemented, and determine second bias coefficients of the landslide grid and the non-landslide grid under the updated preliminary important feature set;
a target important feature determining module 350, configured to determine a target important feature of the current region based on the first bias coefficient, the second bias coefficient, and the preliminary important feature set.
According to the technical scheme of this embodiment, the landslide influence factors of the landslide grids and the non-landslide grids in the current area are determined, a preliminary important feature set is screened out from the landslide influence factors, and the important features to be supplemented are determined, so that the first bias coefficient and the second bias coefficient can be determined in turn. Whether the important features to be supplemented are added to the preliminary important feature set is then decided according to the first bias coefficient and the second bias coefficient. In this way the important features are determined step by step and feature screening is achieved, landslide occurrence can be analyzed in a targeted manner according to the target important features, and the efficiency and accuracy of landslide prediction are improved.
Optionally, the first bias coefficient determining module 330 is further configured to calculate a first distribution distance of the landslide grid and the non-landslide grid based on the landslide impact factor;
and screening important features from the landslide influence factors according to the first distribution distance to construct an important feature set, and screening a preliminary important feature set with preset proportion from the important feature set.
Optionally, the first bias coefficient determining module 330 is further configured to compare the first distribution distance with a predetermined score, screen out landslide impact factors corresponding to the first distribution distance exceeding the score from landslide impact factors as important features, and construct an important feature set.
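The screening described above can be sketched as follows. This passage does not say which distribution distance is used, so the two-sample Kolmogorov-Smirnov statistic is swapped in purely as a stand-in. Taking the top-ranked share of the important feature set as the preliminary set is likewise an assumption, and the names `ks_distance`, `screen_features`, `score`, and `proportion` are hypothetical.

```python
from bisect import bisect_right


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        gap = max(gap, abs(fa - fb))
    return gap


def screen_features(landslide, non_landslide, score, proportion):
    """landslide / non_landslide: {factor_name: [values over the grids]}."""
    distances = {
        name: ks_distance(landslide[name], non_landslide[name]) for name in landslide
    }
    # Important feature set: factors whose distance exceeds the preset score.
    important = sorted(
        (name for name, d in distances.items() if d > score),
        key=lambda name: distances[name],
        reverse=True,
    )
    # Preliminary set: a preset proportion of the important set (assumed: top-ranked).
    keep = max(1, int(len(important) * proportion)) if important else 0
    return distances, important, important[:keep]


# Hypothetical factor values for four landslide and four non-landslide grids.
ls = {"slope": [30, 35, 40, 45], "aspect": [10, 200, 90, 300], "rainfall": [100, 120, 90, 150]}
nls = {"slope": [5, 10, 15, 20], "aspect": [20, 190, 100, 310], "rainfall": [80, 95, 110, 60]}
distances, important, preliminary = screen_features(ls, nls, score=0.4, proportion=0.5)
print(important, preliminary)
```

With this toy data the fully separated factor ranks first and the nearly identical one is screened out; the factors left outside the preliminary set are the candidates for later supplementation.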
Optionally, the first bias coefficient determining module 330 is further configured to cluster the landslide grid and the non-landslide grid based on the preliminary important feature set, and determine a distribution state of the grids in each cluster category;
and if the distribution states of the grids in each cluster category do not accord with normal distribution, calculating a first skewness coefficient of the landslide grids and the non-landslide grids under the preliminary important feature set.
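For reference, one common definition of a skewness coefficient is the moment-based sample skewness, sketched below. Whether the method uses exactly this formula, and which per-grid or per-cluster quantity is fed into it, is not stated in this passage, so both are assumptions; the function name `skewness` is hypothetical, and the normality check that precedes the calculation is not shown.

```python
def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # all values equal: treat as no skew
    return m3 / m2 ** 1.5


symmetric = skewness([1, 2, 3])
right_skewed = skewness([0, 0, 0, 4])
print(symmetric, right_skewed)
```

A symmetric sample gives a value of zero, while a sample with a long right tail gives a positive value; larger magnitudes indicate a distribution further from symmetric.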
Optionally, the second bias coefficient determining module 340 is further configured to calculate a second distribution distance of the landslide grid and the non-landslide grid based on the important feature to be supplemented;
and determining important features to be supplemented in the landslide impact factors according to the order of the second distribution distances from large to small.
Optionally, the target important feature determining module 350 is further configured to, if the second bias coefficient is less than or equal to the first bias coefficient, take the preliminary important feature set before updating as the target important features;
and if the second bias coefficient is larger than the first bias coefficient, take the updated preliminary important feature set as the target important features.
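The decision rule above, combined with adding candidates one by one in descending order of the second distribution distance, can be sketched as a forward-selection loop. Several points are assumptions: `skewness_of` is a hypothetical caller-supplied function returning the bias coefficient for a feature set, an accepted update becomes the new baseline for the next comparison, and the loop continues after a rejected candidate rather than stopping.

```python
def select_target_features(preliminary, candidates, second_distance, skewness_of):
    """Return the target important features.

    preliminary: list of feature names in the preliminary important feature set.
    candidates: feature names to be supplemented (not in the preliminary set).
    second_distance: {feature_name: second distribution distance}.
    skewness_of: callable(list of names) -> bias coefficient (hypothetical).
    """
    current = list(preliminary)
    first_coeff = skewness_of(current)
    ordered = sorted(candidates, key=lambda name: second_distance[name], reverse=True)
    for name in ordered:
        updated = current + [name]
        second_coeff = skewness_of(updated)
        if second_coeff > first_coeff:
            # Keep the updated set; it becomes the baseline (assumption).
            current, first_coeff = updated, second_coeff
        # Otherwise the set before updating is kept unchanged.
    return current


# Toy scoring function standing in for the real coefficient calculation.
toy_gain = {"slope": 0.5, "rainfall": 0.3, "aspect": -0.1, "ndvi": 0.2}
toy_skewness_of = lambda names: sum(toy_gain[n] for n in names)
toy_distance = {"rainfall": 0.5, "aspect": 0.25, "ndvi": 0.1}
target = select_target_features(["slope"], ["aspect", "ndvi", "rainfall"], toy_distance, toy_skewness_of)
print(target)
```

In the toy run the candidate that lowers the coefficient is rejected and the other two are accepted, in the order given by their second distribution distances.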
Optionally, the apparatus further comprises: a high risk unit determining module;
the high risk unit determining module is used for calculating the similarity between the landslide grid and the non-landslide grid based on the target important features, and determining the high risk unit in the non-landslide grid based on the similarity.
Optionally, the high risk unit determining module is further configured to discretize the current area based on each target important feature, obtain a discretized interval corresponding to each target important feature, and determine a high risk interval in each discretized interval;
and determining the high-risk units in the non-landslide grids according to the number of the target important features corresponding to each non-landslide grid in the high-risk interval.
Optionally, the high risk unit determining module is further configured to calculate a landslide proportion occupied by the landslide grid in the discretization interval based on the number of the landslide grids and the non-landslide grids in each discretization interval respectively;
calculating the average value of sliding windows of the landslide grids and the non-landslide grids according to the preset sliding window number and the landslide proportion;
and determining a high risk interval in each discretization interval based on the sliding window average value and the landslide proportion.
The feature screening device provided by the embodiment of the invention can execute the feature screening method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example IV
Fig. 4 is a schematic structural diagram of a feature screening apparatus according to a fourth embodiment of the present invention. Fig. 4 shows a block diagram of an exemplary feature screening apparatus 12 suitable for use in implementing embodiments of the present invention. The feature screening apparatus 12 shown in fig. 4 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 4, feature screening device 12 is in the form of a general purpose computing device. The components of feature screening apparatus 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Feature screening apparatus 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by feature screening device 12 and includes both volatile and non-volatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Feature screening device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard disk drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. The system memory 28 may include at least one program product having a set of program modules (e.g., the data acquisition module 310, the landslide impact factor determination module 320, the first bias coefficient determining module 330, the second bias coefficient determining module 340, and the target important feature determining module 350 of the feature screening apparatus) configured to perform the functions of embodiments of the present invention.
The program/utility 44, having a set of program modules 46 (e.g., the data acquisition module 310, the landslide impact factor determination module 320, the first bias coefficient determining module 330, the second bias coefficient determining module 340, and the target important feature determining module 350), may be stored in, for example, the system memory 28. Such program modules 46 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 46 generally perform the functions and/or methods of the embodiments described herein.
Feature screening device 12 may also be in communication with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with feature screening device 12, and/or any device (e.g., network card, modem, etc.) that enables feature screening device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Also, feature screening device 12 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet via network adapter 20. As shown, network adapter 20 communicates with other modules of feature screening apparatus 12 via bus 18. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with feature screening apparatus 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, to implement a feature screening method provided by an embodiment of the present invention, the method comprising:
acquiring landslide grids of a current area and non-landslide grids except the landslide grids, and determining landslide influence factors of the landslide grids and the non-landslide grids;
screening a preliminary important feature set from the landslide influence factors, and determining first bias coefficients of the landslide grids and the non-landslide grids under the preliminary important feature set;
determining important features to be supplemented in the landslide influence factors, updating the preliminary important feature set based on the important features to be supplemented, and determining second bias coefficients of the landslide grids and the non-landslide grids under the updated preliminary important feature set;
and determining the target important features of the current region based on the first bias coefficient, the second bias coefficient and the preliminary important feature set.
Of course, those skilled in the art will understand that the processor may also implement the technical solution of the feature screening method provided in any embodiment of the present invention.
Example V
The fifth embodiment of the present invention further provides a computer readable storage medium having a computer program stored thereon, the program when executed by a processor implementing a feature screening method as provided by the embodiments of the present invention, the method comprising:
acquiring landslide grids of a current area and non-landslide grids except the landslide grids, and determining landslide influence factors of the landslide grids and the non-landslide grids;
screening a preliminary important feature set from the landslide influence factors, and determining first bias coefficients of the landslide grids and the non-landslide grids under the preliminary important feature set;
determining important features to be supplemented in the landslide influence factors, updating the preliminary important feature set based on the important features to be supplemented, and determining second bias coefficients of the landslide grids and the non-landslide grids under the updated preliminary important feature set;
and determining the target important features of the current region based on the first bias coefficient, the second bias coefficient and the preliminary important feature set.
Of course, the computer program stored on the computer readable storage medium provided in the embodiments of the present invention is not limited to the above method operations, but may also perform the related operations in the feature screening method provided in any embodiment of the present invention.
The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include propagated data, such as the first bias coefficient, the second bias coefficient, and the target important features, in which computer readable program code is embodied. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
It should be noted that, in the embodiment of the feature screening apparatus, each module included is only divided according to the functional logic, but not limited to the above division, so long as the corresponding function can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present invention.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (7)

1. A method of feature screening comprising:
acquiring landslide grids of a current area and non-landslide grids except the landslide grids, and determining landslide influence factors of the landslide grids and the non-landslide grids;
screening a preliminary important feature set from the landslide influence factors, and determining first bias coefficients of the landslide grids and the non-landslide grids under the preliminary important feature set;
determining important features to be supplemented in the landslide influence factors, updating the preliminary important feature set based on the important features to be supplemented, and determining second bias coefficients of the landslide grids and the non-landslide grids under the updated preliminary important feature set; wherein the important feature to be supplemented is the landslide impact factor not in the preliminary important feature set; determining a target important feature of the current region based on the first bias coefficient, the second bias coefficient and the preliminary important feature set;
the screening of the preliminary important feature set from the landslide impact factors comprises the following steps:
calculating a first distribution distance of the landslide grids and the non-landslide grids based on the landslide impact factor;
screening important features from the landslide impact factors according to the first distribution distance to construct an important feature set, and screening a preliminary important feature set with preset proportion from the important feature set;
the determining important features to be supplemented in the landslide impact factor comprises:
calculating a second distribution distance of the landslide grids and the non-landslide grids based on the important features to be supplemented;
determining important features to be supplemented in the landslide impact factors in descending order of the second distribution distance;
after determining important features to be supplemented, adding the important features to be supplemented into the preliminary important feature set one by one according to the sequence of the second distribution distance from large to small so as to update the preliminary important feature set, and determining the second bias coefficients of the landslide grids and the non-landslide grids under the updated preliminary important feature set;
the determining the target important feature of the current region based on the first bias coefficient, the second bias coefficient and the preliminary important feature set includes:
if the second bias coefficient is smaller than or equal to the first bias coefficient, taking the preliminary important feature set before updating as the target important features;
and if the second bias coefficient is larger than the first bias coefficient, taking the updated preliminary important feature set as the target important features.
2. The method of claim 1, wherein the screening important features from the landslide impact factor based on the first distribution distance to construct an important feature set comprises:
comparing the first distribution distance with a preset score, screening landslide influence factors corresponding to the first distribution distance exceeding the score from landslide influence factors to serve as important features, and constructing an important feature set.
3. The method of claim 1, wherein the determining a first bias factor for the landslide grid and the non-landslide grid under the preliminary set of important features comprises:
clustering the landslide grids and the non-landslide grids based on the preliminary important feature set, and determining the distribution state of the grids in each clustering category;
and if the distribution states of the grids in each cluster category do not accord with normal distribution, calculating a first skewness coefficient of the landslide grids and the non-landslide grids under the preliminary important feature set.
4. The method as recited in claim 1, further comprising:
and calculating the similarity between the landslide grids and the non-landslide grids based on the target important features, and determining high-risk units in the non-landslide grids based on the similarity.
5. The method as recited in claim 1, further comprising:
discretizing the current region based on each target important feature to obtain a discretized interval corresponding to each target important feature, and determining a high risk interval in each discretized interval;
and determining the high-risk units in the non-landslide grids according to the number of the target important features corresponding to each non-landslide grid in the high-risk interval.
6. The method of claim 5, wherein the separately determining the high risk interval in each of the discretized intervals comprises:
calculating the landslide proportion of the landslide grids in the discretization interval based on the quantity of the landslide grids and the non-landslide grids in each discretization interval respectively;
calculating the average value of sliding windows of the landslide grids and the non-landslide grids according to the preset sliding window number and the landslide proportion;
and determining a high risk interval in each discretization interval based on the sliding window average value and the landslide proportion.
7. A feature screening apparatus, comprising:
the data acquisition module is used for acquiring landslide grids of the current area and non-landslide grids except the landslide grids;
the landslide influence factor determining module is used for determining landslide influence factors of the landslide grids and the non-landslide grids;
the first bias coefficient determining module is used for screening a preliminary important feature set from the landslide influence factors and determining first bias coefficients of the landslide grids and the non-landslide grids under the preliminary important feature set;
the second bias coefficient determining module is used for determining important features to be supplemented in the landslide influence factors, updating the preliminary important feature set based on the important features to be supplemented, and determining second bias coefficients of the landslide grids and the non-landslide grids under the updated preliminary important feature set; wherein the important feature to be supplemented is the landslide impact factor not in the preliminary important feature set;
a target important feature determining module, configured to determine a target important feature of the current area based on the first bias coefficient, the second bias coefficient, and the preliminary important feature set;
the first bias coefficient determining module is further used for calculating a first distribution distance of the landslide grids and the non-landslide grids based on the landslide influence factors;
screening important features from the landslide impact factors according to the first distribution distance to construct an important feature set, and screening a preliminary important feature set with preset proportion from the important feature set;
the second bias coefficient determining module is further used for calculating a second distribution distance of the landslide grid and the non-landslide grid based on the important features to be supplemented;
determining important features to be supplemented in the landslide impact factors in descending order of the second distribution distance;
after determining important features to be supplemented, adding the important features to be supplemented into the preliminary important feature set one by one according to the sequence of the second distribution distance from large to small so as to update the preliminary important feature set, and determining the second bias coefficients of the landslide grids and the non-landslide grids under the updated preliminary important feature set;
the target important feature determining module is further configured to take the preliminary important feature set before updating as the target important features if the second bias coefficient is less than or equal to the first bias coefficient; and if the second bias coefficient is larger than the first bias coefficient, take the updated preliminary important feature set as the target important features.
CN202011501528.9A 2020-12-17 2020-12-17 Feature screening method and device Active CN112580871B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011501528.9A CN112580871B (en) 2020-12-17 2020-12-17 Feature screening method and device

Publications (2)

Publication Number Publication Date
CN112580871A CN112580871A (en) 2021-03-30
CN112580871B true CN112580871B (en) 2024-01-02

Family

ID=75136108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011501528.9A Active CN112580871B (en) 2020-12-17 2020-12-17 Feature screening method and device

Country Status (1)

Country Link
CN (1) CN112580871B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113902123B (en) * 2021-09-08 2024-09-03 北京淇瑀信息科技有限公司 Method and device for improving service processing capacity of service module and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529197A (en) * 2016-12-07 2017-03-22 中国地质大学(武汉) Landslide stability time-varying law analysis method
CN107092653A (en) * 2017-03-15 2017-08-25 西安工程大学 A kind of landslide Critical Rainfall Threshold based on method of fuzzy cluster analysis
KR101810336B1 (en) * 2016-11-11 2017-12-18 서울대학교산학협력단 Server and method for landslide forecast and alarm, method for receiving landslide forecast and alarm
CN111417132A (en) * 2019-01-07 2020-07-14 中国移动通信有限公司研究院 Cell division method, device and equipment
CN111784044A (en) * 2020-06-29 2020-10-16 杭州鲁尔物联科技有限公司 Landslide prediction method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
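Object-oriented regional landslide recognition based on multiple features; Ding Hui et al.; Remote Sensing Technology and Application; Vol. 28 (No. 6); 1107-1113 *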


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant