CN115907226A - Characteristic data processing method, device, equipment and storage medium - Google Patents

Publication number
CN115907226A
Authority
CN
China
Prior art keywords
data
fire point
characteristic data
fire
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211711350.XA
Other languages
Chinese (zh)
Inventor
张可颖
刘岚
吴新桥
覃平
赵继光
王昊
詹谭博驰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southern Power Grid Digital Grid Research Institute Co Ltd
Original Assignee
Southern Power Grid Digital Grid Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southern Power Grid Digital Grid Research Institute Co Ltd
Priority to CN202211711350.XA
Publication of CN115907226A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method, a device, equipment and a storage medium for processing characteristic data. The method comprises the following steps: performing secondary derivation on the original fire point characteristic data of a fire point sample, processing the derived original fire point characteristic data using a radial basis function, and determining high-dimensional characteristic data of the original fire point characteristic data in a high-dimensional space; standardizing the high-dimensional characteristic data using a normalization method based on the mean and standard deviation of the original data to determine standard fire point characteristic data; determining target fire point characteristic data from the standard fire point characteristic data according to the data contribution degree of the standard fire point characteristic data; and training a machine learning model according to the fire point samples and the target fire point characteristic data, and taking the trained machine learning model as a mountain fire risk prediction model, which is used for predicting mountain fire risk. The method can improve both the accuracy and the efficiency of mountain fire risk prediction.

Description

Characteristic data processing method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the field of computers, in particular to a feature data processing method, a feature data processing device, feature data processing equipment and a storage medium.
Background
Mountain fires easily cause overhead transmission lines to trip, affecting the stable operation of the power grid. Under the influence of human activities such as field burning or ancestor worship, overhead transmission lines in some areas are prone to large-scale fires that cause line tripping; in severe cases these fires even threaten important line crossings, dense transmission corridors and the main corridors of west-to-east power transmission, adversely affecting the safe and stable operation of the power grid. Tripping of overhead transmission lines caused by fire accounts for a considerable proportion of all tripping and power outage accidents. To improve the prevention and control of mountain fire disasters on overhead transmission lines, researchers in China and abroad have carried out multi-dimensional research covering mountain fire distribution patterns, trip mechanisms, monitoring and alarming, risk assessment and the like, but the accuracy and efficiency of mountain fire risk prediction remain low. How to predict mountain fire risk accurately and improve the efficiency of that prediction is therefore a problem to be solved.
Disclosure of Invention
The invention provides a characteristic data processing method, a characteristic data processing device and a storage medium, which improve accuracy of mountain fire risk prediction and efficiency of mountain fire risk prediction by performing data processing on fire point characteristic data.
According to an aspect of the present invention, there is provided a feature data processing method including:
performing secondary derivation on the original fire point characteristic data of the fire point sample, processing the derived original fire point characteristic data using a radial basis function, and determining high-dimensional characteristic data of the original fire point characteristic data in a high-dimensional space; the original fire point characteristic data comprises human activity characteristic data, geographic information characteristic data, meteorological characteristic data and regional historical fire point data;
standardizing the high-dimensional characteristic data using a normalization method based on the mean and standard deviation of the original data to determine standard fire point characteristic data;
determining target fire point characteristic data from the standard fire point characteristic data according to the data contribution degree of the standard fire point characteristic data;
training a machine learning model according to the fire point samples and the target fire point characteristic data, and taking the trained machine learning model as a mountain fire risk prediction model; the mountain fire risk prediction model is used for predicting mountain fire risks.
According to another aspect of the present invention, there is provided a feature data processing apparatus including:
the high-dimensional characteristic data determining module is used for performing secondary derivation on the original fire point characteristic data of the fire point sample, processing the derived original fire point characteristic data using the radial basis function, and determining the high-dimensional characteristic data of the original fire point characteristic data in a high-dimensional space; the original fire point characteristic data comprises human activity characteristic data, geographic information characteristic data, meteorological characteristic data and regional historical fire point data;
the data standardization processing module is used for standardizing the high-dimensional characteristic data using a normalization method based on the mean and standard deviation of the original data to determine standard fire point characteristic data;
the target data acquisition module is used for determining target fire point characteristic data from the standard fire point characteristic data according to the data contribution degree of the standard fire point characteristic data;
the model training module is used for training a machine learning model according to the fire point samples and the target fire point characteristic data, and taking the machine learning model after training as a mountain fire risk prediction model; the mountain fire risk prediction model is used for predicting mountain fire risks.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the method of processing characteristic data according to any of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to implement the feature data processing method according to any one of the embodiments of the present invention when the computer instructions are executed.
According to the technical solution of the embodiments of the invention, secondary derivation is performed on the original fire point characteristic data of the fire point sample, the derived original fire point characteristic data is processed using a radial basis function, and high-dimensional characteristic data of the original fire point characteristic data in a high-dimensional space is determined; the high-dimensional characteristic data is standardized using a normalization method based on the mean and standard deviation of the original data to determine standard fire point characteristic data; target fire point characteristic data is determined from the standard fire point characteristic data according to the data contribution degree of the standard fire point characteristic data; and a machine learning model is trained according to the fire point samples and the target fire point characteristic data, with the trained machine learning model used as a mountain fire risk prediction model for predicting mountain fire risk. This solution addresses the problems that, when mountain fire risk is predicted, the factors influencing mountain fire disasters are not considered comprehensively and the prediction relies on the subjective experience of experts, which makes a comprehensive and objective prediction difficult and leads to low prediction accuracy and low prediction efficiency.
When mountain fire risk is predicted, the influence of human activity characteristic data, geographic information characteristic data, meteorological characteristic data and regional historical fire point data on mountain fire disasters is fully considered. The fire point characteristic data is processed to determine standard fire point characteristic data, the standard fire point characteristic data is screened according to its data contribution degree to determine the target fire point characteristic data used for training, and the machine learning model is trained with the target fire point characteristic data to obtain a mountain fire risk prediction model. This improves the training efficiency of the machine learning model, saves the labor cost of mountain fire risk prediction, and improves the accuracy of the mountain fire risk prediction model.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a feature data processing method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a feature data processing method according to a second embodiment of the present invention;
Fig. 3 is a flowchart of a feature data processing method according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a feature data processing apparatus according to a fourth embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.
It should be noted that the terms "candidate" and "target" and the like in the description and claims of the present invention and the above drawings are used for distinguishing similar objects and are not necessarily used for describing a particular order or sequence. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example one
Fig. 1 is a flowchart of a feature data processing method according to a first embodiment of the present invention. This embodiment is applicable to processing fire point characteristic data, and is particularly applicable to processing fire point characteristic data and training a machine learning model on the processed data to obtain a mountain fire risk prediction model for predicting mountain fire risk. The method may be performed by a feature data processing apparatus, which may be implemented in hardware and/or software and may be configured in an electronic device. As shown in Fig. 1, the method includes:
S110, performing secondary derivation on the original fire point characteristic data of the fire point sample, processing the derived original fire point characteristic data using the radial basis function, and determining high-dimensional characteristic data of the original fire point characteristic data in a high-dimensional space.
The original fire point characteristic data includes human activity characteristic data, geographic information characteristic data, meteorological characteristic data, and regional historical fire point data.
A fire point sample is a location, collected in advance, where a mountain fire disaster may occur along an overhead transmission line. Fire point samples comprise mountain fire samples and non-fire samples: a mountain fire sample is a fire point sample at which a mountain fire disaster occurred, and a non-fire sample is a fire point sample at which no mountain fire disaster occurred.
For example, the fire point samples may be obtained by sampling fire point data captured by satellites from 2015 to 2021. Secondary derivation refers to further processing of the result of primary derivation, and primary derivation refers to obtaining the original fire point characteristic data of the fire point sample.
It should be noted that the causes of mountain fire disasters on overhead transmission lines are complex and are often the result of several factors acting together. They can be divided mainly into human factors and natural factors, and fires caused by human factors account for more than 90% of the total. Human factors include the traffic network near the line, settlements, regional population density, GDP and the like; natural factors mainly include geographic information characteristics such as elevation, slope, aspect, historical fire point density, vegetation type and the normalized difference vegetation index (NDVI), as well as climatic characteristics such as temperature.
Specifically, based on raster data and national road network vector data, the human activity characteristic data, geographic information characteristic data, meteorological characteristic data and regional historical fire point data of the fire point sample are extracted and used as the original fire point characteristic data. The meteorological characteristic data is adjusted according to its time-series lag features and smoothing features, and a sample feature set is constructed from the human activity characteristic data, the geographic information characteristic data, the regional historical fire point data and the adjusted meteorological characteristic data. The geographic information characteristic data includes the elevation, slope and combustible risk grade of the fire point sample; the human activity characteristic data includes the population density and regional gross national product of the fire point sample; the meteorological characteristic data includes the temperature, hourly rainfall, wind speed and relative humidity of the fire point sample; and the historical fire point data includes the Euclidean distances from the fire point to railways, highways, first-class highways and the nearest residential area. Secondary derivation is then performed on the original fire point characteristic data of the fire point sample; here, secondary derivation means further processing the nonlinear characteristic data in the original fire point characteristic data so that the nonlinear characteristics can be extracted more easily.
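The adjustment of meteorological data by time-series lag and smoothing features can be illustrated with a minimal sketch. The patent does not give the lag length, the window size, or the smoothing method; the one-step lag, the three-step trailing mean, and the names `lag_feature`, `rolling_mean` and `hourly_temperature` are all assumptions made for illustration.

```python
from statistics import mean

def lag_feature(series, lag):
    """Shift a series back by `lag` steps; the first `lag` entries have no history."""
    if lag <= 0:
        return list(series)
    return [None] * lag + list(series[:-lag])

def rolling_mean(series, window):
    """Trailing moving average; early entries average the values available so far."""
    return [mean(series[max(0, i - window + 1): i + 1]) for i in range(len(series))]

# Hypothetical hourly temperatures for one fire point sample
hourly_temperature = [20.0, 22.0, 24.0, 26.0, 28.0]
temp_lag1 = lag_feature(hourly_temperature, 1)      # value observed one hour earlier
temp_smooth3 = rolling_mean(hourly_temperature, 3)  # three-hour trailing mean
```

Each derived series can then be stored alongside the raw meteorological column as an additional feature of the sample.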
The derived original fire point characteristic data is then processed using the radial basis function to extract its characteristic data in a high-dimensional space, and this characteristic data is taken as the high-dimensional characteristic data.
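One common way to realize such a mapping is a Gaussian radial basis function, which turns each sample into a vector of similarities to a set of centers; when there are more centers than input features, the result lives in a higher-dimensional space. The sketch below assumes a Gaussian kernel, and the centers, the `gamma` width parameter and the sample values are hypothetical; the patent does not specify which radial basis function or which centers are used.

```python
import math

def rbf_features(x, centers, gamma=1.0):
    """Map one sample to Gaussian RBF activations, one value per center:
    phi_j(x) = exp(-gamma * ||x - c_j||^2)."""
    return [
        math.exp(-gamma * sum((xi - ci) ** 2 for xi, ci in zip(x, c)))
        for c in centers
    ]

# Two derived input features (hypothetical values) mapped onto three centers
sample = [1.0, 2.0]
centers = [[1.0, 2.0], [0.0, 0.0], [2.0, 2.0]]
high_dim = rbf_features(sample, centers, gamma=0.5)
```

Here a two-feature sample becomes a three-component vector; a sample that coincides with a center produces an activation of exactly 1 for that center.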
S120, standardizing the high-dimensional characteristic data using a normalization method based on the mean and standard deviation of the original data, and determining the standard fire point characteristic data.
Specifically, because different high-dimensional characteristic data have different units and scales, the high-dimensional characteristic data needs to be standardized to ensure the reliability of subsequent data processing and of the mountain fire risk prediction model. The high-dimensional characteristic data is standardized using a normalization method based on the mean and standard deviation of the original data, and the standardized high-dimensional characteristic data is taken as the standard fire point characteristic data.
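Normalization based on the mean and standard deviation is usually the z-score transform, z = (x - μ) / σ, applied to each feature column. The following is a minimal sketch under that assumption; the use of the population standard deviation and the handling of constant columns are choices made here, not details taken from the patent.

```python
from statistics import mean, pstdev

def standardize(column):
    """Z-score a feature column: subtract the mean, divide by the standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation (an assumption)
    if sigma == 0:
        # A constant column carries no information; map it to zeros
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]

# Hypothetical values of one high-dimensional feature across eight samples
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(values)
```

After this step every feature has zero mean and unit variance, so features measured in different units become comparable.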
S130, determining target fire point characteristic data from the standard fire point characteristic data according to the data contribution degree of the standard fire point characteristic data.
The data contribution degree is a value representing how strongly the standard fire point characteristic data of a fire point sample influences the occurrence of mountain fire disasters on overhead transmission lines.
Specifically, a feature selection algorithm is used to calculate the data contribution degree of the standard fire point characteristic data. The feature selection algorithm may be a Filter algorithm or a Wrapper algorithm. The standard fire point characteristic data is screened according to its data contribution degree, and the target fire point characteristic data is determined from the standard fire point characteristic data.
For example, the data contribution degree of the standard fire point characteristic data may be calculated with a Filter algorithm or a Wrapper algorithm, and the standard fire point characteristic data may be screened according to its data contribution degree so that the one hundred standard fire point characteristic data with the highest data contribution degree are determined as the target fire point characteristic data.
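A Filter-style contribution score can be sketched as follows. Filter methods score each feature independently of any model; the absolute Pearson correlation with the fire/no-fire label is used here as one common criterion, which is an assumption since the patent does not name the statistic. The feature names and values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def filter_scores(columns, labels):
    """Score every feature column by |correlation| with the label."""
    return {name: abs(pearson(col, labels)) for name, col in columns.items()}

labels = [1, 1, 0, 0]  # 1 = mountain fire sample, 0 = non-fire sample
columns = {
    "wind_speed": [3.0, 2.0, 1.0, 0.0],
    "elevation": [1.0, 0.0, 1.0, 0.0],
}
scores = filter_scores(columns, labels)
```

In this toy data `wind_speed` tracks the label closely and receives a high score, while `elevation` is uncorrelated with it and scores zero.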
S140, training a machine learning model according to the fire point samples and the target fire point characteristic data, and taking the trained machine learning model as a mountain fire risk prediction model; the mountain fire risk prediction model is used for predicting mountain fire risks.
The mountain fire risk refers to the mountain fire risk of the overhead transmission line. The machine learning model may be built with the XGBoost learning framework.
XGBoost is an ensemble machine learning algorithm based on decision trees that uses a gradient boosting framework. In prediction problems involving unstructured data (images, text, etc.), artificial neural networks tend to outperform other algorithms or frameworks, whereas tree-based algorithms tend to perform best on small to medium-sized structured tabular data. The output of XGBoost for a given input is the sum of the outputs of the individual decision trees in the ensemble.
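The additive structure described above can be written compactly. In the notation below, which is introduced here for illustration, x_i is the target fire point feature vector of sample i, f_k is the k-th decision tree, and K is the number of trees:

```latex
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)
```

For a binary fire/no-fire task, the summed score is typically passed through a logistic function to obtain a probability; whether the patent's model does so is not stated.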
Specifically, a machine learning model is trained according to the fire point samples and the target fire point characteristic data. During training, the target fire point characteristic data is used as the training data of the machine learning model, and whether a mountain fire disaster occurred is used as the supervision data. The mountain fire risk prediction model is determined from the training result of the machine learning model and is used for predicting the mountain fire risk of the overhead transmission line.
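The supervised setup can be illustrated with a deliberately simplified stand-in for XGBoost: gradient boosting of one-split decision trees (stumps) under squared-error loss, written in plain Python. This is not XGBoost's regularized objective or its API; the function names, the learning rate, the number of trees and the toy data are all assumptions.

```python
def fit_stump(X, residuals):
    """Find the single-feature threshold split that best fits the residuals."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:] if best else None

def predict_stump(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value

def fit_boosted(X, y, n_trees=20, learning_rate=0.5):
    """Each new stump is fitted to the residuals left by the current ensemble."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, stumps, learning_rate

def predict_risk(model, row):
    """Prediction is the base value plus the scaled sum of all stump outputs."""
    base, stumps, lr = model
    return base + lr * sum(predict_stump(s, row) for s in stumps)

# Target fire point features (hypothetical) and labels: 1 = fire occurred, 0 = no fire
X = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7], [0.1, 0.9]]
y = [1, 1, 0, 0]
model = fit_boosted(X, y)
```

Calling `predict_risk(model, [0.85, 0.2])` returns a score close to 1 for this toy data, and `predict_risk(model, [0.15, 0.8])` returns a score close to 0.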
According to the technical solution provided by this embodiment, secondary derivation is performed on the original fire point characteristic data of the fire point sample, the derived original fire point characteristic data is processed using a radial basis function, and high-dimensional characteristic data of the original fire point characteristic data in a high-dimensional space is determined; the high-dimensional characteristic data is standardized using a normalization method based on the mean and standard deviation of the original data to determine standard fire point characteristic data; target fire point characteristic data is determined from the standard fire point characteristic data according to the data contribution degree of the standard fire point characteristic data; and a machine learning model is trained according to the fire point samples and the target fire point characteristic data, with the trained machine learning model used as a mountain fire risk prediction model for predicting mountain fire risk. This solution addresses the problems that, when mountain fire risk is predicted, the factors influencing mountain fire disasters are not considered comprehensively and the prediction relies on the subjective experience of experts, which makes a comprehensive and objective prediction difficult and leads to low prediction accuracy and low prediction efficiency.
When mountain fire risk is predicted, the influence of human activity characteristic data, geographic information characteristic data, meteorological characteristic data and regional historical fire point data on mountain fire disasters is fully considered. The fire point characteristic data is processed to determine standard fire point characteristic data, the standard fire point characteristic data is screened according to its data contribution degree to determine the target fire point characteristic data used for training, and the machine learning model is trained with the target fire point characteristic data to obtain a mountain fire risk prediction model. This improves the training efficiency of the machine learning model, saves the labor cost of mountain fire risk prediction, and improves the accuracy of the mountain fire risk prediction model.
Example two
Fig. 2 is a flowchart of a feature data processing method according to a second embodiment of the present invention. This embodiment is optimized on the basis of the above-described embodiment and provides a preferred implementation of determining target fire point characteristic data from standard fire point characteristic data according to the data contribution degree of the standard fire point characteristic data. Specifically, as shown in Fig. 2, the method includes:
S210, performing secondary derivation on the original fire point characteristic data of the fire point sample, processing the derived original fire point characteristic data using the radial basis function, and determining high-dimensional characteristic data of the original fire point characteristic data in a high-dimensional space.
The original fire point characteristic data includes human activity characteristic data, geographic information characteristic data, meteorological characteristic data, and regional historical fire point data.
S220, standardizing the high-dimensional characteristic data using a normalization method based on the mean and standard deviation of the original data, and determining the standard fire point characteristic data.
S230, constructing a sample feature set according to the fire point sample and the standard fire point characteristic data, and dividing the sample feature set into a training feature subset and a testing feature subset.
Specifically, whether a mountain fire disaster occurred at each fire point sample is stored in a feature set file together with the corresponding standard fire point characteristic data, which completes the construction of the sample feature set. The feature set file may be a file in tabular form. The sample feature set may be divided into five equal parts, each part being a candidate feature subset; four of the candidate feature subsets are used as training feature subsets, and the remaining candidate feature subset is used as the test feature subset.
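The five-way division can be sketched as follows. Shuffling before splitting, the fixed random seed, and the helper names `split_into_folds` and `train_test_from_folds` are assumptions for illustration; the patent states only that the set is divided into five equal parts with four used for training and one for testing.

```python
import random

def split_into_folds(samples, n_folds=5, seed=0):
    """Shuffle the samples and deal them into n_folds parts of near-equal size."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]

def train_test_from_folds(folds, test_index):
    """Use one fold as the test subset and merge the others into the training subset."""
    test = folds[test_index]
    train = [s for i, fold in enumerate(folds) if i != test_index for s in fold]
    return train, test

# Twenty hypothetical sample identifiers standing in for rows of the feature set file
samples = list(range(20))
folds = split_into_folds(samples)
train_subset, test_subset = train_test_from_folds(folds, test_index=4)
```

Changing `test_index` selects a different candidate subset as the test subset, which is the degree of freedom the sub-steps below adjust.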
Illustratively, the sample feature set may be divided into a training feature subset and a testing feature subset by the following sub-steps:
S2301, dividing the sample feature set into a candidate training subset and a candidate testing subset, and determining an evaluation function of the candidate training subset through a classifier.
The classifier may be a classifier in the XGBoost learning framework. The evaluation function of a candidate training subset is a measure of whether that candidate feature subset can be used as a training feature subset; that is, the evaluation function measures the reliability of the candidate training subset. A candidate training subset is a data set initially selected from the candidate feature subsets to serve as a training feature subset.
Specifically, the sample feature set is divided into a candidate training subset and a candidate testing subset according to a preset sample feature subset division rule, and the evaluation function of the candidate training subset is determined through the classifier. The evaluation function of the candidate training subset may be the number of times the standard fire point characteristic data in the candidate training subset is selected as a split feature, the total information gain, the average information gain, or the average coverage of the features. A candidate testing subset is a data set initially selected from the candidate feature subsets to serve as the test feature subset.
For example, the sample feature set may be divided into five equal parts, each part serving as a candidate feature subset; four candidate feature subsets serve as candidate training subsets, and the remaining candidate feature subset serves as the candidate testing subset.
S2302, adjusting the candidate training subset and the candidate testing subset according to the evaluation function, and determining the training feature subset and the testing feature subset.
Specifically, if the evaluation function of a candidate training subset does not meet the preset function condition, that candidate training subset is reassigned to the test feature subset, and the evaluation function of the candidate testing subset is determined. If the evaluation function of the candidate testing subset meets the preset function condition, the candidate testing subset is reassigned to the training feature subset. Candidate training subsets whose evaluation functions meet the preset function condition are kept as training feature subsets.
It can be understood that the candidate training subset and the candidate testing subset are adjusted according to the evaluation function of the candidate training subset, and the training feature subset and the testing feature subset are determined, so that the reliability of target fire point feature data for training a machine learning model can be improved, and the model accuracy of the mountain fire risk prediction model is improved.
S240, determining the data contribution degree of the standard fire point feature data in the training feature subset through a Wrapper algorithm.
Optionally, the Wrapper algorithm may further perform training evaluation on the training feature subset, and measure the reliability of the training feature subset according to the training accuracy, so as to select the training feature subset with high reliability.
For example, the method for determining the data contribution of the standard fire point feature data in the training feature subset may be: determining the feature importance of each standard fire point feature data in the training feature subset through a Wrapper algorithm; and determining an importance average value of the feature importance of the standard fire point feature data, and taking the importance average value as the data contribution of the standard fire point feature data in the training feature subset.
Specifically, the feature importance of each item of standard fire point feature data in the training feature subset is determined through the Wrapper algorithm; the feature importance values are summed and averaged to obtain the importance average value, and the importance average value is used as the data contribution degree of the standard fire point feature data in the training feature subset.
It can be understood that the reliability of the data contribution degree can be improved by using the importance degree average value as the data contribution degree of the standard fire point feature data in the training feature subset.
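The averaging step can be sketched as below. The text does not say over what the importance values are averaged; this sketch assumes the Wrapper algorithm is run for several rounds and that each round yields one importance value per feature. The feature names and numbers are hypothetical.

```python
from statistics import mean

def data_contribution(importance_rounds):
    """Average each feature's importance over all wrapper rounds.

    importance_rounds: list of dicts {feature_name: importance}, one per round.
    """
    features = importance_rounds[0].keys()
    return {name: mean(r[name] for r in importance_rounds) for name in features}

# Hypothetical importance values from three wrapper rounds.
rounds = [
    {"wind_speed": 0.5, "slope": 0.25},
    {"wind_speed": 0.25, "slope": 0.25},
    {"wind_speed": 0.75, "slope": 0.25},
]
contribution = data_contribution(rounds)
```

Averaging over rounds makes the contribution degree less sensitive to a single unlucky training run, which is consistent with the reliability argument made above.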
S250, sorting the standard fire point characteristic data by data contribution degree, and determining target fire point characteristic data from the standard fire point characteristic data according to the sorting result.
Specifically, the standard fire point characteristic data are sorted by data contribution degree, with the item having the highest data contribution degree placed first and the item having the lowest data contribution degree placed last. The standard fire point characteristic data with the higher data contribution degrees are then determined as the target fire point characteristic data according to the sorting result. For example, the standard fire point characteristic data ranked in the top one hundred can be determined as the target fire point characteristic data.
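Step S250 amounts to a descending sort followed by a top-k cut. The sketch below assumes the contribution degrees are held in a dictionary keyed by feature name; the names, values and the function `select_target_features` are illustrative only.

```python
def select_target_features(contribution, top_k=100):
    """Sort features by contribution (highest first) and keep the first top_k."""
    ranked = sorted(contribution, key=contribution.get, reverse=True)
    return ranked[:top_k]

# Hypothetical contribution degrees for three features.
scores = {"wind_speed": 0.5, "slope": 0.25, "humidity": 0.4}
target_features = select_target_features(scores, top_k=2)
```

Setting `top_k=100` reproduces the "top one hundred" example given in the text.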
S260, training a machine learning model according to the fire point samples and the target fire point characteristic data, and taking the machine learning model after training as a mountain fire risk prediction model; the mountain fire risk prediction model is used for predicting mountain fire risks.
The technical scheme of this embodiment provides a preferred implementation scheme for determining target fire point characteristic data from standard fire point characteristic data. When the target fire point characteristic data is determined, a sample characteristic set is constructed according to the fire point sample and the standard fire point characteristic data, and the sample characteristic set is divided into a training characteristic subset and a testing characteristic subset; the standard fire point characteristic data is sorted according to its data contribution degree in the training characteristic subset, and the target fire point characteristic data is determined from the standard fire point characteristic data according to the sorting result. With this scheme, target fire point characteristic data with a high data contribution degree can be screened out and used to train the machine learning model, which improves the model accuracy of the resulting mountain fire risk prediction model, reduces the training data required by the machine learning model, and improves both the training efficiency of the machine learning model and the efficiency of determining the target fire point characteristic data.
Example three
Fig. 3 is a flowchart of a feature data processing method provided in the third embodiment of the present invention. This embodiment optimizes the above embodiments and provides a preferred implementation of training a machine learning model according to a fire point sample and target fire point feature data and using the trained machine learning model as a mountain fire risk prediction model. Specifically, as shown in fig. 3, the method includes:
S310, performing secondary derivation on the original fire point characteristic data of the fire point sample, performing data processing on the derived original fire point characteristic data by using the radial basis function, and determining high-dimensional feature data of the original fire point characteristic data in a high-dimensional space.
The raw fire characteristic data includes human activity characteristic data, geographic information characteristic data, meteorological characteristic data, and regional historical fire data.
S320, standardizing the high-dimensional feature data by adopting a normalization method based on the mean value and the standard deviation of the original data, and determining the standard fire point characteristic data.
S330, determining target fire point characteristic data from the standard fire point characteristic data according to the data contribution degree of the standard fire point characteristic data.
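The radial basis function mapping of S310 and the mean and standard deviation normalization of S320 can be sketched as follows. The text does not specify the kernel form, so a Gaussian radial basis function with hypothetical centers and a hypothetical width parameter `gamma` is assumed; the secondary derivation step is omitted, and the input rows are toy values.

```python
import math
from statistics import mean, pstdev

def rbf_features(rows, centers, gamma=1.0):
    """Map each row to one Gaussian RBF value per center: exp(-gamma * ||x - c||^2)."""
    return [[math.exp(-gamma * sum((x - c) ** 2 for x, c in zip(row, center)))
             for center in centers] for row in rows]

def standardize(rows):
    """Z-score each column: subtract the column mean, divide by the column std."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical raw feature rows; the first two rows double as RBF centers.
raw_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
high_dim = rbf_features(raw_rows, centers=raw_rows[:2])
standard = standardize(high_dim)
```

Using more centers than input columns raises the dimensionality of the mapped data, and after standardization every column has zero mean and unit standard deviation.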
S340, training a machine learning model according to the fire point sample data in the training feature subset and the target fire point feature data, and determining a candidate prediction model.
Specifically, the fire point samples in the training feature subset and the target fire point feature data are adopted to train a machine learning model, when the machine learning model is trained, the target fire point feature data are used as training data of the machine learning model, and whether mountain fire disasters occur in the fire point samples or not is used as supervision data of the machine learning model. And taking the trained machine learning model as a candidate prediction model.
For example, the method for determining the candidate prediction model may be: training a machine learning model according to fire point sample data and target fire point feature data in the training feature subset, searching a hyper-parameter space based on a Bayesian algorithm, and determining local optimal parameters of the machine learning model; and adjusting model parameters of the machine learning model according to the local optimal parameters, and determining a candidate prediction model.
It can be understood that the Bayesian optimization algorithm uses a Gaussian process and takes previously searched parameter information into account, so it needs few iterations, converges quickly, and can quickly find the local optimal parameters of the machine learning model.
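A Gaussian-process-based search of this kind can be sketched in a few dozen lines. Everything specific below is an assumption made for illustration: a single hyper-parameter scaled to a fixed interval, a squared-exponential kernel with a fixed length scale, expected improvement as the acquisition rule, a grid of candidate points, and an `objective` that stands in for the validation score of the machine learning model.

```python
import math
import random

def kernel(a, b, length=0.2):
    """Squared-exponential covariance between two scalar hyper-parameter values."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def posterior(xs, ys, x_new, noise=1e-4):
    """Gaussian-process posterior mean and standard deviation at x_new."""
    gram = [[kernel(a, b) + (noise if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_new = [kernel(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_new, solve(gram, ys)))
    var = 1.0 - sum(k * w for k, w in zip(k_new, solve(gram, k_new)))
    return mean, math.sqrt(max(var, 1e-12))

def expected_improvement(mean, std, best):
    """Expected gain over the best score seen so far (maximization)."""
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (mean - best) * cdf + std * pdf

def bayes_search(objective, low, high, n_init=3, n_iter=10, seed=0):
    """Search [low, high] for the value that maximizes objective."""
    rng = random.Random(seed)
    xs = [rng.uniform(low, high) for _ in range(n_init)]
    ys = [objective(x) for x in xs]
    grid = [low + (high - low) * i / 100 for i in range(101)]
    for _ in range(n_iter):
        best = max(ys)
        candidates = [g for g in grid if g not in xs]
        # Every earlier evaluation feeds the posterior used to pick the next point.
        nxt = max(candidates,
                  key=lambda g: expected_improvement(*posterior(xs, ys, g), best))
        xs.append(nxt)
        ys.append(objective(nxt))
    i = max(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i]

# Hypothetical validation score that peaks at a hyper-parameter value of 0.7.
toy_objective = lambda x: -(x - 0.7) ** 2
best_param, best_score = bayes_search(toy_objective, 0.0, 1.0)
```

Because each new evaluation point is chosen from a posterior fitted to all earlier evaluations, the search reuses previously searched parameter information, which is the property the paragraph above relies on.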
S350, testing the candidate prediction model according to the fire point sample data and the target fire point characteristic data in the test characteristic subset, adjusting model parameters of the candidate prediction model according to a test result, and taking the adjusted candidate prediction model as a mountain fire risk prediction model; the mountain fire risk prediction model is used for predicting mountain fire risks.
Specifically, the candidate prediction model can be tested by a cross-validation method according to the fire point sample data and the target fire point feature data in the test feature subset. In cross-validation, most of the samples in a given modeling set are used to build a model, the remaining small portion is predicted with the newly built model, the test errors on that portion are computed, and the sum of squares of the test errors is recorded. If the test result shows that the model accuracy of the candidate prediction model does not meet the requirement, the model parameters of the candidate prediction model are adjusted and the adjusted candidate prediction model is tested again; if the test result of the adjusted candidate prediction model shows that the model accuracy meets the requirement, the adjusted candidate prediction model is taken as the mountain fire risk prediction model. The mountain fire risk prediction model is used for predicting mountain fire risks.
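The cross-validation procedure described above can be sketched as follows. The `fit` and `predict` callables are hypothetical placeholders for the candidate prediction model, the number of folds is an assumption, and the toy model simply predicts the mean label of its training samples.

```python
def cross_validation_sse(samples, fit, predict, n_folds=5):
    """Return the sum of squared test errors over all held-out folds.

    samples: list of (features, label) pairs.
    """
    folds = [samples[i::n_folds] for i in range(n_folds)]
    total = 0.0
    for i, held_out in enumerate(folds):
        # Build the model on every fold except the held-out one.
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        model = fit(training)
        # Test the newly built model on the held-out samples.
        total += sum((predict(model, x) - y) ** 2 for x, y in held_out)
    return total

# Hypothetical toy model: always predict the mean training label.
fit_mean = lambda training: sum(y for _, y in training) / len(training)
predict_mean = lambda model, x: model

sse = cross_validation_sse([(0, 0.0), (1, 1.0)], fit_mean, predict_mean, n_folds=2)
```

A smaller sum of squared test errors indicates a better candidate prediction model, so the value can serve as the test result that drives the parameter adjustment.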
The technical scheme of this embodiment provides a preferred implementation scheme for training a machine learning model according to fire point samples and target fire point characteristic data to obtain a mountain fire risk prediction model. A machine learning model is trained with the fire point sample data and the target fire point characteristic data in the training characteristic subset to determine a candidate prediction model; the candidate prediction model is tested according to the fire point sample data and the target fire point characteristic data in the testing characteristic subset, and the model parameters of the candidate prediction model are adjusted according to the test result to obtain the mountain fire risk prediction model. In this way, the model accuracy of the mountain fire risk prediction model can be improved.
Example four
Fig. 4 is a schematic structural diagram of a feature data processing apparatus according to a fourth embodiment of the present invention. The embodiment is applicable to the case of processing fire point characteristic data. As shown in fig. 4, the feature data processing apparatus includes: a high-dimensional feature data determination module 410, a data standardization processing module 420, a target data acquisition module 430 and a model training module 440.
The high-dimensional feature data determination module 410 is configured to perform secondary derivation on the original fire point characteristic data of the fire point sample, perform data processing on the derived original fire point characteristic data by using a radial basis function, and determine high-dimensional feature data of the original fire point characteristic data in a high-dimensional space; the original fire point characteristic data comprises human activity characteristic data, geographic information characteristic data, meteorological characteristic data and regional historical fire point data;
the data standardization processing module 420 is used for standardizing the high-dimensional feature data by adopting a normalization method based on the mean value and the standard deviation of the original data to determine standard fire point characteristic data;
the target data acquisition module 430 is used for determining target fire point characteristic data from the standard fire point characteristic data according to the data contribution degree of the standard fire point characteristic data;
the model training module 440 is used for training a machine learning model according to the fire point samples and the target fire point characteristic data, and taking the machine learning model after training as a mountain fire risk prediction model; the mountain fire risk prediction model is used for predicting mountain fire risks.
According to the technical scheme provided by this embodiment, secondary derivation is performed on the original fire point characteristic data of the fire point sample, data processing is performed on the derived original fire point characteristic data by using the radial basis function, and high-dimensional feature data of the original fire point characteristic data in a high-dimensional space is determined; the high-dimensional feature data is standardized by adopting a normalization method based on the mean value and the standard deviation of the original data to determine standard fire point characteristic data; target fire point characteristic data is determined from the standard fire point characteristic data according to the data contribution degree of the standard fire point characteristic data; a machine learning model is trained according to the fire point samples and the target fire point characteristic data, and the trained machine learning model is taken as a mountain fire risk prediction model; the mountain fire risk prediction model is used for predicting mountain fire risks. This scheme solves the problems that, when the mountain fire risk is predicted, the factors influencing mountain fire disasters are not comprehensively considered and the subjective experience of experts is required, which makes it difficult to predict the mountain fire risk comprehensively and objectively and results in low prediction accuracy and low prediction efficiency.
When the mountain fire risk is predicted, the influence of human activity characteristic data, geographic information characteristic data, meteorological characteristic data and regional historical fire point data on mountain fire disasters is fully considered. The fire point characteristic data is processed to determine standard fire point characteristic data, the standard fire point characteristic data is screened according to its data contribution degree to determine the target fire point characteristic data used for training the machine learning model, and the machine learning model is trained with the target fire point characteristic data to obtain the mountain fire risk prediction model used for predicting the mountain fire risk. This improves the training efficiency of the machine learning model, saves the labor cost of predicting the mountain fire risk, and improves the model accuracy of the mountain fire risk prediction model.
Illustratively, the target data acquisition module 430 includes:
the sample characteristic set dividing unit is used for constructing a sample characteristic set according to the fire point sample and the standard fire point characteristic data and dividing the sample characteristic set into a training characteristic subset and a testing characteristic subset;
the data contribution degree determining unit is used for determining the data contribution degree of the standard fire point feature data in the training feature subset through a Wrapper algorithm;
and the target data determining unit is used for sequencing the contribution degrees of the standard fire point characteristic data according to the data contribution degrees and determining the target fire point characteristic data from the standard fire point characteristic data according to the sequencing result.
Illustratively, the data contribution determining unit is specifically configured to:
determining the feature importance of each standard fire point feature data in the training feature subset through a Wrapper algorithm;
and determining an importance average value of the feature importance of the standard fire point feature data, and taking the importance average value as the data contribution of the standard fire point feature data in the training feature subset.
Illustratively, the sample feature set partitioning unit is specifically configured to:
dividing the sample feature set into a candidate training subset and a candidate testing subset, and determining an evaluation function of the candidate training subset through a classifier;
and adjusting the candidate training subset and the candidate testing subset according to the evaluation function, and determining the training feature subset and the testing feature subset.
Illustratively, the model training module 440 includes:
the candidate model determining unit is used for training a machine learning model according to the fire point sample data in the training feature subset and the target fire point feature data to determine a candidate prediction model;
and the model parameter adjusting unit is used for testing the candidate prediction model according to the fire point sample data in the test feature subset and the target fire point feature data, adjusting the model parameters of the candidate prediction model according to the test result, and taking the adjusted candidate prediction model as the mountain fire risk prediction model.
Illustratively, the candidate model determining unit is specifically configured to:
training a machine learning model according to fire point sample data and target fire point feature data in the training feature subset, searching a hyper-parameter space based on a Bayesian algorithm, and determining local optimal parameters of the machine learning model;
and adjusting model parameters of the machine learning model according to the local optimal parameters, and determining a candidate prediction model.
The feature data processing apparatus provided by this embodiment is applicable to the feature data processing method provided by any of the above embodiments, and has corresponding functions and advantageous effects.
Example five
FIG. 5 illustrates a schematic diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 5, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 performs the various methods and processes described above, such as the feature data processing method.
In some embodiments, the characteristic data processing method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the above-described feature data processing method may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the characteristic data processing method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired result of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method of feature data processing, comprising:
performing secondary derivation on the original fire point characteristic data of the fire point sample, performing data processing on the derived original fire point characteristic data by using a radial basis function, and determining high-dimensional feature data of the original fire point characteristic data in a high-dimensional space; the original fire point characteristic data comprises human activity characteristic data, geographic information characteristic data, meteorological characteristic data and regional historical fire point data;
standardizing the high-dimensional feature data by adopting a normalization method based on the mean value and the standard deviation of the original data, and determining standard fire point characteristic data;
determining target fire point characteristic data from the standard fire point characteristic data according to the data contribution degree of the standard fire point characteristic data;
training a machine learning model according to the fire point samples and the target fire point characteristic data, and taking the trained machine learning model as a mountain fire risk prediction model; the mountain fire risk prediction model is used for predicting mountain fire risks.
2. The method of claim 1, wherein determining target fire point characteristic data from the standard fire point characteristic data according to the data contribution degree of the standard fire point characteristic data comprises:
constructing a sample feature set according to the fire point sample and the standard fire point feature data, and dividing the sample feature set into a training feature subset and a testing feature subset;
determining the data contribution degree of standard fire point characteristic data in the training characteristic subset through a Wrapper algorithm;
and carrying out contribution degree sequencing on the standard fire point characteristic data according to the data contribution degree, and determining target fire point characteristic data from the standard fire point characteristic data according to a sequencing result.
3. The method of claim 2, wherein determining the data contribution of the standard fire point feature data in the training feature subset by the Wrapper algorithm comprises:
determining the feature importance of each standard fire point feature data in the training feature subset through a Wrapper algorithm;
and determining an importance average value of the feature importance of the standard fire point feature data, and taking the importance average value as the data contribution of the standard fire point feature data in the training feature subset.
4. The method of claim 2, wherein dividing the sample feature set into a training feature subset and a testing feature subset comprises:
dividing a sample feature set into a candidate training subset and a candidate testing subset, and determining an evaluation function of the candidate training subset through a classifier;
and adjusting the candidate training subset and the candidate testing subset according to the evaluation function, and determining a training feature subset and a testing feature subset.
5. The method of claim 1, wherein training a machine learning model according to the fire point samples and the target fire point feature data, and using the trained machine learning model as a mountain fire risk prediction model comprises:
training a machine learning model according to the fire point sample data in the training feature subset and the target fire point feature data, and determining a candidate prediction model;
and testing the candidate prediction model according to the fire point sample data and the target fire point characteristic data in the test characteristic subset, adjusting the model parameters of the candidate prediction model according to the test result, and taking the adjusted candidate prediction model as a mountain fire risk prediction model.
6. The method of claim 5, wherein training the machine learning model according to the fire point sample data in the training feature subset and the target fire point feature data, and determining a candidate prediction model comprises:
training a machine learning model according to fire point sample data and target fire point feature data in the training feature subset, searching a hyper-parameter space based on a Bayesian algorithm, and determining local optimal parameters of the machine learning model;
and adjusting model parameters of the machine learning model according to the local optimal parameters, and determining a candidate prediction model.
7. A feature data processing apparatus characterized by comprising:
the high-dimensional feature data determination module is used for performing secondary derivation on the original fire point characteristic data of the fire point sample, performing data processing on the derived original fire point characteristic data by using the radial basis function, and determining the high-dimensional feature data of the original fire point characteristic data in a high-dimensional space; the original fire point characteristic data comprises human activity characteristic data, geographic information characteristic data, meteorological characteristic data and regional historical fire point data;
the data standardization processing module is used for standardizing the high-dimensional feature data by adopting a normalization method based on the mean value and the standard deviation of the original data to determine standard fire point characteristic data;
the target data acquisition module is used for determining target fire point characteristic data from the standard fire point characteristic data according to the data contribution degree of the standard fire point characteristic data;
the model training module is used for training a machine learning model according to the fire point sample and the target fire point characteristic data, and taking the trained machine learning model as a mountain fire risk prediction model; the mountain fire risk prediction model is used for predicting mountain fire risks.
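The first two modules of claim 7 can be sketched as follows. This is a minimal illustration under stated assumptions: the claim does not define the secondary derivation, so `derive` (which appends pairwise products) is hypothetical, as are the choice of the samples themselves as radial-basis centers, the `gamma` parameter, and all function names. The standardization step is the usual z-score, which matches the claim's "mean value and standard deviation" wording.

```python
import math
from itertools import combinations_with_replacement

def derive(rows):
    """Hypothetical secondary derivation: append all pairwise products (squares included)."""
    out = []
    for x in rows:
        products = [x[i] * x[j]
                    for i, j in combinations_with_replacement(range(len(x)), 2)]
        out.append(list(x) + products)
    return out

def rbf_features(rows, centers, gamma=1.0):
    """Map each row to one value per center: exp(-gamma * ||x - c||^2)."""
    return [[math.exp(-gamma * sum((xi - ci) ** 2 for xi, ci in zip(x, c)))
             for c in centers]
            for x in rows]

def standardize(rows):
    """Column-wise z-score: (value - mean) / standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    stds = [s if s > 0 else 1.0 for s in stds]  # leave constant columns unscaled
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
```

With the samples used as centers, a data set of n samples yields n radial-basis columns, so the mapped representation has more dimensions than the derived input whenever n exceeds the number of derived features. A typical call chain would be `standardize(rbf_features(derive(rows), derive(rows), gamma))`.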
8. The apparatus of claim 7, wherein the target data acquisition module further comprises:
the sample characteristic set dividing unit is used for constructing a sample characteristic set according to the fire point sample and the standard fire point characteristic data and dividing the sample characteristic set into a training characteristic subset and a testing characteristic subset;
the data contribution degree determining unit is used for determining the data contribution degree of the standard fire point feature data in the training feature subset through a Wrapper algorithm;
and the target data determining unit is used for ranking the standard fire point characteristic data by data contribution degree and determining target fire point characteristic data from the standard fire point characteristic data according to the ranking result.
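The contribution-degree ranking of claim 8 can be sketched as a wrapper-style selection, in which each feature is scored by how much a model's accuracy changes when that feature is withheld. The claim names only "a Wrapper algorithm"; the drop-one-feature scoring, the nearest-centroid classifier used as the wrapped model, scoring on the training subset itself, and the function names below are all assumptions for illustration.

```python
def nearest_centroid_accuracy(X, y, cols):
    """Fit a nearest-centroid classifier on the chosen columns and score it on the same rows."""
    classes = sorted(set(y))
    centroids = {}
    for k in classes:
        members = [row for row, label in zip(X, y) if label == k]
        centroids[k] = [sum(r[c] for r in members) / len(members) for c in cols]
    correct = 0
    for row, label in zip(X, y):
        def dist(k):
            return sum((row[c] - m) ** 2 for c, m in zip(cols, centroids[k]))
        pred = min(classes, key=dist)  # ties go to the smallest class label
        correct += pred == label
    return correct / len(y)

def wrapper_contributions(X, y):
    """Contribution of feature j = accuracy with all features - accuracy without j."""
    all_cols = list(range(len(X[0])))
    base = nearest_centroid_accuracy(X, y, all_cols)
    return [base - nearest_centroid_accuracy(X, y, [c for c in all_cols if c != j])
            for j in all_cols]

def select_top_features(X, y, k):
    """Rank features by contribution (descending) and keep the first k indices."""
    contrib = wrapper_contributions(X, y)
    ranked = sorted(range(len(contrib)), key=lambda j: contrib[j], reverse=True)
    return ranked[:k], contrib
```

Here `X` and `y` are assumed to be the standard fire point characteristic data and the fire/no-fire labels of the training feature subset. The returned indices identify the columns to keep as target fire point characteristic data. In practice the wrapped model would more naturally be the same kind of model that is trained later, and the score would be taken on held-out data.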
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to perform the feature data processing method of any one of claims 1-6.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions for causing a processor to implement the feature data processing method of any one of claims 1 to 6 when executed.
CN202211711350.XA 2022-12-29 2022-12-29 Characteristic data processing method, device, equipment and storage medium Pending CN115907226A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211711350.XA CN115907226A (en) 2022-12-29 2022-12-29 Characteristic data processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211711350.XA CN115907226A (en) 2022-12-29 2022-12-29 Characteristic data processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115907226A (en) 2023-04-04

Family

ID=86478936

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211711350.XA Pending CN115907226A (en) 2022-12-29 2022-12-29 Characteristic data processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115907226A (en)

Similar Documents

Publication Publication Date Title
CN110149237B (en) Hadoop platform computing node load prediction method
CN109544399B (en) Power transmission equipment state evaluation method and device based on multi-source heterogeneous data
CN111950585A (en) XGboost-based underground comprehensive pipe gallery safety condition assessment method
CN111178585A (en) Fault reporting amount prediction method based on multi-algorithm model fusion
CN111178633A (en) Method and device for predicting scenic spot passenger flow based on random forest algorithm
CN116021981A (en) Method, device, equipment and storage medium for predicting ice coating faults of power distribution network line
CN110348683A (en) The main genetic analysis method, apparatus equipment of electrical energy power quality disturbance event and storage medium
AU2019100631A4 (en) Self-correcting multi-model numerical rainfall ensemble forecasting method
CN116050599A (en) Line icing fault prediction method, system, storage medium and equipment
CN112949201B (en) Wind speed prediction method and device, electronic equipment and storage medium
CN111427101B (en) Thunderstorm strong wind grading early warning method, system and storage medium
CN112016739B (en) Fault detection method and device, electronic equipment and storage medium
CN115907226A (en) Characteristic data processing method, device, equipment and storage medium
CN116739742A (en) Monitoring method, device, equipment and storage medium of credit wind control model
CN115509255A (en) Substation patrol unmanned aerial vehicle airline risk management and control method, device, equipment and storage medium
CN108123436B (en) Voltage out-of-limit prediction model based on principal component analysis and multiple regression algorithm
CN117131947B (en) Overhead transmission line fault prediction method, device, equipment and storage medium
CN104217093A (en) Method and apparatus for identifying root cause of defect using composite defect map
CN115906668A (en) Power transmission line forest fire trip prediction method, device, equipment and storage medium
CN115453661B (en) Weather forecasting method, weather forecasting device, weather forecasting equipment and storage medium
CN115034447A (en) Overhead line power failure risk prediction method and device
CN114595781A (en) Octane value loss prediction method, device, equipment and storage medium
CN116186536A (en) Risk prediction method, risk prediction device, electronic equipment and storage medium
CN117953651A (en) Risk assessment method, device, equipment and medium for low-temperature rain, snow and freezing disasters
CN116341739A (en) Flood loss pre-assessment method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination