CN108022045B

CN108022045B - Distribution estimation method

Info

Publication number: CN108022045B
Application number: CN201711250676.6A
Authority: CN
Inventors: 张忠山; 褚骁庚; 陈英武; 陈宇宁; 吕济民; 陈盈果; 陈成; 王涛; 刘晓路; 邢立宁; 姚锋; 贺仁杰
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2017-12-01
Filing date: 2017-12-01
Publication date: 2021-05-14
Anticipated expiration: 2037-12-01
Also published as: CN108022045A

Abstract

The invention discloses a distribution estimation method comprising the following steps: step1, each time the current target is imaged, the target identification satellite first extracts the look-ahead information provided by the target discovery satellite; step2, targets with low observation value are deleted from the look-ahead information according to three threshold parameters in the target filtering knowledge; step3, a plurality of heuristic algorithms in a scheduling algorithm set are called to generate a plurality of local observation schemes, wherein this step can be performed in parallel, and a local observation scheme is an observation scheme generated by the satellite in its current state using only the target information within the look-ahead time window; step4, each local observation scheme is comprehensively evaluated according to the evaluation parameters in the scheme evaluation knowledge and the associated evaluation method; step5, the local observation scheme with the highest score is selected, and the first target in that scheme is locked as the next observation target.

Description

Distribution estimation method

Technical Field

The invention relates to the technical field of satellites, in particular to a distribution estimation method.

Background

The conventional scheduling approach for agile satellites uses a two-satellite cluster consisting of a low-resolution target discovery satellite (hereinafter "target discovery satellite") and an agile high-resolution target identification satellite (hereinafter "target identification satellite"). The target discovery satellite flies at the front of the cluster, uses a low-resolution camera to push-broom image a wide area on both sides of the sub-satellite track, and extracts the relevant target information in the imaged area in real time. The target identification satellite receives this target information in real time, so that the position, imaging duration and observation profit of each target are known 100 s in advance. While the target identification satellite is imaging a target, it quickly decides on board, from the look-ahead information, which target to observe next. A cluster formed by these two satellites can effectively handle sea-surface target identification scenarios, identifying multiple sea-surface targets in a single pass.

However, at each decision the target identification satellite has only local information about the target distribution (only the target information within the look-ahead time window, not the task distribution of the whole scene), so each choice of the next observation target is prone to "short-sightedness". This is especially so when the scene period becomes long and the satellite's solid-state storage and power constraints are tight, since these resources may not allow the target identification satellite to image many targets in one orbit (high-resolution images occupy a large amount of storage). Short-sighted decisions may cause the target identification satellite to consume resources such as storage and power too early, forcing it to abandon certain high-profit targets later for lack of those resources and thereby reducing the global observation profit of the cluster. To improve the target identification capability of the cluster, effectively solving the short-sighted decision problem of the target identification satellite becomes a key problem of cluster operation management.

Disclosure of Invention

It is an object of the present invention to provide a distribution estimation method that overcomes or at least mitigates at least one of the above-mentioned disadvantages of the prior art.

In order to achieve the above object, the present invention provides a distribution estimation method, including:

step1, acquiring auxiliary decision knowledge information by the target identification satellite, and extracting the look-ahead information provided by the target discovery satellite, wherein the look-ahead information comprises target position information, imaging duration and observation income; the auxiliary decision knowledge information comprises target filtering knowledge and scheme evaluation knowledge;

step2, preliminarily evaluating the observation storage cost-performance and the observation power cost-performance of the targets within the look-ahead time window according to the filtering threshold threDur of the target imaging duration, the filtering threshold threPro of the target observation profit and the filtering threshold threPDR of the target profit-to-duration ratio in the target filtering knowledge, so as to delete targets with low observation value from the look-ahead information, wherein the target information mainly comprises target position information, imaging duration and observation profit, and the usage constraints of the satellite mainly comprise time window constraints, attitude maneuver constraints, solid-state storage constraints and power constraints;

step3, calling a plurality of heuristic algorithms in a scheduling algorithm set to generate a plurality of local observation schemes, wherein the local observation schemes can be calculated in parallel, the local observation schemes refer to observation schemes generated by a satellite in the current state only according to target information in a look-ahead time window, the observation schemes are described by sequence solutions, and one scheduling solution generator is utilized to translate one sequence solution into a feasible scheduling solution;

step4, carrying out comprehensive evaluation on each local observation scheme by using an evaluation function of the local observation scheme according to evaluation parameters in the scheme evaluation knowledge and a related evaluation method;

step5, selecting the local observation scheme with the highest score in Step4, and locking the first target in the scheme as the next observation target;

each parameter in the target filtering knowledge and the scheme evaluation knowledge is calculated by adopting a distribution estimation algorithm, wherein the distribution estimation algorithm specifically comprises the following steps:

step71, initializing the population: generating, by random sampling, an initial population uniformly distributed over the value domain, and evaluating the fitness value of each individual;

step72, dividing niches: dividing the population into a plurality of sub-populations (niches) by a K-means clustering algorithm based on Euclidean distance, wherein the number of sub-populations is a function of the number of iterations, and the more iterations, the greater the number of sub-populations;

step73, estimating probability distributions: performing a selection operation on the individuals in each sub-population, and using the superior individuals of each sub-population to build a probability distribution model of the dominant individuals in that sub-population, under the assumption that all variables are mutually independent;

step74, sampling offspring: each time it samples, the algorithm selects one sub-population with a certain probability, and the selected sub-population samples an offspring individual according to its own probability distribution model; the step ends when the number of newly sampled individuals equals the size of the current population;

step75, selecting individuals: merging the parent and offspring individuals within each sub-population, and obtaining the new-generation population by a near-elite selection strategy;

step76, local search: optimizing the dominant individuals in the population, with a certain probability, by a local search algorithm such as hill climbing, to further improve solution quality;

and Step77, judging whether the termination condition of the algorithm has been reached; if so, returning the best individual found; otherwise, jumping to Step72.
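The loop formed by Step71 to Step77 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the per-variable model is taken to be an independent Gaussian, the niche count follows a hypothetical linear schedule, a niche is chosen for sampling with probability proportional to its size, the per-niche selection is plain truncation, and termination is a fixed generation budget. The names `eda`, `kmeans`, `hill_climb`, `clip`, `sphere` and all parameter values are hypothetical.

```python
import math
import random

def kmeans(points, k, rng, iters=10):
    """Plain K-means on Euclidean distance; returns the non-empty clusters."""
    centers = rng.sample(points, k)
    clusters = [points]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        centers = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return [cl for cl in clusters if cl]

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def hill_climb(x, fitness, bounds, rng, tries=20):
    """Keep a random perturbation only when it improves the fitness."""
    for _ in range(tries):
        trial = [clip(v + rng.gauss(0, 0.05 * (hi - lo)), lo, hi)
                 for v, (lo, hi) in zip(x, bounds)]
        if fitness(trial) > fitness(x):
            x = trial
    return x

def eda(fitness, bounds, pop_size=40, gens=30, k_max=4, p_local=0.5, seed=0):
    rng = random.Random(seed)
    # Step71: initial population, uniform over the value domain
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for gen in range(gens):  # Step77: assumed termination = generation budget
        # Step72: niche count grows with the iteration count (assumed schedule)
        niches = kmeans(pop, 1 + gen * k_max // gens, rng)
        # Step73: independent Gaussian per variable, fitted to the better half
        models = []
        for niche in niches:
            elite = sorted(niche, key=fitness, reverse=True)[:max(2, len(niche) // 2)]
            mu = [sum(col) / len(elite) for col in zip(*elite)]
            sd = [max(1e-3, math.sqrt(sum((v - m) ** 2 for v in col) / len(elite)))
                  for col, m in zip(zip(*elite), mu)]
            models.append((mu, sd))
        # Step74: choose a niche (probability proportional to size), sample a child
        children = [[] for _ in niches]
        sizes = [len(n) for n in niches]
        for _ in range(pop_size):
            j = rng.choices(range(len(niches)), weights=sizes)[0]
            mu, sd = models[j]
            children[j].append([clip(rng.gauss(m, s), lo, hi)
                                for m, s, (lo, hi) in zip(mu, sd, bounds)])
        # Step75: merge parents and children per niche, keep the best ones
        pop = []
        for niche, kids in zip(niches, children):
            merged = sorted(niche + kids, key=fitness, reverse=True)
            pop.extend(merged[:len(niche)])
        # Step76: with probability p_local, hill-climb the best individual
        i = max(range(len(pop)), key=lambda n: fitness(pop[n]))
        if rng.random() < p_local:
            pop[i] = hill_climb(pop[i], fitness, bounds, rng)
        if fitness(pop[i]) > fitness(best):
            best = pop[i]
    return best

def sphere(x):
    """Toy fitness with its maximum (0) at x = (1, 1)."""
    return -sum((v - 1.0) ** 2 for v in x)

bounds = [(-5.0, 5.0), (-5.0, 5.0)]
best = eda(sphere, bounds)
```

In the setting of the method, an individual would hold the parameters to be learned (for instance the three filtering thresholds and the scheme evaluation parameters), and the fitness would be the global observation profit obtained when historical scenes are replayed with those parameters; the toy `sphere` function merely stands in for that evaluation.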

Building on the two-satellite cluster application scenario, the invention designs an on-board decision model so that the cluster can handle "identify upon discovery" reconnaissance of moving sea targets in a larger application scenario (one observation orbit). Following the approach of learning on the ground and applying on the satellite, the decision model combines historical data, ground learning resources and on-board computing capacity, so that each short-term decision on the next observation target can take historical global information into account, improving the utilization efficiency of the cluster over long periods. By analyzing historical data from cluster operation scenarios and extracting auxiliary knowledge for on-board decisions, the decision-making capability of the target identification satellite is improved and the cluster obtains a better global observation profit.

Drawings

Fig. 1 is a flowchart of an embodiment of the scheduling method for agile satellites according to the present invention.

Fig. 2 is a schematic diagram of a sequence solution and the scheduling solution generator.

Fig. 3 shows the ranking of the targets under different indexes.

Fig. 4 is a schematic diagram of satellite progress ratios.

Fig. 5 is a flowchart of the distribution estimation algorithm.

Fig. 6 is a comparison of the observation profits of the decision model and the online algorithms on the learning set.

Fig. 7 is a comparison of the observation profits of the decision model and the online algorithms on the test set.

Fig. 8 illustrates the impact of different mechanisms on the decision model.

Detailed Description

In the drawings, the same or similar reference numerals are used to denote the same or similar elements or elements having the same or similar functions. Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

The present invention assumes that the targets found in each orbit of the cluster follow a certain distribution pattern. Different geographic regions have different target generation probabilities, and the target distribution pattern of each region, covering target position information, imaging duration and observation profit, can be obtained through data accumulation. Meanwhile, the orbit of an imaging satellite is repeating: after a certain running time, the satellite returns to its original ground track. Therefore, a large amount of information on the target distribution in each imaging orbit can be acquired through long-term data accumulation. For each imaging orbit of the cluster, knowledge that assists autonomous on-board decisions can be extracted from this historical target distribution data, helping the target identification satellite overcome the short-sightedness of each decision and avoid consuming certain resources too early.

As shown in fig. 1, the distribution estimation method provided in this embodiment includes:

step1, after the target identification satellite has acquired the auxiliary decision knowledge information, each time it images the current target it extracts the look-ahead information provided by the target discovery satellite, wherein the look-ahead information comprises target position information, imaging duration and observation profit; the auxiliary decision knowledge information comprises target filtering knowledge and scheme evaluation knowledge;

step4, comprehensively evaluating each local observation scheme according to evaluation parameters in the scheme evaluation knowledge and a related evaluation method;

and Step5, selecting the local observation scheme with the highest score in Step4, and locking the first target in the scheme as the next observation target.

In Step2, to improve decision efficiency, the target identification satellite can use the target filtering knowledge provided by the ground to perform a preliminary screening, also called target filtering, of the targets within the look-ahead time window. To improve the accuracy of target filtering and avoid mistakenly deleting valuable targets, the relevant attributes of the target information and the satellite scheduling constraints are first briefly analyzed.

The target information mainly comprises target position information, imaging duration and observation profit. The usage constraints of the satellite are mainly time window constraints, attitude maneuver constraints, solid-state storage constraints and power constraints. For the timing constraints formed by the time window and attitude maneuver constraints, it is difficult to judge the quality of a target by simple calculation, so the target filtering knowledge selects the targets to delete mainly by analyzing the cost-performance of each target in terms of storage and power consumption.

When the target identification satellite images a target, the solid-state storage consumed by observing target i is proportional to its imaging duration, specifically dur_i·cr, where dur_i is the imaging duration of the target and cr is the image acquisition bit rate of the satellite. Since cr is a constant, p_i/dur_i (where p_i is the observation profit of the task) can be used to represent the observation storage cost-performance of target i. As for power, the satellite consumes power when observing a target and also during attitude maneuvers. However, the attitude maneuvers before and after an observation depend on the relative attitude difference between the target and the previously and subsequently observed targets, which is costly to compute. To keep on-board target filtering efficient, only the power consumed while imaging the target is used as the reference index for deciding whether to filter it. The power consumed in imaging the target is dur_i·pc, where pc is the power consumed per unit of imaging time. Similarly, since pc is a constant, p_i/dur_i can also be used to represent the observation power cost-performance of target i.

In summary, in the target filtering knowledge, the three parameters p_i, dur_i and p_i/dur_i of an observation target suffice for a preliminary evaluation of the observation storage cost-performance and the observation power cost-performance of target i; deleting targets with low resource cost-performance improves on-board decision efficiency and alleviates decision short-sightedness to a certain extent. Therefore, the ground decision-knowledge learning process only needs to learn the filtering thresholds threPro, threDur and threPDR of the three parameters p_i, dur_i and p_i/dur_i, so that during decision making the satellite filters out observation targets that do not satisfy all three thresholds simultaneously (if any one of the three attributes of a target is smaller than its threshold, the target is deleted). This completes the extraction of the target filtering knowledge.
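The three-threshold filter can be illustrated with a short Python sketch. The `Target` class, the function name `filter_targets` and all numeric values are hypothetical; the comparison rule follows the text literally (a target is deleted if any one of its three attributes is smaller than the corresponding threshold).

```python
from dataclasses import dataclass

@dataclass
class Target:
    tid: int
    profit: float    # p_i, observation profit
    duration: float  # dur_i, imaging duration

def filter_targets(targets, thre_pro, thre_dur, thre_pdr):
    """Keep only the targets whose three attributes all reach their thresholds."""
    kept = []
    for t in targets:
        pdr = t.profit / t.duration  # p_i / dur_i
        # As stated in the text: delete the target if any attribute is below its threshold.
        if t.profit < thre_pro or t.duration < thre_dur or pdr < thre_pdr:
            continue
        kept.append(t)
    return kept

# Hypothetical look-ahead window with four targets.
targets = [Target(1, 10, 5), Target(2, 3, 6), Target(3, 8, 20), Target(4, 6, 2)]
kept = filter_targets(targets, thre_pro=5, thre_dur=3, thre_pdr=0.5)
kept_ids = [t.tid for t in kept]
```

With these illustrative values, target 2 fails the profit threshold, target 3 fails the profit-to-duration threshold and target 4 fails the duration threshold, so only target 1 remains.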

In Step3, to make full use of the on-board computing resources (a multi-core CPU whose cores each run at a clock frequency of about 80 MHz), the on-board task scheduling algorithm set is designed as a plurality of simple heuristic algorithms that can run in parallel. The satellite CPU can compute in parallel, but the computing capacity of each core is limited, so the algorithm run on each core must not be too complex. Meanwhile, the scheduling strategies of the heuristic algorithms have different emphases, so that a diverse set of scheduling schemes is provided and the satellite can select an observation scheme suited to its own state when deciding.

In the satellite scheduling problem, the design of a heuristic algorithm mainly addresses three aspects: selecting the target subset, calculating the observation time and attitude of each target, and adjusting the target subset.

Before the specific flow of the heuristic algorithm is introduced, the solution representation adopted by the invention is briefly described. The invention describes an observation scheme by a sequence solution (also called a "target observation sequence") and uses a scheduling solution generator to translate a sequence solution into a feasible scheduling solution. The scheduling solution generator performs this translation with a construction method based on a greedy rule.

A sequence solution consisting of a target sequence is denoted by ps, and the targets contained in the solution are a subset of the full target set. The solution space of sequence solutions is denoted by D(ps), the solution space of scheduling solutions by D(ss), and the greedy-rule-based scheduling solution generator by SB. For any sequence in D(ps), the scheduling solution generator SB can generate its corresponding scheduling solution in D(ss). When generating a scheduling solution, SB arranges the targets in their order in ps and, using an as-early-as-possible placement, observes each target as early as the constraints allow. If a target cannot be observed because of the time window or attitude maneuver constraints, it is discarded and the next target is scheduled. The scheduling solution generator SB therefore always produces a feasible scheduling scheme that satisfies all constraints. As-early-as-possible placement improves the utilization of the satellite in the time dimension and leaves the satellite more time for attitude maneuvers to image subsequent targets. As shown in Fig. 2, the sequence solution ps (target observation sequence) of the satellite is 3 → 7 → 9 → 1 → 6 → 8 → 4, and the scheduling solution ss produced by the scheduling solution generator SB is shown in the lower half of Fig. 2. The satellite cannot observe target 1 and target 8 because of the attitude maneuver constraints, so these two targets are discarded when the scheduling solution is generated, and SB proceeds directly to the subsequent targets.
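A greedy generator of this kind can be sketched in Python as follows. The sketch rests on simplifying assumptions that are not in the source: the attitude maneuver constraint is reduced to a constant slew time between consecutive observations, and the time windows below are invented so that targets 1 and 8 are dropped, as in the narrative for Fig. 2. The names `Target` and `schedule_builder` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Target:
    ws: float   # time window start (s)
    we: float   # time window end (s)
    dur: float  # imaging duration (s)

def schedule_builder(ps, targets, slew=10.0, t0=0.0):
    """Translate a sequence solution ps into a feasible scheduling solution ss."""
    ss = []
    ready = t0  # earliest time at which the next observation may start
    for tid in ps:
        t = targets[tid]
        start = max(t.ws, ready)   # as-early-as-possible placement
        if start + t.dur > t.we:   # does not fit in the window: discard the target
            continue
        ss.append((tid, start, start + t.dur))
        ready = start + t.dur + slew  # assumed constant maneuver time
    return ss

# Hypothetical windows for the sequence used in the Fig. 2 narrative.
targets = {
    3: Target(0, 30, 10), 7: Target(15, 50, 10), 9: Target(35, 70, 10),
    1: Target(40, 65, 10), 6: Target(60, 90, 10), 8: Target(70, 85, 10),
    4: Target(85, 120, 10),
}
ps = [3, 7, 9, 1, 6, 8, 4]
ss = schedule_builder(ps, targets)
scheduled = [tid for tid, _, _ in ss]
```

Because every target that would violate a constraint is skipped rather than delayed, the returned list is feasible by construction, which is the property the text attributes to SB.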

For ease of understanding, the target observation sequence is the sequence solution described in the preceding paragraphs. When a heuristic algorithm selects the target subset, it first selects a target according to a certain index, then inserts the selected target into the existing target observation sequence in ascending time order to obtain a new target observation sequence, and then converts that sequence into a scheduling solution with the scheduling solution generator SB. If the profit of the scheduling scheme increases, the new target observation sequence is adopted; otherwise the newly inserted target is abandoned and the original target observation sequence and its scheduling solution are retained. The target selection index thus guides the search direction of the heuristic algorithm, and a variety of target selection indexes enables the heuristic algorithm set to provide a variety of scheduling schemes.

The target selection indexes are mainly the time sequence index, the target observation profit index, the target imaging duration index and the profit-to-duration ratio index.

The time sequence index arranges the targets in ascending order of time window start time and selects in turn the target with the earliest start time. A heuristic algorithm based on this index is equivalent to scheduling the target set with a First Come First Served (FCFS) strategy. That is, the sequence solution of this heuristic is the targets in ascending time order, and the scheduling solution generator then tries to place the observation of each target at its earliest observable time, subject to the timing constraints (time window and attitude maneuver constraints).

The target observation profit index arranges the targets in descending order of observation profit and selects in turn the target with the highest observation profit. Since the optimization objective of satellite scheduling is to maximize the global profit of the observation scheme, selecting targets in descending profit order quickly raises the profit of the local scheme under construction, but it does not guarantee a higher global profit.

The target imaging duration index arranges the targets in ascending order of imaging duration and selects in turn the target with the shortest imaging duration. As analyzed above, the resources a target consumes, such as storage and power, are proportional to its imaging duration, so this index enables the satellite to observe more targets when the satellite resource constraints are less stringent.

The profit-to-duration ratio index arranges the targets in descending order of observation profit-to-duration ratio p_i/dur_i (where p_i is the observation profit of the task and dur_i is its imaging duration) and selects in turn the target with the highest ratio. As analyzed above, the storage and power consumed by a target are proportional to its imaging duration, so this index evaluates the resource cost-performance of observing a target.
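The four target selection indexes amount to four sort keys over the same target set. The following Python sketch illustrates this with three invented targets; the dictionary layout, the `INDEXES` table and the `rank` function are hypothetical names introduced only for this example.

```python
# Hypothetical targets: window start time ws, observation profit p_i, imaging duration dur_i.
targets = {
    1: {"ws": 30.0, "profit": 6.0, "dur": 4.0},
    2: {"ws": 10.0, "profit": 9.0, "dur": 30.0},
    3: {"ws": 20.0, "profit": 10.0, "dur": 5.0},
}

# One sort key per index; negated keys give a descending order.
INDEXES = {
    "time": lambda t: t["ws"],                   # ascending window start (FCFS)
    "profit": lambda t: -t["profit"],            # descending observation profit
    "duration": lambda t: t["dur"],              # ascending imaging duration
    "ratio": lambda t: -t["profit"] / t["dur"],  # descending p_i / dur_i
}

def rank(targets, index):
    """Return the target ids ordered under the chosen selection index."""
    key = INDEXES[index]
    return sorted(targets, key=lambda tid: key(targets[tid]))

orders = {name: rank(targets, name) for name in INDEXES}
```

Each ordering is then handed to a separate heuristic, which is what allows the heuristics to run independently on different CPU cores.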

The specific method by which a heuristic algorithm computes an available observation scheme with the scheduling solution generator is as follows. Fig. 3 and Table 4 show, as an example, how the current targets are ordered under the different indexes. The heuristic algorithm first selects an index and then computes an available scheduling scheme through the scheduling solution generator SB, using the construction method described above. Taking the profit-to-duration ratio index as an example, the targets in descending order of profit-to-duration ratio are 4-6-3-1-8-7-9. First, given the current attitude and resource usage state of the satellite, the algorithm checks whether target 4 can be observed; if so, the current best sequence (hereinafter the "current best solution", cblan) is updated to {4}. Next, target 6 is added to the current best solution and the targets are arranged in ascending time order, i.e. the scheduling solution converted from the sequence 6 → 4 is considered; if its profit is better than that of the current best solution, the current best solution is updated to {6, 4}. The process continues in the same way: each time a target is added to the current best solution, the tasks are arranged in ascending time order to obtain a new sequence solution, which the scheduling solution generator converts into a scheduling solution; if the profit of the new scheduling solution is higher it replaces the current best solution, otherwise the current best solution is kept.
The pseudo code of the heuristic algorithm is shown as Algorithm 5, where numT is the number of targets within the look-ahead time window, psC is the sequence solution obtained each time a target is added to the current best scheme, ssC is the scheduling solution corresponding to the current sequence solution psC, and proC is the profit of the current scheduling scheme.
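Since the pseudo code itself is not reproduced here, the construction procedure described above can be approximated by the Python sketch below. It is not Algorithm 5 itself: the attitude maneuver constraint is reduced to an assumed constant slew time, "ascending time order" is taken to mean ascending window start time, and the four targets are invented. The names `psC`, `ssC` and `proC` follow the text; `build_schedule`, `construct`, `best_ps` and `best_pro` are hypothetical.

```python
def build_schedule(ps, targets, slew):
    """Greedy scheduling solution generator: earliest feasible start, else discard."""
    ss, ready = [], 0.0
    for tid in ps:
        t = targets[tid]
        start = max(t["ws"], ready)
        if start + t["dur"] <= t["we"]:
            ss.append((tid, start, start + t["dur"]))
            ready = start + t["dur"] + slew  # assumed constant maneuver time
    return ss

def construct(order, targets, slew=5.0):
    """Insert targets one by one in index order; keep an insertion only if profit rises."""
    best_ps, best_pro = [], 0.0
    for tid in order:
        psC = sorted(best_ps + [tid], key=lambda i: targets[i]["ws"])  # ascending time
        ssC = build_schedule(psC, targets, slew)
        proC = sum(targets[i]["profit"] for i, _, _ in ssC)
        if proC > best_pro:  # otherwise the newly inserted target is abandoned
            best_ps, best_pro = psC, proC
    return best_ps, best_pro

# Hypothetical look-ahead window.
targets = {
    1: {"ws": 0.0, "we": 20.0, "dur": 10.0, "profit": 5.0},
    2: {"ws": 5.0, "we": 22.0, "dur": 10.0, "profit": 8.0},
    3: {"ws": 30.0, "we": 60.0, "dur": 10.0, "profit": 6.0},
    4: {"ws": 35.0, "we": 52.0, "dur": 10.0, "profit": 3.0},
}
# Profit-to-duration ratio index, descending.
order = sorted(targets, key=lambda i: -targets[i]["profit"] / targets[i]["dur"])
best_ps, best_pro = construct(order, targets)
```

In this example, inserting target 1 would push target 2 out of its window and lower the profit from 14 to 11, so the insertion is rejected; target 4 cannot be fitted after target 3 and adds nothing, so it is rejected as well.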

TABLE 4 target parameters

To further enrich the diversity of the heuristic algorithm set, another group of heuristic algorithms can be designed that operate only on the first half of the targets in the sorted list and directly abandon the targets in the rear of the list (if the order is target 1, target 2, target 3, target 4, only target 1 and target 2 are considered). When satellite resources are scarce, such algorithms attempt to schedule only the most valuable targets, thereby saving satellite resources overall. Although the scheduling schemes they generate may not perform well on observation profit, they perform better on indexes such as observation time and scheme profit-to-storage ratio (scheme profit / scheme storage consumption). The heuristic algorithm set can thus provide a more diverse set of observation schemes, making it easier for the satellite to select the most appropriate observation scheme in the scheme evaluation stage.

In Step4, each time the satellite decides on the next imaging target, the observation scheme evaluation module has the satellite comprehensively evaluate each local observation scheme generated by the heuristic algorithm set according to indexes such as its profit, storage consumption and power consumption, so as to select the local scheduling scheme that is more likely to improve the global observation profit (the sum of the profits of the targets observed by the satellite in one orbit).

In the observation scheme evaluation module, the satellite evaluates seven attributes of an observation scheme: scheme profit proP, storage consumption sdP, profit-to-storage ratio psdR, total power consumption egP, profit-to-power ratio pegR, execution duration edP, and profit-to-duration ratio pedR.

The scheme profit proP is the sum of the observation profits of all imaging targets in the local observation scheme. This index represents the overall profit of the local scheme; when resources such as storage and power are sufficient, a higher local scheme profit is more likely to increase the global observation profit of the satellite.

The storage consumption sdP is the amount of satellite solid-state storage consumed by the local observation scheme. Since the storage consumed in imaging each target is proportional to its imaging duration, sdP can be computed by summing the imaging durations of all targets in the local observation scheme and multiplying the sum by the image acquisition bit rate cr. This index describes how much of the satellite's storage the local observation scheme consumes; when storage is running short, local observation schemes with lower storage consumption are preferred.

The profit-to-storage ratio psdR is the scheme profit of the local observation scheme divided by its storage consumption. This index describes how efficiently the local observation scheme uses the satellite's storage; if the storage constraint is tight, emphasizing a higher profit-to-storage ratio at each decision is more likely to increase the global observation profit of the satellite.

The total power consumption egP is the power consumed by the local observation scheme, mainly comprising imaging power consumption and attitude maneuver power consumption. This index describes how much of the satellite's power the local observation scheme consumes; when power is running short, local observation schemes with lower power consumption are preferred.

The profit-to-power ratio pegR is the scheme profit of the local observation scheme divided by its total power consumption. This index describes how efficiently the local observation scheme uses the satellite's power; if the power constraint is tight, emphasizing a higher profit-to-power ratio at each decision is more likely to increase the global observation profit of the satellite.

The execution duration edP is the total satellite flight time consumed by the local observation scheme, mainly comprising imaging time and attitude maneuver time. This index describes how much execution time the local observation scheme consumes; when the imaging orbit is close to its end, local observation schemes with shorter execution time are preferred.

The profit-to-duration ratio pedR is the scheme profit of the local observation scheme divided by its execution duration. This index describes how efficiently the local observation scheme uses the satellite's time; if the satellite still has plenty of resources left as the imaging orbit nears its end, emphasizing a higher profit-to-duration ratio at each decision is more likely to increase the global observation profit of the satellite.
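The seven attributes can be computed from a local observation scheme as in the Python sketch below. Several elements are assumptions made for illustration: each observation carries a known maneuver time `slew`, maneuver power is modelled as a constant rate `pm` per second, and the execution duration counts only imaging and maneuver time. The names `Obs`, `scheme_attributes`, `ratio` and all numeric values are hypothetical; `cr` and `pc` are the constants named in the text.

```python
from dataclasses import dataclass

@dataclass
class Obs:
    profit: float  # observation profit of the imaged target
    dur: float     # imaging duration (s)
    slew: float    # attitude maneuver time before this observation (s), assumed known

def ratio(num, den):
    """Guarded division so that an empty scheme does not raise."""
    return num / den if den else 0.0

def scheme_attributes(scheme, cr, pc, pm):
    """cr: image bit rate, pc: imaging power per second, pm: assumed maneuver power per second."""
    proP = sum(o.profit for o in scheme)
    imaging = sum(o.dur for o in scheme)
    maneuver = sum(o.slew for o in scheme)
    sdP = imaging * cr                  # storage consumption
    egP = imaging * pc + maneuver * pm  # imaging power plus maneuver power
    edP = imaging + maneuver            # execution duration
    return {
        "proP": proP, "sdP": sdP, "psdR": ratio(proP, sdP),
        "egP": egP, "pegR": ratio(proP, egP),
        "edP": edP, "pedR": ratio(proP, edP),
    }

# Hypothetical two-target scheme.
scheme = [Obs(8.0, 10.0, 5.0), Obs(6.0, 10.0, 15.0)]
attrs = scheme_attributes(scheme, cr=2.0, pc=1.0, pm=0.5)
```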

In summary, the evaluation method of the local observation scheme is shown in the following formula, wherein w₁To w₇The weight coefficients corresponding to the scheme profit proP, the solid-state consumption sdP, the profit-solid-state ratio psdR, the total electricity consumption egP, the profit-electricity ratio pegR, the execution duration edP and the profit-electricity ratio pedR are also the knowledge information to be learned by the ground learning module. Each weight has a value range of-100, 100]The weighting factor may take a negative value because a smaller index (e.g., the solid inventory consumption sdP) indicates a higher solution composite score.

cScore＝w₁·proP+w₂·sdP+w₃·psdR+w₄·egP+w₅·pegR+w₆·edP+w₇·pedR
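As a minimal sketch of the weighted evaluation above, the composite score can be computed as follows. The class name SchemeIndices, the function c_score and all numeric values are illustrative assumptions and do not come from the source.

```python
from dataclasses import dataclass


@dataclass
class SchemeIndices:
    """The seven evaluation indices of one local observation scheme."""
    proP: float   # scheme profit
    sdP: float    # solid-state storage consumption
    psdR: float   # profit-to-storage ratio
    egP: float    # total power consumption
    pegR: float   # profit-to-power ratio
    edP: float    # execution duration
    pedR: float   # profit-to-time ratio


def c_score(ix, w):
    """cScore = w1*proP + w2*sdP + ... + w7*pedR for a sequence of 7 weights."""
    values = (ix.proP, ix.sdP, ix.psdR, ix.egP, ix.pegR, ix.edP, ix.pedR)
    if len(w) != len(values):
        raise ValueError("expected 7 weights")
    return sum(wi * vi for wi, vi in zip(w, values))


# Illustrative numbers only (hypothetical, not taken from the source).
example = SchemeIndices(proP=100.0, sdP=30.0, psdR=100.0 / 30.0,
                        egP=0.5, pegR=200.0, edP=40.0, pedR=2.5)
weights = [1.0, -0.5, 2.0, -10.0, 0.1, -0.2, 4.0]
score = c_score(example, weights)
```

The negative weights on sdP, egP and edP in this example reflect the remark that a smaller consumption should raise the score.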

To make the satellite's decisions better informed, the consumption of the satellite's various resources at different stages of an imaging orbit can be obtained by statistical analysis of the satellite's usage in historical scenes (each historical scene is one imaging orbit of the satellite). By comparing its current state with these historical statistics, the satellite can decide which attributes of the local observation scheme should be emphasized in the current situation. For example, suppose the satellite has executed 10 min of an observation plan in a 40 min imaging orbit, has obtained an observation yield of 240, and has consumed 33% of its solid-state storage and 27% of its power. According to the statistical analysis of the historical scenes, in the cases with higher global observation yield the satellite had, on completing 25% (10/40) of the observation orbit, obtained an observation yield of about 220 and consumed 27% of its solid-state storage and 28% of its power. The satellite is therefore consuming storage faster than in those cases, so in the subsequent decisions of this imaging orbit it can pay more attention to the profit-to-storage ratio (observation yield / storage) of the observation targets. That is, when local observation schemes are evaluated, the weight of the scheme profit-to-storage ratio index (scheme profit / scheme storage consumption) can be increased appropriately, so that a local observation scheme with a higher profit-to-storage ratio is selected in each decision.

To describe the solid-state storage, power and other resources that the satellite should have consumed, and the observation yield it should have obtained, at different percentages of the observation orbit, the concept of the progress ratio is introduced. The progress ratio is the percentage of the current observation orbit that the satellite has completed. For example, if the total duration of the current single imaging orbit is 40 min and the satellite has executed 10 min of the observation plan, the progress ratio of the satellite is 25% ((10/40) × 100%).

The invention designs a statistical method for obtaining the solid-state storage and power resources that the satellite should have consumed, and the observation yield it is expected to have obtained, at different progress ratios. The method assumes that the consumption of observation resources such as solid-state storage and power, and the observation yield obtained, are consistent with the distribution of the targets. For a given progress ratio pr in a given historical scene, the sub-satellite point at the corresponding moment divides the target set of the scene into ts1 and ts2: ts1 is the set of targets whose visible time window midpoint is less than or equal to the moment corresponding to pr, and ts2 is the set of targets whose visible time window midpoint is greater than that moment (the targets that have not yet passed overhead). Within this imaging orbit, the observation yield that should have been obtained at progress ratio pr is the sum of the observation yields of the targets in ts1, i.e., exPro = ProS(ts1), where exPro is the expected observation yield at the progress ratio and ProS(…) is the sum of the observation yields of all targets in a target set. Since the storage and power consumed in observing a target are proportional to its imaging duration, the percentage of storage and power resources that should have been consumed at progress ratio pr in this imaging orbit is exRCR = DurS(ts1)/(DurS(ts1) + DurS(ts2)), where exRCR is the percentage of resources expected to be consumed at the progress ratio and DurS(…) is the sum of the imaging durations of all targets in a target set.
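The split into ts1 and ts2 and the two expected quantities can be sketched as follows. This is a minimal sketch under stated assumptions: the dictionary keys, the function name expected_at_progress, the convention that the scene starts at t = 0, and the sample targets are all hypothetical.

```python
def expected_at_progress(targets, pr, orbit_duration):
    """Return (exPro, exRCR) for progress ratio pr given as a fraction in [0, 1].

    targets: list of dicts with keys 'win_start', 'win_end' (visible time
    window, seconds), 'yield' (observation yield) and 'duration' (imaging
    duration, seconds). These key names are assumptions of this sketch.
    """
    moment = pr * orbit_duration  # assumes the scene starts at t = 0

    def midpoint(t):
        return (t["win_start"] + t["win_end"]) / 2.0

    ts1 = [t for t in targets if midpoint(t) <= moment]
    ts2 = [t for t in targets if midpoint(t) > moment]

    ex_pro = sum(t["yield"] for t in ts1)                 # ProS(ts1)
    dur1 = sum(t["duration"] for t in ts1)                # DurS(ts1)
    dur2 = sum(t["duration"] for t in ts2)                # DurS(ts2)
    ex_rcr = dur1 / (dur1 + dur2) if dur1 + dur2 > 0 else 0.0
    return ex_pro, ex_rcr


# Hypothetical targets: window midpoints at 100 s, 300 s and 700 s.
sample = [
    {"win_start": 80, "win_end": 120, "yield": 40, "duration": 5},
    {"win_start": 280, "win_end": 320, "yield": 55, "duration": 15},
    {"win_start": 680, "win_end": 720, "yield": 30, "duration": 20},
]
ex_pro, ex_rcr = expected_at_progress(sample, pr=0.25, orbit_duration=2400)
```

With a 2400 s orbit and pr = 0.25, the dividing moment is 600 s, so the first two sample targets fall in ts1.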

As shown in Fig. 4, the current progress ratio of the satellite corresponds to the sub-satellite point A, which divides the full target set into two subsets, ts1 = {3, 7} and ts2 = {9, 1, 6, 8, 4}. In this scene the observation yield expected at this progress ratio is therefore exPro = ProS(ts1) = 95, and the percentage of storage and power resources consumed is exRCR = DurS(ts1)/(DurS(ts1) + DurS(ts2)) (the parameters of the targets are shown in Table 1).

Table 1 target parameter examples

After the observation yield and the percentage of storage and power consumption corresponding to progress ratio pr in a given imaging orbit have been obtained, the data for progress ratio pr across all historical imaging orbits are analyzed statistically (assuming that these indices follow a normal distribution), and the corresponding mean and standard deviation are computed. These serve as the basis on which the satellite judges its own state at the decision moment. Taking the expected observation yield at progress ratio pr as an example:

where numSce is the number of scenes;

exPro_pr^i is the expected observation yield at progress ratio pr in the i-th scene;

mePro_pr is the mean expected observation yield at progress ratio pr;

sdPro_pr is the standard deviation of the expected observation yield at progress ratio pr under the normal distribution assumption.
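The statistics described above can be sketched as follows, assuming the ordinary arithmetic mean and the population standard deviation. The text does not state which divisor is used for the standard deviation, so that choice, the function name progress_statistics and the sample values are assumptions.

```python
import statistics


def progress_statistics(values_per_scene):
    """Return (mean, standard deviation) of one index at a fixed progress ratio.

    values_per_scene holds one value per historical scene, e.g. the expected
    observation yield exPro at progress ratio pr in each scene.
    """
    mean = statistics.fmean(values_per_scene)
    # Population standard deviation (divisor numSce); an assumption of this sketch.
    sd = statistics.pstdev(values_per_scene, mu=mean)
    return mean, sd


# Hypothetical expected yields at one progress ratio in three scenes.
ex_pro_at_pr = [200.0, 220.0, 240.0]
me_pro, sd_pro = progress_statistics(ex_pro_at_pr)
```

The same function applies unchanged to the expected resource consumption percentages exRCR.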

Once the mean expected observation yield mePro_pr and the standard deviation sdPro_pr at progress ratio pr are known, the relative relationship between the satellite's current observation yield and the expected observation yield is expressed by rePro, as shown in equation (29), where curPro_pr is the observation yield that the satellite has already obtained at the current progress ratio pr. A rePro of zero indicates that the current observation yield matches the expectation; a value greater than zero indicates that it exceeds the expectation; a value less than zero indicates that it falls short of the expectation. The observation yield weight adjustment coefficient acPro has an exponential relationship with rePro, as shown in equations (29) and (30), where aPro and bPro are the associated parameters to be learned by the ground learning module. The range of aPro is the negative real numbers, and the range of bPro is all real numbers.

According to the formula, when rePro increases, acPro decreases, meaning that the satellite need not over-emphasize the profit of the local observation scheme when making a decision; when rePro decreases, acPro increases, meaning that the weight of the observation profit in scheme evaluation should be increased appropriately when the satellite makes a decision.

acPro＝e^{(aPro·rePro+bPro)} (30)

w₁·acPro·proP (31)
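The behaviour of the adjustment coefficient can be illustrated with a short sketch. The exponential form follows equation (30). The form of rePro used here, the standardised deviation (curPro_pr - mePro_pr)/sdPro_pr, is an assumption that matches the sign convention described above but is not confirmed by the text. The function names and numeric values are hypothetical.

```python
import math


def relative_relation(current, mean, sd):
    """Assumed form of rePro: deviation from the mean in units of sd."""
    if sd == 0:
        raise ValueError("standard deviation must be non-zero")
    return (current - mean) / sd


def adjustment_coefficient(a, b, re):
    """acPro = e^(aPro * rePro + bPro), as in equation (30)."""
    return math.exp(a * re + b)


# Hypothetical state: yield 240 so far, historical mean 220, sd 20.
re_pro = relative_relation(240.0, 220.0, 20.0)
ac_ahead = adjustment_coefficient(-0.5, 0.0, re_pro)    # ahead of expectation
ac_behind = adjustment_coefficient(-0.5, 0.0, -re_pro)  # behind expectation
```

Because aPro is negative, the coefficient falls below 1 when the satellite is ahead of the historical expectation and rises above 1 when it is behind, which is the behaviour the text describes.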

The mean meRCR_pr and the standard deviation sdRCR_pr of the percentage of storage and power consumption corresponding to progress ratio pr can be obtained in the same way, as shown in equations (31) and (32).

where

exRCR_pr^i is the expected storage and power resource consumption at progress ratio pr in the i-th scene;

meRCR_pr is the mean expected storage and power resource consumption at progress ratio pr;

sdRCR_pr is the standard deviation of the expected storage and power resource consumption at progress ratio pr under the normal distribution assumption.

The relative relationship reSDR between the current and expected consumption of the satellite's solid-state storage is shown in equation (33), where curSDR_pr is the percentage of solid-state storage consumed so far.

The relative relationship coefficient reEGR for the power resource is obtained in the same way, where curEGR_pr is the percentage of power consumed so far.

The adjusted evaluation function of the scheme is:

cScore＝w₁·acPro·proP+w₂·acSD·sdP+w₃·acPsdR·psdR+w₄·acEG·egP+w₅·acPegR·pegR+w₆·acED·edP+w₇·acPedR·pedR (35)

where

acPro＝e^{(aPro·rePro+bPro)} (36)

acSD＝e^{(aSD·reSDR+bSD)} (37)

acPsdR＝e^{(aPsdR·reSDR+bPsdR)} (38)

acEG＝e^{(aEG·reEGR+bEG)} (39)

acPegR＝e^{(aPegR·reEGR+bPegR)} (40)

acED＝e^{(aED·reRCR+bED)} (41)

acPedR＝e^{(aPedR·reRCR+bPedR)} (42)

In the evaluation formula, every index other than the scheme profit relates to some satellite resource (solid-state storage, power or execution time), so the other adjustment coefficients all use reRCR to represent the relative relationship between the current satellite state and the historical statistical state. Note, however, that although these adjustment coefficients all use reRCR to represent the relative relationship, the associated parameters in each adjustment coefficient are different; that is, separate parameters are trained in the ground learning module, which keeps the adjustment coefficients independent of one another to some extent.
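Equations (35) to (42) can be combined into one routine, sketched below. The pairing of each coefficient with its relative relation follows equations (36) to (42) as printed. The dictionary layout, the function name adjusted_c_score and the sample values are assumptions of this sketch.

```python
import math

# Relative relation used by each adjustment coefficient, per equations (36)-(42).
RELATION_OF = {"Pro": "rePro", "SD": "reSDR", "PsdR": "reSDR", "EG": "reEGR",
               "PegR": "reEGR", "ED": "reRCR", "PedR": "reRCR"}
# Evaluation index multiplied by each coefficient, in the order w1..w7 of (35).
INDEX_OF = {"Pro": "proP", "SD": "sdP", "PsdR": "psdR", "EG": "egP",
            "PegR": "pegR", "ED": "edP", "PedR": "pedR"}


def adjusted_c_score(indices, weights, params, relations):
    """Adjusted cScore: the sum over terms of w_k * exp(a_k * re_k + b_k) * index_k.

    indices:   {'proP': ..., 'sdP': ..., ...}
    weights:   sequence (w1, ..., w7)
    params:    {'Pro': (aPro, bPro), 'SD': (aSD, bSD), ...}
    relations: {'rePro': ..., 'reSDR': ..., 'reEGR': ..., 'reRCR': ...}
    """
    total = 0.0
    for (name, index_key), w in zip(INDEX_OF.items(), weights):
        a, b = params[name]
        ac = math.exp(a * relations[RELATION_OF[name]] + b)
        total += w * ac * indices[index_key]
    return total


# Hypothetical values: with a = b = 0 every coefficient equals 1.
unit_indices = {key: 1.0 for key in INDEX_OF.values()}
neutral = {name: (0.0, 0.0) for name in INDEX_OF}
state = {"rePro": 1.0, "reSDR": 0.3, "reEGR": -0.2, "reRCR": 0.0}
plain = adjusted_c_score(unit_indices, [1, 2, 3, 4, 5, 6, 7], neutral, state)
```

With all a and b set to zero the function reduces to the unadjusted weighted sum, which gives a quick sanity check on any trained parameter set.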

As can be seen from the table, in the observation scheme evaluation module there are 17 parameters (the last 17 rows of Table 2) that need to be trained by the ground learning module: aPro, bPro, aSD, bSD, aPsdR, bPsdR, aEG, bEG, aPegR, bPegR, aED, bED, aPedR, bPedR, w₁, w₂, w₃, w₄, w₅, w₆, w₇.

Table 2 Relevant parameter information for this section

The ground parameter learning method in Table 6 is given in detail below

The ground knowledge learning module improves the satellite's online decision and scheduling capability by learning the relevant parameters of the decision-support knowledge from the satellite's historical scene information, so that the satellite obtains a higher global yield in a new scene with higher probability.

According to the above analysis, the parameters to be trained by the ground learning module are the 3 threshold parameters of the target filtering module (threPro, threDur, threPDR) and 17 parameters (aPro, bPro, aSD, bSD, aPsdR, bPsdR, aEG, bEG, aPegR, bPegR, w₁, w₂, w₃, w₄, w₅, w₆, w₇). Here, the filtering threshold parameter set tarFilter denotes the 3 threshold parameters used in the target filtering module, and the scheme evaluation parameter set schEval denotes the 17 evaluation parameters used in the scheme evaluation module.

Learning these parameters improves the accuracy of target filtering during autonomous mission planning on the satellite, so that the satellite devotes its computing resources to the more valuable targets when making decisions, which improves both decision efficiency and decision quality. It also improves the satellite's global perspective when evaluating multiple local observation schemes: the evaluation parameters can be adjusted adaptively according to the state of the satellite, which mitigates short-sighted decisions and increases the global observation yield of the satellite.

Provided that the global distribution of targets follows some regularity, the ground learning module adjusts the parameters so that the satellite obtains a better global yield when it performs online scheduling with these parameters in the historical scenes, and can therefore obtain a higher global yield when it uses them for online scheduling.

As described above, each time the target identification satellite images the current target, it first extracts the look-ahead information of the targets (including target position information, imaging duration and observation yield); it then deletes the targets of lower observation value from the look-ahead information according to the three threshold parameters of the target filtering knowledge tarFilter; next, it calls the heuristic algorithms in the scheduling algorithm set to generate several local observation schemes; it then evaluates each local observation scheme comprehensively according to the 21 evaluation parameters of the scheme evaluation knowledge schEval and the associated evaluation method; finally, it selects the local observation scheme with the highest score and locks the first target of that scheme as the next observation target. The global observation yield of the target identification satellite in a scene can therefore be regarded as a function of the threshold parameter set tarFilter and the scheme evaluation parameter set schEval. Parameter learning can thus be treated as a parameter optimization process whose objective is to maximize the sum of the global yields of the target identification satellite over all scenes, i.e.

max Σ_{i=1}^{numSce} globalPro_i(tarFilter, schEval) (43)

where numSce is the number of historical scenes;

tarFilter is the filtering threshold parameter set (comprising 3 threshold parameters);

schEval is the scheme evaluation parameter set (comprising 17 evaluation parameters);

globalPro_i(…) is the global observation yield in the i-th scene when the given parameters are used.
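The objective can be sketched as a function that sums the simulated global yield over all historical scenes. The callable simulate stands for the scene simulator, whose interface is not described in the text; its signature, the toy simulator and the sample scenes below are hypothetical stand-ins used only to make the sketch runnable.

```python
def total_global_yield(tar_filter, sch_eval, scenes, simulate):
    """Sum of globalPro_i over all historical scenes for one parameter setting."""
    return sum(simulate(scene, tar_filter, sch_eval) for scene in scenes)


def toy_simulate(scene, tar_filter, sch_eval):
    """Stand-in simulator: keeps every target whose yield passes threPro.

    A real scene simulator would replay the onboard filtering, heuristic
    scheduling and scheme evaluation for the whole imaging orbit.
    """
    return sum(y for y in scene if y >= tar_filter["threPro"])


# Each hypothetical scene is reduced to a list of target yields.
scenes = [[10, 40, 55], [5, 30], [60]]
objective = total_global_yield({"threPro": 20}, {}, scenes, toy_simulate)
```

An optimizer then searches over tarFilter and schEval for the setting that maximizes this value.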

The value ranges of the parameters to be learned on the ground are shown in Table 3.

TABLE 3 ground learning parameters

Following the analysis in the previous section, the parameter learning problem is converted into an optimization problem in which the objective function value of each parameter setting must be computed by a scene simulator. Many parameters need to be optimized, the problem may have multiple local peaks (local optima), and each evaluation of the objective function for one parameter setting consumes considerable computation time. The invention therefore uses a distribution estimation algorithm combined with a niche strategy to optimize the relevant parameters.

The distribution estimation algorithm (estimation of distribution algorithm, EDA) is a relatively new type of evolutionary algorithm. Unlike traditional evolutionary algorithms, it has no crossover or mutation operations; instead, it optimizes the objective function through a population evolution strategy based on probability distributions. Because the EDA evaluates the evolutionary information of the population from a macroscopic perspective, it generally has good global search ability and diversity, and is less prone to premature convergence caused by being trapped in a local optimum for a long time. The EDA first estimates the distribution of the dominant individuals in the population, then builds a probability model of the dominant individuals, and finally obtains offspring by sampling from that model.

A niche is a concept from biology that refers to the living environment of organisms within a specific environment: in the course of evolution, organisms generally live and reproduce together with others of the same species, and they live in a particular geographic area. The basic idea of the niche strategy (niching) is to apply this concept to evolutionary computation: the individuals of each generation are divided into several classes, and several individuals with high fitness are selected from each class as its representatives to form a group. The invention adopts a niche strategy based on neighbor clustering, in which each individual searches within its own neighborhood, so as to increase population diversity.

The distribution estimation algorithm is used to solve for the specific values of the parameters from the historical scenes. As shown in Fig. 5, the algorithm framework consists of the following 7 parts:

Step71, population initialization: an initial population uniformly distributed over the value domain is generated by random sampling, and the fitness value of each individual is evaluated;

Step72, niche division: the population is divided into several sub-populations (niches) using a K-means clustering algorithm based on Euclidean distance. The number of sub-populations is a function of the number of iterations: the more iterations, the more sub-populations;

Step73, probability model construction: the dominant individuals of the population are selected, and a Gaussian probability model is built in each sub-population from its dominant individuals;

Step74, offspring sampling: each time, the algorithm selects a sub-population with a certain probability for sampling, and the selected sub-population samples offspring according to its probability distribution model; the step ends when the number of newly sampled individuals equals the size of the current population;

Step75, individual selection: each sub-population merges parent and child individuals, and a nearest-neighbor elite selection strategy is used to obtain the new generation;

Step76, local search: dominant individuals in the population are optimized with a certain probability by a local search algorithm such as hill climbing, further improving the quality of the solution;

Step77, termination check: if the termination condition of the algorithm is reached, the best individual found is returned; otherwise the algorithm jumps to Step72 and repeats Step72 to Step76.

In the algorithm framework, Step72 introduces a niche strategy based on neighbor clustering and divides the population into several sub-populations using a K-means algorithm based on Euclidean distance. The number of sub-populations is adjusted adaptively according to the generation number: in early generations there are fewer sub-populations, which strengthens the exploration capability of the algorithm; in later generations the number of sub-populations increases, which strengthens the exploitation capability of the algorithm in its later stage. The mapping function for the number of sub-populations is shown in equation (44), where numIter is the generation number, numPop is the number of sub-populations, and round() denotes rounding to the nearest integer.

In Step73, the algorithm first selects the top 10% dominant individuals of the whole population, and then builds a Gaussian probability model in each sub-population from its dominant individuals. Taking sub-population i as an example, the mean and variance of the fitted normal distribution model are shown in equations (45) and (46), where numExc_i is the number of dominant individuals in the i-th sub-population; x_j^i is the j-th dominant individual in the i-th sub-population; μ_i is the mean of the dominant individuals in the i-th sub-population; and σ_i is the standard deviation of the dominant individuals in the i-th sub-population.
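The model-building step can be sketched as follows. This is a minimal sketch under several assumptions not stated in the text: each parameter dimension is modelled by an independent Gaussian, the population standard deviation is used, higher fitness is better, and the function name and data layout are hypothetical.

```python
import statistics


def fit_subpopulation_models(population, fitness, labels, top_fraction=0.10):
    """Fit a per-dimension Gaussian to the dominant individuals of each niche.

    population: list of parameter vectors (lists of floats)
    fitness:    fitness value of each individual (higher is better)
    labels:     sub-population index of each individual, e.g. from K-means
    Returns {sub_population: (mu, sigma)} with one entry per dimension.
    """
    n_top = max(1, round(len(population) * top_fraction))
    ranked = sorted(range(len(population)), key=lambda k: fitness[k], reverse=True)
    dominant = ranked[:n_top]  # top individuals of the whole population

    models = {}
    for sub in set(labels):
        members = [population[k] for k in dominant if labels[k] == sub]
        if not members:
            continue  # this sub-population holds no dominant individual
        columns = list(zip(*members))
        mu = [statistics.fmean(col) for col in columns]
        sigma = [statistics.pstdev(col) for col in columns]
        models[sub] = (mu, sigma)
    return models


# Hypothetical population of 20 two-dimensional individuals in a single niche.
population = [[float(k), 2.0 * k] for k in range(20)]
fitness = [float(k) for k in range(20)]
models = fit_subpopulation_models(population, fitness, [0] * 20, top_fraction=0.2)
```

A sub-population with a single dominant individual gets a standard deviation of zero, so a practical implementation would probably enforce a minimum spread before sampling.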

In Step74, for each individual of the new generation the algorithm selects the sub-population to sample from by roulette-wheel selection, and generates the new individual by sampling from the probability distribution model of that sub-population's dominant individuals. The probability of each sub-population being selected is shown in equation (47): it is the sum of the fitness values of the dominant individuals in the sub-population as a fraction of the sum of the fitness values of the dominant individuals in the whole population. Here selectP_i is the probability that the i-th sub-population is selected, and Fit() is the fitness value of an individual. Because each sub-population uses its own independent sampling model, the new generation of individuals can be distributed over different regions of the solution space, which protects the diversity of the whole population.
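The roulette-wheel selection and sampling can be sketched as follows. The sketch assumes that all fitness values are positive and that each sub-population model is a pair (mu, sigma) of per-dimension values; the function names and sample data are hypothetical.

```python
import random


def selection_probabilities(dominant_fitness_by_sub):
    """selectP_i: share of each sub-population in the total dominant fitness.

    dominant_fitness_by_sub maps a sub-population index to the list of
    fitness values of its dominant individuals (assumed positive).
    """
    totals = {s: sum(f) for s, f in dominant_fitness_by_sub.items()}
    grand_total = sum(totals.values())
    return {s: t / grand_total for s, t in totals.items()}


def sample_offspring(models, probs, n, rng):
    """Draw n children; each picks a sub-population by roulette wheel."""
    subs = list(probs)
    weights = [probs[s] for s in subs]
    children = []
    for _ in range(n):
        chosen = rng.choices(subs, weights=weights, k=1)[0]
        mu, sigma = models[chosen]
        children.append([rng.gauss(m, sd) for m, sd in zip(mu, sigma)])
    return children


# Hypothetical example with two sub-populations.
probs = selection_probabilities({0: [30.0, 10.0], 1: [60.0]})
models = {0: ([0.0, 0.0], [1.0, 1.0]), 1: ([10.0, 10.0], [0.5, 0.5])}
children = sample_offspring(models, probs, n=6, rng=random.Random(0))
```

Passing the random generator explicitly keeps the sampling reproducible during experiments.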

In Step75, the algorithm uses a nearest-neighbor elite selection strategy to merge parent and child individuals into a new generation. Specifically, for each parent individual, the child individual closest to it (in Euclidean distance) is found and the fitness values of the two are compared. If the child's fitness value is higher than the parent's, the child replaces the parent and is removed from the list of child individuals; otherwise, the child's count of failed competitions is increased by 1 (its initial value is 0), and the child is deleted once this count reaches 3.
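The replacement rule can be sketched as follows. The text does not say in which order the parents are visited or how ties in distance are resolved; this sketch visits parents in list order and takes the first nearest child, both of which are assumptions, as are the function name and the sample data.

```python
import math


def nearest_elite_selection(parents, parent_fit, children, child_fit, max_fail=3):
    """Merge parents and children; returns the new population and its fitness."""
    new_pop, new_fit = list(parents), list(parent_fit)
    pool = [{"x": c, "fit": f, "fails": 0} for c, f in zip(children, child_fit)]

    for i, parent in enumerate(parents):
        if not pool:
            break  # every child has been used or discarded
        nearest = min(pool, key=lambda c: math.dist(parent, c["x"]))
        if nearest["fit"] > new_fit[i]:
            # The child wins: it replaces the parent and leaves the child list.
            new_pop[i], new_fit[i] = nearest["x"], nearest["fit"]
            pool.remove(nearest)
        else:
            # The child loses: count the failure, drop it after max_fail losses.
            nearest["fails"] += 1
            if nearest["fails"] >= max_fail:
                pool.remove(nearest)
    return new_pop, new_fit


# Hypothetical example: one child beats its nearest parent, the other does not.
pop, fit = nearest_elite_selection(
    parents=[[0.0, 0.0], [10.0, 10.0]], parent_fit=[5.0, 5.0],
    children=[[1.0, 1.0], [9.0, 9.0]], child_fit=[7.0, 3.0])
```

Because a child only competes with a nearby parent, a strong individual in one region cannot displace individuals in another, which is how the strategy preserves niches.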

In Step76, the dominant individuals of each population are further optimized by hill climbing. The local search probability of a dominant individual is shown in equation (48), where searchP_j is the probability that individual x_j undergoes a local search and Fit_M is the fitness value of the best individual in the whole population. When optimizing one attribute of an individual, hill climbing fixes all other attributes and optimizes the single attribute with a variable step size. After all attributes have been optimized one by one, the algorithm ends if the fitness value of the individual has not improved; otherwise optimization starts again from the first attribute. The variable-step strategy is as follows: each time an attribute is optimized, the maximum step size is tried first; if the current step size improves the attribute, the new attribute value replaces the original one and the step size is kept; otherwise the step size is halved and the attempt is repeated; once the current step size falls below the minimum step size, optimization of that attribute ends. The maximum step size is set to 10 and the minimum step size to 0.1.
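The variable-step hill climbing can be sketched as follows. The text does not say in which direction a step is tried, so this sketch tries the positive and then the negative direction; that choice, the assumption that higher fitness is better, and the toy fitness function are all assumptions. The loop terminates only if the fitness function is bounded above.

```python
def hill_climb(x, fitness, max_step=10.0, min_step=0.1):
    """Coordinate-wise hill climbing with step halving; returns (x, fitness)."""
    x = list(x)
    best = fitness(x)
    improved = True
    while improved:  # repeat full passes until a pass brings no improvement
        improved = False
        for d in range(len(x)):
            step = max_step  # every attribute starts from the maximum step
            while step >= min_step:
                moved = False
                for delta in (step, -step):  # both directions: an assumption
                    trial = x[:]
                    trial[d] += delta
                    value = fitness(trial)
                    if value > best:
                        x, best = trial, value
                        moved = improved = True
                        break
                if not moved:
                    step /= 2.0  # no improvement at this step size: halve it
    return x, best


def toy_fitness(v):
    """Hypothetical fitness with a single peak at (3, -2)."""
    return -((v[0] - 3.0) ** 2) - (v[1] + 2.0) ** 2


solution, value = hill_climb([0.0, 0.0], toy_fitness)
```

With a minimum step of 0.1 the search cannot resolve the optimum more finely than roughly that step, so the result lands near, not exactly on, the peak.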

In Step77, the algorithm judges whether the termination condition has been reached by checking whether the set of dominant individuals of the population has remained unchanged for 2 consecutive generations; that is, if no new dominant individual has been produced for 2 consecutive generations, the algorithm terminates and returns the best individual found.

The following is an experimental analysis using the method provided by the present invention.

Unlike the traditional experimental design in which target points are selected manually, the experimental design here uses an algorithm to generate the visible time windows and observation side-sway angles between the satellite and the targets directly. This ensures that the targets follow a stable probability distribution, allows the effectiveness of the approach to be verified better, and is more general. High-resolution imaging satellites mostly orbit at a height of about 500 km, and a satellite at this height takes about 90 min to complete one orbit. Owing to imaging constraints such as illumination and cloud cover, the time available for imaging in one orbit is about 31 min (1860 s). In the experimental design, each scene therefore lasts about 1900 s; that is, the visible time windows of all tasks lie between 0 s and 1900 s.

In the simulation scene, the orbital height of the satellite is 500 km, the initial solid-state storage is 600 Gb, and the storage write rate during imaging is 3 Gb/s. For maneuverability, the maximum attitude maneuver rate of the satellite is 1°/s, the acceleration when speeding up is 0.5°/s², and the acceleration when slowing down is 0.25°/s². The stabilization time is 5 s, the side-sway angle range is [-30°, 30°], and the maximum combined tilt angle is 40°. The number of regular observation targets of each satellite follows a Gaussian distribution with a mean of 48 and a standard deviation of 2, and the targets to be observed are drawn from one uniform-area distribution and two key-area distributions in the proportion 4:2:4. The look-ahead duration of the satellite is 90 s, i.e., target information is available 90 s before the overhead time. For power, the initial energy of the satellite is 5 kWh, the power during imaging is 3 kW, and the attitude maneuver power is 15 kW during acceleration, 10 kW during deceleration and 3 kW during constant-rate motion.

The observation targets come from the uniform area, key area 1 and key area 2, which account for 40%, 20% and 40% of the regular targets of each satellite, respectively. This combination represents a more complex target distribution and tests the performance of the algorithm better.

The parameters of the uniform distribution are shown in Table 4. The targets consist of point targets and strip targets, with strip targets accounting for 50% of the targets in this area. The overhead time of the targets in this area follows a uniform distribution from 30 s to 1860 s after the scene starts, and the observation side-sway angle follows a uniform distribution from -30° to 30°. For point targets, the imaging duration is 5 s and the imaging yield follows a Gaussian distribution with a mean of 40 and a standard deviation of 10. For strip targets, the imaging duration follows a Gaussian distribution with a mean of 15 s and a standard deviation of 3 s, and the imaging yield follows a Gaussian distribution with a mean of 60 and a standard deviation of 10.

TABLE 4 Uniform distribution of target related parameters

The parameters of the key area 1 distribution are shown in Table 5. The targets are mainly point targets. The overhead time follows a uniform distribution from 30 s to 385 s after the scene starts, the imaging duration of a target is 5 s, the observation side-sway angle of the satellite follows a uniform distribution from -26° to 26°, and the imaging yield follows a Gaussian distribution with a mean of 30 and a standard deviation of 10.

TABLE 5 Objective distribution-related parameters for region of interest 1

The parameters of the key area 2 distribution are shown in Table 6. The targets are also point targets. The overhead time follows a Gaussian distribution with a mean of 900 s and a standard deviation of 300 s; the observation side-sway angle of the satellite follows a Gaussian distribution with a mean of -20° and a standard deviation of 5°; the imaging duration of a target is 5 s; and the target observation yield follows a Gaussian distribution with a mean of 40 and a standard deviation of 5. Compared with the uniform distribution, the targets in this area are more concentrated, which poses a challenge for the scheduling algorithm in handling timing constraints.

TABLE 6 distribution-related parameters of objects in region of emphasis 2
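The three target distributions described above can be combined into a scene generator, sketched below. The distribution parameters are taken from the text. The clamping of Gaussian samples to the scene and side-sway limits, the lower bound on strip duration, the per-target 50% strip probability and all names are assumptions of this sketch.

```python
import random


def clamp(value, low, high):
    return max(low, min(high, value))


def generate_scene(rng):
    """Return a list of target dicts for one simulated imaging orbit."""
    n = max(1, round(rng.gauss(48, 2)))   # number of regular targets
    n_key1 = round(0.2 * n)               # key area 1: 20%
    n_key2 = round(0.4 * n)               # key area 2: 40%
    n_uniform = n - n_key1 - n_key2       # uniform area: the remainder
    targets = []
    for _ in range(n_uniform):
        is_strip = rng.random() < 0.5     # half strip targets (assumed per target)
        targets.append({
            "area": "uniform",
            "overhead_s": rng.uniform(30, 1860),
            "side_sway_deg": rng.uniform(-30, 30),
            "duration_s": max(1.0, rng.gauss(15, 3)) if is_strip else 5.0,
            "yield": rng.gauss(60, 10) if is_strip else rng.gauss(40, 10),
        })
    for _ in range(n_key1):
        targets.append({
            "area": "key1",
            "overhead_s": rng.uniform(30, 385),
            "side_sway_deg": rng.uniform(-26, 26),
            "duration_s": 5.0,
            "yield": rng.gauss(30, 10),
        })
    for _ in range(n_key2):
        targets.append({
            "area": "key2",
            # Clamping keeps samples inside the scene limits (an assumption).
            "overhead_s": clamp(rng.gauss(900, 300), 0, 1900),
            "side_sway_deg": clamp(rng.gauss(-20, 5), -30, 30),
            "duration_s": 5.0,
            "yield": rng.gauss(40, 5),
        })
    return targets


scene = generate_scene(random.Random(7))
```

Generating many such scenes with different seeds gives a set of historical scenes with a stable underlying target distribution.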

Finally, it should be pointed out that: the above examples are only for illustrating the technical solutions of the present invention, and are not limited thereto. Those of ordinary skill in the art will understand that: modifications can be made to the technical solutions described in the foregoing embodiments, or some technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A distribution estimation method is characterized in that a double-star cluster comprises a target discovery star and a target identification star, and the distribution estimation method comprises the following steps:

step2, according to the filtering threshold threDur of the target imaging duration, the filtering threshold threPro of the target observation yield and the filtering threshold threPDR of the target yield-to-duration ratio in the target filtering knowledge, preliminarily evaluating the observation storage cost ratio and the observation power cost ratio of the targets in the look-ahead time window so as to delete the targets of lower observation value from the look-ahead information, wherein the target information comprises target position information, imaging duration and observation yield, and the usage constraints of the satellite are the time constraint, the attitude maneuver constraint, the solid-state storage constraint and the power constraint;

step3, calling a plurality of heuristic algorithms in the scheduling algorithm set to generate a plurality of local observation schemes, wherein the local observation schemes refer to observation schemes generated by the satellite in the current state only according to target information in a look-ahead time window, and the observation schemes are described by sequence solutions, and a scheduling solution generator is utilized to translate the sequence solutions into a feasible scheduling solution;

step4, carrying out comprehensive evaluation on each local observation scheme by using an evaluation function of the local observation scheme according to evaluation parameters in scheme evaluation knowledge and an evaluation method;

step72, niche division: dividing the population into a plurality of sub-populations using a K-means clustering algorithm based on Euclidean distance, wherein the number of sub-populations is a function of the number of iterations, and the more iterations, the more sub-populations;

step74, sampling offspring, selecting a certain sub-population for sampling operation by the algorithm with a certain probability each time, sampling offspring by the selected sub-population according to the probability distribution model of the selected sub-population, and ending the Step until the number of newly sampled individuals of the algorithm is equal to the size of the current population;

step76, local search, optimizing dominant individuals in the population with a certain probability by adopting a local search algorithm, and further improving the quality of the solution;

2. The distribution estimation method of claim 1, wherein the Step3 of invoking the plurality of heuristics in the scheduling algorithm set to generate the plurality of local observation schemes includes using the target selection index to guide a search direction of the heuristics as follows:

selecting targets in the selection process of the target subset by using a heuristic algorithm, inserting the selected targets into the existing target observation sequence according to a time ascending order to obtain a new target observation sequence, and then performing scheduling solution conversion on the target observation sequence by using a scheduling solution generator; if the gain of the scheduling scheme can be increased, adopting a new target observation sequence, otherwise giving up a newly inserted target, and reserving the original target observation sequence and a corresponding scheduling solution;

the target selection index comprises a time order index, a target observation profit index, a target imaging duration index and a profit-to-duration ratio index, wherein: under the time order index, the targets are arranged in ascending order of the start time of their time windows, and the target with the earliest start time is selected first; under the target observation profit index, the targets are arranged in descending order of observation profit, and the target with the highest observation profit is selected first; under the target imaging duration index, the targets are arranged in ascending order of imaging duration, and the target with the shortest imaging duration is selected first; under the profit-to-duration ratio index, the targets are arranged in descending order of the ratio of observation profit to imaging duration, and the target with the highest ratio is selected first.
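The four selection indexes of claim 2 amount to four sort orders over the candidate targets. The following is a minimal sketch of that idea only; the `Target` record, its field names and the dictionary keys are illustrative assumptions and do not appear in the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    window_start: float  # start time of the visible time window, in seconds
    profit: float        # observation profit
    duration: float      # imaging duration, in seconds


# One sort key per target selection index; a smaller key is selected earlier.
SELECTION_INDICES = {
    "time_order": lambda t: t.window_start,                      # ascending start time
    "profit": lambda t: -t.profit,                               # descending profit
    "duration": lambda t: t.duration,                            # ascending duration
    "profit_duration_ratio": lambda t: -t.profit / t.duration,   # descending ratio
}


def selection_order(targets, index):
    """Return the targets in the order in which the given index selects them."""
    return sorted(targets, key=SELECTION_INDICES[index])


targets = [
    Target("A", window_start=10.0, profit=6.0, duration=4.0),  # ratio 1.5
    Target("B", window_start=5.0, profit=9.0, duration=9.0),   # ratio 1.0
    Target("C", window_start=20.0, profit=4.0, duration=2.0),  # ratio 2.0
]

for index in SELECTION_INDICES:
    print(index, [t.name for t in selection_order(targets, index)])
```

Each index yields a different candidate order, which is why running several heuristics in parallel can produce several distinct local observation schemes from the same look-ahead information.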

3. The distribution estimation method of claim 2, wherein the "target selection index" first selected by the heuristic algorithm is the profit-to-duration ratio index, the targets in descending order of profit-to-duration ratio being a first target, a second target, a third target, a fourth target, a fifth target, a sixth target and a seventh target, and the scheduling solution generator calculates an available observation scheme as follows:

first, according to the current attitude and resource usage state of the satellite, considering whether the first target can be observed, and if so, updating the current best sequence to {first target}; second, adding the second target to the current best solution and arranging the targets in ascending order of time, that is, considering the scheduling solution converted from the sequence second target → first target, and if the profit of that scheduling solution is higher than that of the current best solution, updating the current best solution to {second target, first target}; and so on: each time a target is added to the current best solution, the targets are arranged in ascending order of time to obtain a new sequence solution, which the scheduling solution generator converts into a scheduling solution; if the profit of the new scheduling solution is higher, it replaces the current best solution, and otherwise the current best solution is kept.
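The insertion procedure of claim 3 can be sketched as a greedy loop around a scheduling solution generator. The generator below is a deliberately simplified stand-in (earliest feasible start, a constant attitude maneuver time `TRANSITION`, targets that miss their window are skipped); the claims do not specify the generator at this level, so these details and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    window_start: float
    window_end: float
    duration: float
    profit: float


TRANSITION = 2.0  # assumed constant attitude maneuver time between targets, s


def generate_schedule(sequence):
    """Toy scheduling solution generator: returns (profit, scheduled names)."""
    clock, profit, scheduled = 0.0, 0.0, []
    for t in sequence:
        earliest = clock + (TRANSITION if scheduled else 0.0)
        start = max(t.window_start, earliest)
        if start + t.duration <= t.window_end:  # imaging must end inside the window
            clock = start + t.duration
            profit += t.profit
            scheduled.append(t.name)
    return profit, scheduled


def greedy_insertion(targets):
    """Insert targets by descending profit-to-duration ratio, keeping only gains."""
    best_seq, best_profit = [], 0.0
    for cand in sorted(targets, key=lambda t: -t.profit / t.duration):
        # New sequence: current best plus the candidate, in ascending time order.
        trial = sorted(best_seq + [cand], key=lambda t: t.window_start)
        profit, _ = generate_schedule(trial)
        if profit > best_profit:  # adopt only if the scheme profit increases
            best_seq, best_profit = trial, profit
    return best_seq, best_profit


targets = [
    Target("T1", 10.0, 20.0, 5.0, 10.0),  # ratio 2.0
    Target("T2", 0.0, 12.0, 6.0, 9.0),    # ratio 1.5
    Target("T3", 14.0, 21.0, 5.0, 5.0),   # ratio 1.0, cannot fit after T1
]
best_seq, best_profit = greedy_insertion(targets)
print([t.name for t in best_seq], best_profit)
```

In this example T2 is accepted because the sequence T2 → T1 raises the profit, while T3 is given up because adding it does not increase the profit of the converted scheduling solution.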

4. The distribution estimation method according to any one of claims 1 to 3, wherein Step4 specifically includes:

comprehensively evaluating each local observation scheme generated by the heuristic algorithm set according to its profit, solid-state storage consumption and power consumption indexes, so as to select the local scheduling scheme that is more likely to improve the global observation profit;

in the observation scheme evaluation module, the satellite evaluates seven attributes of the observation scheme, namely: scheme profit, solid-state storage consumption, profit-to-storage ratio, total power consumption, profit-to-power ratio, execution duration and profit-to-duration ratio; wherein: the scheme profit is the sum of the target observation profits of all imaging targets in the local observation scheme; the solid-state storage consumption is the amount of satellite solid-state storage resources consumed by the local observation scheme; the profit-to-storage ratio is the scheme profit of the local observation scheme divided by its solid-state storage consumption; the total power consumption is the power consumed by the local observation scheme, comprising imaging power consumption and attitude maneuver power consumption; the profit-to-power ratio is the scheme profit of the local observation scheme divided by its total power consumption; the execution duration is the total satellite flight time consumed by the local observation scheme, mainly comprising the imaging duration and the attitude maneuver duration; the profit-to-duration ratio is the scheme profit of the local observation scheme divided by its execution duration.
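The seven attributes of claim 4 are sums and ratios over the observations in a local scheme. The sketch below computes them under stated assumptions: the `Observation` record and its fields are hypothetical, the dictionary keys reuse the symbols proP, sdP, psdR, egP, pegR, edP and pedR from claim 5, and returning 0 for a ratio with a zero denominator is an added convention, not something the claims state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    # Hypothetical per-target record; field names are assumptions.
    profit: float
    storage: float          # solid-state storage consumed
    imaging_energy: float
    maneuver_energy: float
    imaging_time: float
    maneuver_time: float


def _ratio(numerator, denominator):
    # Assumed convention: a ratio with a zero denominator is reported as 0.
    return numerator / denominator if denominator > 0 else 0.0


def scheme_attributes(observations):
    """Return the seven evaluation attributes of one local observation scheme."""
    proP = sum(o.profit for o in observations)
    sdP = sum(o.storage for o in observations)
    egP = sum(o.imaging_energy + o.maneuver_energy for o in observations)
    edP = sum(o.imaging_time + o.maneuver_time for o in observations)
    return {
        "proP": proP,                 # scheme profit
        "sdP": sdP,                   # solid-state storage consumption
        "psdR": _ratio(proP, sdP),    # profit-to-storage ratio
        "egP": egP,                   # total power consumption
        "pegR": _ratio(proP, egP),    # profit-to-power ratio
        "edP": edP,                   # execution duration
        "pedR": _ratio(proP, edP),    # profit-to-duration ratio
    }


scheme = [
    Observation(6.0, 2.0, 3.0, 1.0, 4.0, 2.0),
    Observation(4.0, 2.0, 2.0, 2.0, 3.0, 1.0),
]
attrs = scheme_attributes(scheme)
print(attrs)
```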

5. The distribution estimation method according to claim 4, wherein the evaluation function of the local observation scheme is:

cScore＝w₁·acPro·proP+w₂·acSD·sdP+w₃·acPsdR·psdR+w₄·acEG·egP+w₅·acPegR·pegR+w₆·acED·edP+w₇·acPedR·pedR (35)

wherein:

acPro＝e^{(aPro·rePro+bPro)} (36)

acSD＝e^{(aSD·reSDR+bSD)} (37)

acPsdR＝e^{(aPsdR·reSDR+bPsdR)} (38)

acEG＝e^{(aEG·reEGR+bEG)} (39)

acPegR＝e^{(aPegR·reEGR+bPegR)} (40)

acED＝e^{(aED·reRCR+bED)} (41)

acPedR＝e^{(aPedR·reRCR+bPedR)} (42)

in the formula, w₁ is the weight coefficient corresponding to the scheme profit proP, w₂ is the weight coefficient corresponding to the solid-state storage consumption sdP, w₃ is the weight coefficient corresponding to the profit-to-storage ratio psdR, w₄ is the weight coefficient corresponding to the total power consumption egP, w₅ is the weight coefficient corresponding to the profit-to-power ratio pegR, w₆ is the weight coefficient corresponding to the execution duration edP, and w₇ is the weight coefficient corresponding to the profit-to-duration ratio pedR, the value range of each weight coefficient being [-100, 100]; pr is the progress ratio; rePro is the relative relationship between the current observation profit and the expected observation profit; acPro is the observation profit weight adjustment coefficient, acSD is the solid-state storage consumption weight adjustment coefficient, acPsdR is the profit-to-storage ratio weight adjustment coefficient, acEG is the total power consumption weight adjustment coefficient, acPegR is the profit-to-power ratio weight adjustment coefficient, acED is the execution duration weight adjustment coefficient, and acPedR is the profit-to-duration ratio weight adjustment coefficient; aPro and bPro are the scheme profit correlation coefficients, aSD and bSD are the solid-state storage consumption correlation coefficients, aPsdR and bPsdR are the profit-to-storage ratio correlation coefficients, aEG and bEG are the total power consumption correlation coefficients, aPegR and bPegR are the profit-to-power ratio correlation coefficients, aED and bED are the execution duration correlation coefficients, and aPedR and bPedR are the profit-to-duration ratio correlation coefficients; reSDR is the relative relationship between the current satellite solid-state storage resource consumption and the expected solid-state storage resource consumption, reEGR is the relative relationship coefficient of the power resource, and reRCR represents the relative relationship between the current satellite state and the historical statistical state; curPro_pr indicates the observation profit that the satellite has obtained at the current progress ratio pr, mePro_pr is the average expected observation profit at progress ratio pr, and sdPro_pr indicates the standard deviation of the expected observation profit at progress ratio pr under the assumption of a normal distribution.
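Formulas (35) to (42) can be read as a weighted sum in which each attribute's weight is scaled by an exponential adjustment coefficient. The sketch below follows that reading term by term; the function name, the dictionary layout and the example numbers are assumptions, and the relative-state quantities rePro, reSDR, reEGR and reRCR are taken as given inputs because their own formulas are not part of this claim.

```python
import math

# Attribute symbols from formula (35), in order.
ATTRS = ["proP", "sdP", "psdR", "egP", "pegR", "edP", "pedR"]

# Relative-state quantity used by each adjustment coefficient, per (36)-(42).
REL = {
    "proP": "rePro",
    "sdP": "reSDR", "psdR": "reSDR",
    "egP": "reEGR", "pegR": "reEGR",
    "edP": "reRCR", "pedR": "reRCR",
}


def c_score(attrs, weights, coeffs, rel):
    """cScore = sum over attributes of w * exp(a * re + b) * attribute value."""
    score = 0.0
    for name in ATTRS:
        a, b = coeffs[name]                    # e.g. (aPro, bPro) for proP
        ac = math.exp(a * rel[REL[name]] + b)  # weight adjustment coefficient
        score += weights[name] * ac * attrs[name]
    return score


# Illustrative numbers only.
attrs = {"proP": 10.0, "sdP": 4.0, "psdR": 2.5, "egP": 8.0,
         "pegR": 1.25, "edP": 10.0, "pedR": 1.0}
weights = {"proP": 1.0, "sdP": -0.5, "psdR": 0.0, "egP": -0.25,
           "pegR": 0.0, "edP": 0.0, "pedR": 2.0}
rel = {"rePro": 0.0, "reSDR": 0.0, "reEGR": 0.0, "reRCR": 0.0}

neutral = {name: (0.0, 0.0) for name in ATTRS}  # every ac equals 1
boosted = dict(neutral, proP=(0.0, math.log(2.0)))  # acPro equals 2

print(c_score(attrs, weights, neutral, rel))
print(c_score(attrs, weights, boosted, rel))
```

With all a and b set to zero every adjustment coefficient equals 1 and the score reduces to a plain weighted sum, which makes the effect of the exponential terms easy to isolate when tuning.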

6. The distribution estimation method according to claim 5, wherein the method of "selecting the local observation scheme with the highest score" in Step5 specifically includes:

the global observation profit of the target identification satellite in one scene is regarded as a function of the filtering threshold parameter set tarFilter and the scheme evaluation parameter set schEval; the parameter learning process is treated as a parameter optimization process whose optimization objective is the sum of the global profits of the target identification satellite over all learning scenes, and the parameters that maximize this sum are calculated as follows:

wherein numSce is the number of historical scenes;

the tarFilter is a filtering threshold parameter set, and comprises a filtering threshold threPDR, a filtering threshold threPro and a filtering threshold threPDR in Step 2;

schEval includes aPro, bPro, aSD, bSD, aPsdR, bPsdR, aEG, bEG, aPegR, bPegR, w₁, w₂, w₃, w₄, w₅, w₆, w₇.

7. The distribution estimation method according to claim 6,

in Step72, the mapping function for the number of sub-populations is shown in formula (44), wherein numIter represents the number of optimization generations; numPop represents the number of sub-populations; round() represents rounding to the nearest integer;
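The niche division of Step72 can be illustrated with a small K-means routine based on Euclidean distance. This is a sketch only: the number of sub-populations is passed in as `k` because formula (44) is not reproduced here, and initializing the centers with the first `k` individuals and running a fixed number of iterations are assumptions made to keep the example deterministic.

```python
import math


def kmeans_labels(points, k, iterations=20):
    """Assign each individual to one of k sub-populations (Euclidean K-means)."""
    centers = [list(p) for p in points[:k]]  # assumed deterministic initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


population = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0),
              (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
labels = kmeans_labels(population, k=2)
print(labels)
```

In the claimed method the value of `k` would come from the mapping of formula (44), so that later generations are split into more, smaller niches.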

in Step73, the dominant individuals in the top 10% of the whole population are selected, and a Gaussian probability distribution model is then established in each sub-population from its dominant individuals; taking sub-population i as an example, the mean value and the variance of the fitted normal distribution model are shown in formulas (45) and (46), wherein numExc_i is the number of dominant individuals in the i-th sub-population;

representing the j-th dominant individual in the i-th sub-population; μ_i represents the mean value of the dominant individuals in the i-th sub-population; σ_i represents the standard deviation of the dominant individuals in the i-th sub-population;
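A minimal sketch of Step73 follows: pick the top 10% of the whole population by fitness, group those dominant individuals by sub-population, and fit a per-attribute mean and standard deviation in each group. The function names are hypothetical, and using the population form of the standard deviation (division by numExc_i) is an assumption, since formulas (45) and (46) are not reproduced in this text.

```python
import statistics


def dominant_by_subpopulation(population, labels, fitness, fraction=0.10):
    """Group the top `fraction` of individuals (by fitness) by sub-population."""
    n_top = max(1, round(len(population) * fraction))
    ranked = sorted(range(len(population)),
                    key=lambda i: fitness(population[i]), reverse=True)
    groups = {}
    for i in ranked[:n_top]:
        groups.setdefault(labels[i], []).append(population[i])
    return groups


def fit_gaussian(dominant):
    """Per-attribute mean and standard deviation of the dominant individuals."""
    columns = list(zip(*dominant))
    mu = [statistics.fmean(col) for col in columns]
    sigma = [statistics.pstdev(col) for col in columns]  # assumed population form
    return mu, sigma


population = [(float(i),) for i in range(20)]
labels = [i % 2 for i in range(20)]
groups = dominant_by_subpopulation(population, labels, fitness=lambda x: x[0])
mu, sigma = fit_gaussian([(1.0, 10.0), (3.0, 14.0)])
print(groups, mu, sigma)
```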

in Step74, for each individual in the new generation, the sub-population to sample from is determined by the roulette method, and a new individual is generated by a sampling operation according to the probability distribution model of the dominant individuals of that sub-population; the probability of each sub-population being selected is shown in formula (47), namely the sum of the fitness values of the dominant individuals in the sub-population as a proportion of the sum of the fitness values of the dominant individuals in the whole population; wherein selectP_i represents the probability that the i-th sub-population is selected, and Fit() represents the fitness value of an individual;
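The roulette selection and sampling of Step74 can be sketched as follows. The selection probability follows the verbal description of formula (47); the function names, the data layout, the independent per-attribute Gaussian sampling and the requirement that fitness values be positive are assumptions of this sketch.

```python
import random


def selection_probabilities(dominant_groups, fit):
    """selectP_i: fitness sum of sub-population i over the fitness sum of all."""
    totals = {i: sum(fit(x) for x in group) for i, group in dominant_groups.items()}
    grand_total = sum(totals.values())  # assumed positive
    return {i: total / grand_total for i, total in totals.items()}


def sample_offspring(models, probs, n, rng):
    """Draw n new individuals; each picks a sub-population by roulette wheel."""
    ids = list(probs)
    weights = [probs[i] for i in ids]
    offspring = []
    for _ in range(n):
        i = rng.choices(ids, weights=weights)[0]  # roulette selection
        mu, sigma = models[i]
        # Assumed: attributes are sampled independently from N(mu, sigma).
        offspring.append([rng.gauss(m, s) for m, s in zip(mu, sigma)])
    return offspring


dominant_groups = {0: [(1.0,), (3.0,)], 1: [(6.0,)]}
probs = selection_probabilities(dominant_groups, fit=lambda x: x[0])
models = {0: ([2.0], [1.0]), 1: ([6.0], [0.5])}
children = sample_offspring(models, probs, n=5, rng=random.Random(0))
print(probs, len(children))
```

Sampling stops once the number of new individuals equals the current population size, which corresponds to passing that size as `n`.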

in Step76, the dominant individuals in each population are further optimized by the hill climbing method, and the local search probability of a dominant individual is shown in formula (48), wherein searchP_j represents the probability that individual x_j performs a local search operation, and Fit_M represents the fitness value of the best individual in the whole population; when the hill climbing method optimizes one attribute of an individual, all other attributes are fixed, and the single attribute is optimized with a variable step size; if the fitness value of the individual has not improved after all attributes have been optimized one by one, the algorithm ends, and otherwise the optimization restarts from the first attribute; the variable step size strategy is as follows: each time an attribute is optimized, the maximum step size is tried first; if the current step size improves the attribute, the original attribute value is replaced with the new attribute value and the step size is kept; otherwise, the step size is reduced to half of the current step size and the optimization is attempted again; if the current step size is smaller than the minimum step size, the optimization of that attribute ends, wherein the maximum step size is set to 10 and the minimum step size is 0.1;
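The variable step size hill climbing of Step76 can be sketched as below. The step sizes 10 and 0.1 come from the claim; trying both the positive and the negative direction at each step size is an assumption, since the claim does not state the search direction, and the local search probability of formula (48) is not modeled. The sketch also assumes the fitness function is bounded above so that the loop terminates.

```python
def hill_climb(individual, fitness, max_step=10.0, min_step=0.1):
    """Coordinate-wise hill climbing with step halving (maximization)."""
    x = list(individual)
    best = fitness(x)
    improved = True
    while improved:  # restart from the first attribute after any improvement
        improved = False
        for k in range(len(x)):  # optimize one attribute, others fixed
            step = max_step
            while step >= min_step:
                moved = False
                for delta in (step, -step):  # assumed: try both directions
                    trial = x.copy()
                    trial[k] += delta
                    value = fitness(trial)
                    if value > best:
                        # Accept the new attribute value and keep the step size.
                        x, best = trial, value
                        moved = improved = True
                        break
                if not moved:
                    step /= 2  # no gain at this step size: halve it
    return x, best


def example_fitness(v):
    # Illustrative fitness with its maximum at (3, -4).
    return -((v[0] - 3.0) ** 2) - ((v[1] + 4.0) ** 2)


solution, score = hill_climb([0.0, 0.0], example_fitness)
print(solution, score)
```

Because the smallest step actually tried is 0.15625 (10 halved six times), the result lands near the optimum rather than exactly on it, which matches the coarse refinement role of the local search.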