CN108022045B - Distribution estimation method - Google Patents

Publication number
CN108022045B
Authority
CN
China
Prior art keywords
target
observation
scheme
population
satellite
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711250676.6A
Other languages
Chinese (zh)
Other versions
CN108022045A (en
Inventor
张忠山
褚骁庚
陈英武
陈宇宁
吕济民
陈盈果
陈成
王涛
刘晓路
邢立宁
姚锋
贺仁杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201711250676.6A
Publication of CN108022045A
Application granted
Publication of CN108022045B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312 Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G06Q10/067 Enterprise or organisation modelling

Abstract

The invention discloses a distribution estimation method comprising the following steps. Step1: when imaging the current target, the target identification satellite first extracts the look-ahead information provided by the target discovery satellite. Step2: targets with low observation value are deleted from the look-ahead information according to the three threshold parameters in the target filtering knowledge. Step3: several heuristic algorithms in a scheduling algorithm set are called to generate several local observation schemes; these calls can run in parallel, and a local observation scheme is an observation scheme that the satellite generates in its current state using only the target information within the look-ahead time window. Step4: each local observation scheme is comprehensively evaluated according to the evaluation parameters and the associated evaluation method in the scheme evaluation knowledge. Step5: the local observation scheme with the highest score is selected, and the first target in that scheme is locked as the next observation target.

Description

Distribution estimation method
Technical Field
The invention relates to the technical field of satellites, in particular to a distribution estimation method.
Background
The conventional scheduling method for agile satellites uses a two-satellite cluster consisting of a low-resolution target discovery satellite (hereinafter the "target discovery satellite") and an agile high-resolution target identification satellite (hereinafter the "target identification satellite"). The target discovery satellite flies at the front of the cluster, uses a low-resolution camera to perform push-broom imaging of a wide area on both sides of the sub-satellite track, and extracts the relevant target information in the imaged area in real time. The target identification satellite receives this target information in real time, so it learns the position information, imaging duration and observation profit of each target 100 s in advance. While the target identification satellite is imaging a target, it quickly decides on board, from the look-ahead information, which target to observe next. A cluster formed by these two satellites can handle sea-surface target identification scenarios effectively, identifying multiple sea-surface targets in a single pass.
However, the target identification satellite can obtain only local information about the target distribution at each decision (only the target information within the look-ahead time window, not the task distribution of the entire scene), so each decision on the next observation target is prone to short-sightedness. This is especially so when the scene period is long and the satellite's solid-state storage and power constraints are tight: these resource constraints may not allow the target identification satellite to image too many targets in one orbit (high-resolution images occupy a large amount of storage). Short-sighted decisions may cause the target identification satellite to consume resources such as solid-state storage too early, so that it later has to abandon some high-profit targets for lack of those resources, which reduces the global observation profit of the cluster. To improve the target identification capability of the cluster, effectively solving the short-sighted decision problem of the target identification satellite has become a key issue in cluster operation management.
Disclosure of Invention
It is an object of the present invention to provide a distribution estimation method that overcomes or at least mitigates at least one of the above-mentioned disadvantages of the prior art.
In order to achieve the above object, the present invention provides a distribution estimation method, including:
Step1, the target identification satellite acquires auxiliary decision knowledge information and extracts the look-ahead information provided by the target discovery satellite, wherein the look-ahead information comprises target position information, imaging duration and observation profit, and the auxiliary decision knowledge information comprises target filtering knowledge and scheme evaluation knowledge;
Step2, preliminarily evaluating the observation storage cost-performance and the observation power cost-performance of the targets in the look-ahead time window according to the filtering threshold threDur for target imaging duration, the filtering threshold threPro for target observation profit and the filtering threshold threPDR for the ratio of target observation profit to imaging duration in the target filtering knowledge, so as to delete targets with low observation value from the look-ahead information, wherein the target information mainly comprises target position information, imaging duration and observation profit, and the usage constraints of the satellite mainly comprise the time window constraint, the attitude maneuver constraint, the solid-state storage constraint and the power constraint;
Step3, calling a plurality of heuristic algorithms in a scheduling algorithm set to generate a plurality of local observation schemes, wherein the local observation schemes can be calculated in parallel, a local observation scheme is an observation scheme generated by the satellite in its current state using only the target information within the look-ahead time window, an observation scheme is described by a sequence solution, and a scheduling solution generator is used to translate a sequence solution into a feasible scheduling solution;
Step4, comprehensively evaluating each local observation scheme with an evaluation function for local observation schemes, according to the evaluation parameters and the associated evaluation method in the scheme evaluation knowledge;
Step5, selecting the local observation scheme with the highest score in Step4, and locking the first target in that scheme as the next observation target;
each parameter in the target filtering knowledge and the scheme evaluation knowledge is calculated with a distribution estimation algorithm, which specifically comprises the following steps:
Step71, population initialization: generating, by random sampling, an initial population uniformly distributed over the value domain, and evaluating the fitness value of each individual;
Step72, niche division: dividing the population into a plurality of sub-populations (niches) with a K-means clustering algorithm based on Euclidean distance, wherein the number of sub-populations is a function of the number of iterations and increases as the iterations proceed;
Step73, probability distribution estimation: applying a selection operation to the individuals in each sub-population and, assuming that all variables are mutually independent, building a probability distribution model of the dominant individuals from the better individuals of each sub-population;
Step74, offspring sampling: at each sampling, the algorithm selects a sub-population with a certain probability, and the selected sub-population samples an offspring individual according to its own probability distribution model; the step ends when the number of newly sampled individuals equals the current population size;
Step75, individual selection: merging the parent and offspring individuals within each sub-population, and obtaining the new-generation population with a near-elite selection strategy;
Step76, local search: optimizing the dominant individuals in the population, with a certain probability, by a local search algorithm such as hill climbing, to further improve solution quality;
Step77, judging whether the termination condition of the algorithm is reached; if so, returning the best individual found, otherwise jumping to Step72.
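Steps Step71 to Step77 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than the patented implementation: the independent-variable model is taken to be a per-dimension Gaussian, a niche is sampled with probability proportional to its size, the niche count grows by one every ten generations, and "near-elite" selection is approximated by keeping the best individuals within each niche. The function names (`niche_eda`, `kmeans`, `hill_climb`) and the toy fitness function are hypothetical; in the method described above the fitness would instead measure the observation profit obtained with a candidate parameter set.

```python
import math
import random

def kmeans(points, k, rng, iters=10):
    """Plain k-means with Euclidean distance; returns the non-empty clusters."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        centers = [[sum(x) / len(c) for x in zip(*c)] if c else centers[i]
                   for i, c in enumerate(clusters)]
    return [c for c in clusters if c]

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def hill_climb(x, f, bounds, rng, steps=20, scale=0.05):
    """Hill climbing: accept a random neighbour only if it improves f."""
    best, fb = x, f(x)
    for _ in range(steps):
        y = [clip(v + rng.gauss(0, scale * (hi - lo)), lo, hi)
             for v, (lo, hi) in zip(best, bounds)]
        fy = f(y)
        if fy > fb:
            best, fb = y, fy
    return best

def niche_eda(f, bounds, pop_size=40, gens=30, elite_frac=0.5, p_ls=0.2, seed=0):
    """Maximise f over the box `bounds` with a niche-based EDA (sketch)."""
    rng = random.Random(seed)
    # Step71: uniform random initial population
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for g in range(gens):  # Step77: assumed termination rule is a generation limit
        # Step72: niche count grows with the iteration number (assumed schedule)
        k = max(1, min(1 + g // 10, pop_size // 10))
        niches = kmeans(pop, k, rng)
        # Step73: per-niche independent Gaussian model built from the better individuals
        models = []
        for n in niches:
            top = sorted(n, key=f, reverse=True)[:max(2, int(elite_frac * len(n)))]
            mu = [sum(x) / len(top) for x in zip(*top)]
            sd = [max(1e-6, math.sqrt(sum((v - m) ** 2 for v in x) / len(top)))
                  for x, m in zip(zip(*top), mu)]
            models.append((mu, sd))
        # Step74: sample pop_size offspring, choosing a niche for each sample
        kids = [[] for _ in niches]
        for _ in range(pop_size):
            j = rng.choices(range(len(niches)), weights=[len(n) for n in niches])[0]
            mu, sd = models[j]
            kids[j].append([clip(rng.gauss(m, s), lo, hi)
                            for m, s, (lo, hi) in zip(mu, sd, bounds)])
        # Step75: merge parents and offspring per niche, keep the best of each niche
        pop = []
        for n, c in zip(niches, kids):
            pop += sorted(n + c, key=f, reverse=True)[:len(n)]
        # Step76: local search on a few leading individuals with probability p_ls
        pop.sort(key=f, reverse=True)
        for i in range(min(3, len(pop))):
            if rng.random() < p_ls:
                pop[i] = hill_climb(pop[i], f, bounds, rng)
    return max(pop, key=f)

# Toy fitness with its maximum at (1, -2); a stand-in for the learned objective.
def toy_fitness(x):
    return -(x[0] - 1) ** 2 - (x[1] + 2) ** 2

best = niche_eda(toy_fitness, [(-5, 5), (-5, 5)])
```

Because parents and offspring are merged within each niche, the population size stays constant from one generation to the next.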
The invention extends the application scenario of the two-satellite cluster: by designing an on-board decision model, the cluster can carry out identify-on-discovery reconnaissance of moving sea targets in a larger application scenario (one observation orbit). The decision model follows a "learn on the ground, use on board" approach that combines historical data, ground learning resources and on-board computing capacity, so that each short-term decision on an observation target takes historical global information into account, which improves the utilization efficiency of the satellite cluster over long periods. By analyzing historical data from cluster operation scenarios and extracting auxiliary knowledge for on-board decisions, the decision-making capability of the target identification satellite is improved and the cluster obtains a better global observation profit.
Drawings
Fig. 1 is a flowchart illustrating an embodiment of a scheduling method for agile satellites according to the present invention.
Fig. 2 shows a sequence solution and the scheduling solution produced by the scheduling solution generator.
Fig. 3 shows the ranking of the targets under different indexes.
Fig. 4 is a schematic diagram of satellite progress ratios.
Fig. 5 is a flowchart of the distribution estimation algorithm.
Fig. 6 compares the observation profit of the decision model and the online algorithms on the learning set.
Fig. 7 compares the observation profit of the decision model and the online algorithms on the test set.
Fig. 8 illustrates the impact of different mechanisms on the decision model.
Detailed Description
In the drawings, the same or similar reference numerals are used to denote the same or similar elements or elements having the same or similar functions. Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
The present invention assumes that the targets found in each orbit of the satellite cluster follow a certain distribution pattern. Different geographic regions have different target generation probabilities, and the target distribution pattern of each region, covering target position information, imaging duration and observation profit, can be obtained through data accumulation. In addition, the orbit of an imaging satellite is a repeating orbit: after a certain operating time, the satellite returns to its original ground track. Long-term data accumulation can therefore yield a large amount of information about the target distribution in each imaging orbit of the satellite. For each imaging orbit of the cluster, knowledge that assists autonomous on-board decisions can be extracted from the large volume of historical target distribution data, helping the target identification satellite overcome the short-sightedness of each decision and avoid consuming particular resources too early.
As shown in fig. 1, the distribution estimation method provided in this embodiment includes:
Step1, after acquiring the auxiliary decision knowledge information, the target identification satellite extracts, each time it images the current target, the look-ahead information provided by the target discovery satellite, wherein the look-ahead information comprises target position information, imaging duration and observation profit, and the auxiliary decision knowledge information comprises target filtering knowledge and scheme evaluation knowledge;
Step2, preliminarily evaluating the observation storage cost-performance and the observation power cost-performance of the targets in the look-ahead time window according to the filtering threshold threDur for target imaging duration, the filtering threshold threPro for target observation profit and the filtering threshold threPDR for the ratio of target observation profit to imaging duration in the target filtering knowledge, so as to delete targets with low observation value from the look-ahead information, wherein the target information mainly comprises target position information, imaging duration and observation profit, and the usage constraints of the satellite mainly comprise the time window constraint, the attitude maneuver constraint, the solid-state storage constraint and the power constraint;
Step3, calling a plurality of heuristic algorithms in a scheduling algorithm set to generate a plurality of local observation schemes, wherein the local observation schemes can be calculated in parallel, a local observation scheme is an observation scheme generated by the satellite in its current state using only the target information within the look-ahead time window, an observation scheme is described by a sequence solution, and a scheduling solution generator is used to translate a sequence solution into a feasible scheduling solution;
Step4, comprehensively evaluating each local observation scheme according to the evaluation parameters and the associated evaluation method in the scheme evaluation knowledge;
Step5, selecting the local observation scheme with the highest score in Step4, and locking the first target in that scheme as the next observation target.
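The flow of Step2 to Step5 can be sketched as one decision function. This is a minimal illustration, not the on-board implementation: the function name `decide_next_target`, the dictionary fields and the toy filter, heuristics and score are all hypothetical placeholders for the target filtering knowledge, the scheduling algorithm set and the scheme evaluation knowledge. A thread pool is used only to show that the heuristics are independent of one another and could run on separate cores.

```python
from concurrent.futures import ThreadPoolExecutor

def decide_next_target(look_ahead, keep, heuristics, score):
    """One decision: filter, build local schemes, evaluate, lock the first target.

    look_ahead : targets inside the look-ahead window (output of Step1)
    keep       : predicate standing in for the target filtering knowledge
    heuristics : callables mapping a target list to a time-ordered local scheme
    score      : callable standing in for the scheme evaluation function
    """
    candidates = [t for t in look_ahead if keep(t)]             # Step2
    if not candidates:
        return None
    with ThreadPoolExecutor() as pool:                          # Step3
        schemes = list(pool.map(lambda h: h(candidates), heuristics))
    schemes = [s for s in schemes if s]
    if not schemes:
        return None
    best = max(schemes, key=score)                              # Step4
    return best[0]                                              # Step5

# Toy data and toy knowledge (all values are made up for illustration).
look_ahead = [
    {"id": 1, "ws": 0, "profit": 4},
    {"id": 2, "ws": 10, "profit": 9},
    {"id": 3, "ws": 20, "profit": 7},
    {"id": 4, "ws": 5, "profit": 1},
]

def by_time(ts):
    return sorted(ts, key=lambda t: t["ws"])[:2]

def by_profit(ts):
    top = sorted(ts, key=lambda t: -t["profit"])[:2]
    return sorted(top, key=lambda t: t["ws"])

next_target = decide_next_target(
    look_ahead,
    keep=lambda t: t["profit"] >= 3,
    heuristics=[by_time, by_profit],
    score=lambda s: sum(t["profit"] for t in s),
)
```

With these toy inputs, target 4 is filtered out, the profit-based scheme scores higher than the time-based one, and its first target (id 2) is locked.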
In Step2, to improve decision efficiency, the target identification satellite may use the target filtering knowledge provided by the ground to pre-screen the targets within the look-ahead time window, which is also called target filtering. To improve the accuracy of target filtering and avoid deleting valuable targets by mistake, the relevant attributes of the target information and the satellite scheduling constraints are briefly analyzed below.
The target information mainly comprises target position information, imaging duration and observation profit. The usage constraints of the satellite are mainly the time window constraint, the attitude maneuver constraint, the solid-state storage constraint and the power constraint. For the timing constraints, formed by the time window constraint and the attitude maneuver constraint, it is difficult to judge the quality of a target through simple mathematical calculation, so the target filtering knowledge selects the targets to delete mainly by analyzing the cost-performance of each target's storage and power consumption.
When the target identification satellite images a target, the solid-state storage consumed by observing target i is proportional to the imaging duration of the target, specifically $dur_i \cdot cr$, where $dur_i$ is the imaging duration of the target and cr is the image acquisition bit rate of the satellite. Since cr is a constant, $p_i / dur_i$ (where $p_i$ is the observation profit of the target) can represent the observation storage cost-performance of target i. As for power, the satellite consumes power while observing a target and also during attitude maneuvers. However, the attitude maneuvers before and after observing a target depend on the relative attitude difference between that target and the targets observed immediately before and after it, which is expensive to compute. To keep on-board target filtering efficient, only the power consumed while imaging the target is used as the reference index for deciding whether to filter it out. The power consumed in imaging target i is $dur_i \cdot pc$, where pc is the power consumed per unit imaging time. Likewise, since pc is a constant, $p_i / dur_i$ can also represent the observation power cost-performance of target i.
In summary, the target filtering knowledge needs only three parameters of an observation target, $p_i$, $dur_i$ and $p_i / dur_i$, to make a preliminary evaluation of the observation storage cost-performance and the observation power cost-performance of target i and thus delete targets with a low observation resource cost-performance. This improves on-board decision efficiency and mitigates decision short-sightedness to some extent. Therefore, in the ground learning process for decision knowledge, only the filtering thresholds threPro, threDur and threPDR of the three parameters $p_i$, $dur_i$ and $p_i / dur_i$ are learned. At decision time the satellite filters out any observation target that does not meet all three thresholds simultaneously (if any one of the three attributes of a target is below its threshold, the target is deleted), which completes the extraction of the target filtering knowledge.
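The filtering rule can be written as a short Python sketch. The thresholds threPro, threDur and threPDR come from the text; the `Target` class, its field names, the function name and the numeric values are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass

@dataclass
class Target:
    tid: int        # target identifier (hypothetical field name)
    profit: float   # observation profit p_i
    dur: float      # imaging duration dur_i, in seconds

def filter_targets(targets, thre_pro, thre_dur, thre_pdr):
    """Keep only the targets that meet all three learned thresholds.

    As described in the text, a target is deleted as soon as any one of
    p_i, dur_i or p_i / dur_i is below its threshold.
    """
    kept = []
    for t in targets:
        if t.dur <= 0:
            continue  # assumption: a non-positive duration is invalid input
        if (t.profit < thre_pro
                or t.dur < thre_dur
                or t.profit / t.dur < thre_pdr):
            continue
        kept.append(t)
    return kept

# Made-up look-ahead window and thresholds.
window = [
    Target(1, profit=8.0, dur=10.0),   # passes all three thresholds
    Target(2, profit=2.0, dur=10.0),   # profit below threPro
    Target(3, profit=6.0, dur=30.0),   # ratio 0.2 below threPDR
    Target(4, profit=9.0, dur=4.0),    # duration below threDur
]
kept_ids = [t.tid for t in filter_targets(window, thre_pro=3.0, thre_dur=5.0, thre_pdr=0.5)]
```

Only three comparisons per target are needed, which is consistent with the stated aim of keeping on-board filtering cheap.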
In Step3, to make full use of the satellite's computing resources (a multi-core CPU in which each core runs at about 80 MHz), the on-board task scheduling algorithm set is designed as a number of simple heuristic algorithms that can run in parallel. The satellite's CPU supports parallel computation, but the computing capacity of each core is limited, so the algorithm run on each core cannot be too complex. At the same time, the scheduling strategies of the heuristic algorithms emphasize different aspects, so the set can offer a variety of scheduling schemes from which the satellite can select an observation scheme suited to its own state at decision time.
For the satellite scheduling problem, heuristic algorithm design mainly addresses three aspects: selecting the target subset, calculating the observation time and attitude for each target, and adjusting the target subset.
Before describing the specific flow of the heuristic algorithms, the solution representation adopted by the invention is briefly introduced. The invention describes an observation scheme by a sequence solution (also called a "target observation sequence") and uses a scheduling solution generator to translate a sequence solution into a feasible scheduling solution. The scheduling solution generator performs this translation with a construction method based on a greedy rule.
A sequence solution consisting of a target sequence is denoted by ps; the targets it contains are a subset of the full target set. The solution space of sequence solutions is denoted by D(ps), the solution space of scheduling solutions by D(ss), and the greedy-rule scheduling solution generator by SB. For any sequence in D(ps), SB can generate the corresponding scheduling solution in D(ss). To generate a scheduling solution, the targets are arranged in the order in which they appear in ps, and each target is observed as early as possible subject to the constraints (earliest-possible placement). If a target cannot be observed because of the time window or attitude maneuver constraints, it is discarded and the next target is scheduled. SB therefore always produces a feasible scheduling scheme that satisfies all constraints. Earliest-possible placement improves the satellite's efficiency in the time dimension and leaves more time for attitude maneuvers toward subsequent targets. As shown in Fig. 2, the sequence solution ps (target observation sequence) of the satellite is 3 → 7 → 9 → 1 → 6 → 8 → 4, and the scheduling solution ss obtained from SB is shown in the lower half of Fig. 2. The satellite cannot observe target 1 or target 8 because of the attitude maneuver constraint, so these two targets are discarded when the scheduling solution is generated and the subsequent targets are attempted directly.
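A minimal sketch of a greedy generator in the spirit of SB follows. It covers only the timing constraints (time window and attitude maneuver); storage and power checks are omitted for brevity. The attitude model, in which maneuver time grows linearly with the difference between two pointing angles plus a fixed settling time, is an assumption made for the example, as are the class, field and function names.

```python
from dataclasses import dataclass

@dataclass
class Target:
    tid: int
    ws: float      # time window start (s)
    we: float      # time window end (s)
    dur: float     # imaging duration (s)
    angle: float   # pointing angle (deg); hypothetical attitude model

def slew_time(a, b, rate=1.0, settle=5.0):
    """Hypothetical maneuver time between two pointing angles (s)."""
    return abs(a - b) / rate + settle

def generate_schedule(ps, start_time=0.0, start_angle=0.0):
    """Translate a sequence solution ps into a feasible scheduling solution.

    Targets are taken in the order given by ps and placed at the earliest
    feasible time; a target that cannot fit its time window after the
    required maneuver is discarded and the next one is tried.
    Returns a list of (tid, begin, end) tuples.
    """
    schedule = []
    t, angle = start_time, start_angle
    for tgt in ps:
        begin = max(tgt.ws, t + slew_time(angle, tgt.angle))
        end = begin + tgt.dur
        if end > tgt.we:
            continue  # infeasible under the timing constraints: discard
        schedule.append((tgt.tid, begin, end))
        t, angle = end, tgt.angle
    return schedule

# Made-up sequence solution: target 1 needs a large maneuver and is dropped.
ps = [
    Target(3, ws=0, we=40, dur=10, angle=0),
    Target(7, ws=20, we=60, dur=10, angle=10),
    Target(1, ws=30, we=50, dur=10, angle=-30),
    Target(6, ws=60, we=120, dur=15, angle=5),
]
ss = generate_schedule(ps)
scheduled_ids = [tid for tid, _, _ in ss]
```

Because a discarded target does not advance the clock or change the attitude, later targets are attempted from the state left by the last scheduled target, matching the behavior described for targets 1 and 8 in Fig. 2.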
For ease of understanding, the target observation sequence below is the sequence solution described in the preceding paragraph. When a heuristic algorithm selects the target subset, it first selects a target according to some index, then inserts the selected target into the existing target observation sequence in ascending time order to obtain a new target observation sequence, and then uses SB to convert that sequence into a scheduling solution. If the profit of the scheduling scheme increases, the new target observation sequence is adopted; otherwise the newly inserted target is abandoned and the original target observation sequence and its scheduling solution are retained. The target selection index thus guides the search direction of the heuristic algorithm, and a diversity of target selection indexes lets the heuristic set provide a variety of scheduling schemes.
The target selection indexes are mainly the time-order index, the target observation profit index, the target imaging duration index and the profit-to-duration ratio index.
The time-order index arranges the targets in ascending order of time window start time and selects the target with the earliest start time in turn. A heuristic algorithm based on this index is equivalent to scheduling the target set with a First Come First Served (FCFS) strategy: the sequence solution of the heuristic is the targets in ascending time order, and the scheduling solution generator then tries to place the observation of each target at its earliest observable time, subject to the timing constraints (time window constraint and attitude maneuver constraint).
The target observation profit index arranges the targets in descending order of observation profit and selects the target with the highest observation profit in turn. Because the optimization goal of satellite scheduling is to maximize the global profit of the observation scheme, selecting targets in descending profit order quickly raises the profit of the local scheme under construction, but it does not guarantee a higher profit for the global scheme.
The target imaging duration index arranges the targets in ascending order of imaging duration and selects the target with the shortest imaging duration in turn. As the foregoing analysis shows, the resources a target consumes, such as solid-state storage and power, are proportional to its imaging duration, so this index enables the satellite to observe more targets when the satellite resource constraints are less stringent.
The profit-to-duration ratio index arranges the targets in descending order of the observation profit-to-duration ratio $p_i / dur_i$ (where $p_i$ is the observation profit of the target and $dur_i$ is its imaging duration) and selects the target with the highest ratio in turn. As the foregoing analysis shows, the solid-state storage and power a target consumes are proportional to its imaging duration, so this index evaluates the resource cost-performance of observing a given target.
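The four selection indexes amount to four sort keys over the same target list, as the following sketch shows. The dictionary fields, the index names and the target values are hypothetical and chosen only so that the rankings differ.

```python
# Each index is a sort key; descending indexes negate the value.
INDEXES = {
    "time": lambda t: t["ws"],                                  # ascending window start
    "profit": lambda t: -t["profit"],                           # descending profit
    "duration": lambda t: t["dur"],                             # ascending imaging duration
    "profit_duration_ratio": lambda t: -t["profit"] / t["dur"], # descending p_i / dur_i
}

def rank(targets, index):
    """Return target ids ordered under the named selection index."""
    return [t["id"] for t in sorted(targets, key=INDEXES[index])]

# Made-up targets: window start (s), observation profit, imaging duration (s).
targets = [
    {"id": 1, "ws": 10, "profit": 5, "dur": 10},
    {"id": 2, "ws": 0, "profit": 9, "dur": 30},
    {"id": 3, "ws": 20, "profit": 4, "dur": 5},
    {"id": 4, "ws": 30, "profit": 7, "dur": 8},
]
rankings = {name: rank(targets, name) for name in INDEXES}
```

Target 2 ranks first under both the time and profit indexes but last under the duration and ratio indexes, which illustrates why different indexes steer the heuristics toward different schemes.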
An available observation scheme is calculated with the scheduling solution generator as follows. Fig. 3 and Table 4 show an example of how the current targets are ordered under the different indexes. The heuristic algorithm first selects an index and then calculates an available scheduling scheme through SB using the construction method described above. Take the profit-to-duration ratio index as an example, under which the seven targets in descending order are 4-6-3-1-8-7-9. First, given the current attitude and resource usage state of the satellite, the algorithm checks whether target 4 can be observed; if so, the current best sequence (hereinafter the "current best solution", cblan) is updated to {4}. Next, target 6 is added to the current best solution and the targets are arranged in ascending time order, that is, the scheduling solution converted from the sequence 6 → 4 is considered; if its profit is better than that of the current best solution, the current best solution is updated to {6, 4}. The process continues in the same way: each time a target is added to the current best solution, the targets are arranged in ascending time order to obtain a new sequence solution, SB converts it into a scheduling solution, and the new solution replaces the current best solution if its profit is higher; otherwise the current best solution is kept.
The pseudocode of the heuristic algorithm is given as Algorithm 5, where numT is the number of targets within the look-ahead time window; psC is the sequence solution obtained each time a target is added to the current best solution; ssC is the scheduling solution corresponding to the current sequence solution psC; and proC is the profit of the current scheduling scheme.
Table 4. Target parameters (provided as image GDA0002947055260000081 in the original publication)
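The insert-and-accept construction described above can be sketched in Python as follows. This is not a reproduction of Algorithm 5, which is not available in the text; it only follows the prose description. The names psC, ssC and proC are taken from the text, while the simplified generator (a fixed maneuver time between any two targets), the function names and the target data are assumptions made for the example.

```python
def generate_schedule(ps, targets, slew=5.0):
    """Simplified stand-in for SB: earliest feasible placement, fixed maneuver time."""
    t, ss = 0.0, []
    for tid in ps:
        ws, we, dur, _ = targets[tid]
        begin = max(ws, t + slew)
        if begin + dur <= we:      # otherwise the target is discarded
            ss.append(tid)
            t = begin + dur
    return ss

def build_scheme(targets, index_key):
    """Construct a local scheme by inserting targets in index order.

    targets   : {tid: (window start, window end, duration, profit)}
    index_key : sort key over target ids implementing the selection index
    Returns the best sequence solution, its scheduling solution and its profit.
    """
    ps_best, ss_best, pro_best = [], [], 0.0
    for tid in sorted(targets, key=index_key):
        # psC: candidate sequence, re-ordered by ascending window start
        psC = sorted(ps_best + [tid], key=lambda i: targets[i][0])
        ssC = generate_schedule(psC, targets)
        proC = sum(targets[i][3] for i in ssC)
        if proC > pro_best:        # accept only if the profit increases
            ps_best, ss_best, pro_best = psC, ssC, proC
    return ps_best, ss_best, pro_best

# Made-up targets; ratios p_i / dur_i are 0.9, 0.8, 0.5 and 0.3 for 4, 6, 3, 1.
targets = {
    4: (50, 80, 10, 9),
    6: (0, 30, 10, 8),
    3: (20, 45, 20, 10),
    1: (25, 40, 10, 3),
}
ratio_desc = lambda tid: -targets[tid][3] / targets[tid][2]
ps, ss, pro = build_scheme(targets, ratio_desc)
```

In this example targets 4, 6 and 3 are accepted in turn, while inserting target 1 does not raise the profit (it cannot be scheduled after target 3), so the previous best solution is kept.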
To further enrich the diversity of the heuristic set, a second group of heuristic algorithms can be designed that operates only on the first half of the targets in the sorted list and directly abandons those in the rear half (if the list is target 1, target 2, target 3, target 4, only target 1 and target 2 are considered). When satellite resources are scarce, such an algorithm attempts to schedule only the most valuable targets, saving satellite resources overall. Although the scheduling schemes these heuristics generate may not perform well on observation profit, they perform better on indexes such as observation time and scheme profit-to-storage ratio (scheme profit / storage consumed by the scheme). The heuristic set can thus provide a more diverse set of observation schemes, making it easier for the satellite to select the most appropriate scheme in the scheme evaluation stage.
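The front-half restriction is a one-line truncation of the ranked list, sketched below. The function name is hypothetical, and rounding up for lists of odd length is an assumption, since the text only gives an example with four targets.

```python
def top_half(ranked):
    """Keep the front half of a ranked target list (odd lengths round up, by assumption)."""
    return ranked[: (len(ranked) + 1) // 2]

even_case = top_half([1, 2, 3, 4])             # the four-target example from the text
odd_case = top_half([4, 6, 3, 1, 8, 7, 9])     # the seven-target ranking used earlier
```

The truncated list can then be passed to the same insert-and-accept construction as the full list, so each index yields two heuristics.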
In Step4, each time the satellite decides the next imaging target, the observation scheme evaluation module has the satellite comprehensively evaluate every local observation scheme generated by the heuristic set on indexes such as profit, solid-state storage consumption and power consumption, so as to select the local scheduling scheme most likely to improve the global observation profit (the sum of the profits of the targets the satellite observes in one orbit).
In the observation scheme evaluation module, the satellite evaluates seven attributes of an observation scheme: scheme profit proP, solid-state storage consumption sdP, profit-to-storage ratio psdR, total power consumption egP, profit-to-power ratio pegR, execution duration edP, and profit-to-execution-duration ratio pedR.
The scheme profit proP is the sum of the observation profits of all imaging targets in the local observation scheme. This index represents the overall profit of the local scheme; when resources such as solid-state storage and power are sufficient, a higher local scheme profit is more likely to increase the satellite's global observation profit.
The solid-state storage consumption sdP is the amount of satellite solid-state storage consumed by the local observation scheme. Because the storage each target consumes during imaging is proportional to its imaging duration, sdP can be calculated by summing the imaging durations of all targets in the local observation scheme and multiplying the sum by the image acquisition bit rate cr. This index describes how much of the satellite's solid-state storage the local scheme consumes; when storage is insufficient, local observation schemes with lower storage consumption are preferred.
The profit-to-storage ratio psdR is the profit of the local observation scheme divided by its storage consumption. This index describes how efficiently the local scheme uses the satellite's solid-state storage. If solid-state storage is a tight constraint for the satellite, giving more weight to the profit-to-storage ratio of the local scheme at each decision is more likely to increase the satellite's global observation profit.
The total power consumption egP is the power consumption of the local observation scheme, which mainly includes the imaging power consumption and the attitude maneuver power consumption. The index is used for describing the consumption condition of the local observation scheme on the satellite electric quantity resource, and the local observation scheme with less electric quantity consumption is preferentially selected when the satellite electric quantity resource is insufficient.
The profit-to-electric-quantity ratio pegR is the scheme profit of the local observation scheme divided by the total electric quantity consumption of the scheme. The index is used for describing the use efficiency of satellite electric quantity resources in the local observation scheme, and if the electric quantity constraint of the satellite is tight constraint, the global observation yield of the satellite can be increased with a higher probability by emphasizing the improvement of the yield electric quantity ratio of the local observation scheme in each decision.
The execution duration edP refers to the total flight duration of the satellite consumed by the local observation scheme, and mainly includes the imaging duration and the attitude maneuver duration. The index is used for describing the consumption situation of the local observation scheme on the satellite execution time, and when the satellite imaging orbit is close to the end, the local observation scheme with less execution time consumption is preferentially selected.
The profit-to-duration ratio pedR is the scheme profit of the local observation scheme divided by its execution duration. This index describes how efficiently the local observation scheme uses the satellite's time. If a large amount of resources is still left when the imaging orbit is close to its end, emphasizing a higher profit-to-duration ratio at each decision is more likely to increase the global observation profit.
In summary, a local observation scheme is evaluated with the following formula, where w1 to w7 are the weight coefficients of the scheme profit proP, the solid-state storage consumption sdP, the profit-to-storage ratio psdR, the total power consumption egP, the profit-to-power ratio pegR, the execution duration edP and the profit-to-duration ratio pedR, and are also the knowledge to be learned by the ground learning module. Each weight takes a value in [-100, 100]. A weight may be negative because for some indexes (e.g., the solid-state storage consumption sdP) a smaller value indicates a higher composite score.
cScore=w1·proP+w2·sdP+w3·psdR+w4·egP+w5·pegR+w6·edP+w7·pedR
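As an illustration of the formula above, the following Python sketch derives the seven attributes of a local scheme and combines them into cScore. The input layout (a list of `(profit, duration)` pairs plus precomputed egP and edP), the function names and the sample numbers are assumptions for the example; all consumption values are assumed to be strictly positive so that the ratios are defined.

```python
from typing import List, Sequence, Tuple


def scheme_attributes(targets: Sequence[Tuple[float, float]],
                      cr: float, eg_p: float, ed_p: float) -> List[float]:
    """Return [proP, sdP, psdR, egP, pegR, edP, pedR] for one local scheme.

    targets: (observation profit, imaging duration in s) for each target.
    cr:      image acquisition code rate, so sdP = cr * total imaging duration.
    eg_p:    total power consumption of the scheme (imaging + attitude maneuvers).
    ed_p:    execution duration of the scheme (imaging + attitude maneuvers).
    """
    pro_p = sum(p for p, _ in targets)
    sd_p = cr * sum(d for _, d in targets)
    return [pro_p, sd_p, pro_p / sd_p, eg_p, pro_p / eg_p, ed_p, pro_p / ed_p]


def c_score(attrs: Sequence[float], w: Sequence[float]) -> float:
    """cScore = w1*proP + w2*sdP + ... + w7*pedR."""
    return sum(wi * xi for wi, xi in zip(w, attrs))


# Hypothetical scheme: one point target and one strip target.
demo_attrs = scheme_attributes([(40.0, 5.0), (60.0, 15.0)], cr=3.0, eg_p=0.05, ed_p=50.0)
demo_score = c_score(demo_attrs, w=[1.0, -0.5, 10.0, -20.0, 0.01, -0.2, 5.0])
```

Negative weights on sdP, egP and edP in the demo reflect the remark that smaller consumption should raise the composite score.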
To further improve the satellite's decisions, the usage records of the satellite in historical scenes (each historical scene is one imaging orbit) can be analyzed statistically to obtain the consumption of each resource at different stages of an imaging orbit. By comparing its current state with these historical statistics, the satellite can decide which attributes of the local observation scheme should be emphasized in the current situation. For example, suppose the satellite has executed 10 min of observation in a 40 min imaging orbit, has obtained an observation profit of 240, and has consumed 33% of its solid-state storage and 27% of its power. According to the statistical analysis of the historical scenes, in the cases with higher global observation profit the satellite has, on completing 25% (10/40) of the observation orbit, obtained an observation profit of about 220 and consumed 27% of its solid-state storage and 28% of its power. The current satellite has therefore used more storage than expected and should pay more attention to the profit-to-storage ratio (observation profit / storage consumed) in its subsequent decisions in this orbit; that is, when evaluating local observation schemes, the weight of the scheme profit-to-storage ratio (scheme profit / scheme storage consumption) can be increased appropriately so that schemes with a higher ratio are selected at each decision.
To describe the amounts of solid-state storage and power that the satellite should have consumed, and the observation profit it should have obtained, on completing different percentages of the observation orbit, the concept of the progress ratio is introduced. The progress ratio is the percentage of the current observation orbit that the satellite has completed. For example, if the total duration of the current imaging orbit is 40 min and the satellite has executed 10 min of the observation scheme, the progress ratio is 25% ((10/40) × 100%).
The invention designs a statistical method for obtaining the solid-state storage and power that the satellite should have consumed, and the observation profit it is expected to have obtained, at different progress ratios. The method assumes that the consumption of observation resources such as solid-state storage and power, and the observation profit obtained, follow the distribution of the targets. For a progress ratio pr in a given historical scene, the sub-satellite point at the corresponding moment divides the target set of the scene into ts1 and ts2: ts1 is the set of targets whose visible-time-window midpoint is earlier than or equal to the moment corresponding to pr, and ts2 is the set of targets whose visible-time-window midpoint is later than that moment (the targets the satellite has not yet passed over). Within this imaging orbit, the observation profit that should have been obtained at progress ratio pr is the sum of the observation profits of the targets in ts1, i.e., exPro = ProS(ts1), where exPro is the expected observation profit at the progress ratio and ProS(…) is the sum of the observation profits of all targets in a target set. Because the solid-state storage and power consumed in observing a target are proportional to the imaging duration of the target, the percentage of solid-state storage and power consumed at progress ratio pr in this imaging orbit is exRCR = DurS(ts1)/(DurS(ts1) + DurS(ts2)), where exRCR is the percentage of resources expected to be consumed at the progress ratio and DurS(…) is the sum of the imaging durations of all targets in a target set.
As shown in Fig. 4, the current progress ratio of the satellite corresponds to sub-satellite point A, which divides the full target set into two subsets, ts1 = {3, 7} and ts2 = {9, 1, 6, 8, 4}. The observation profit expected at this progress ratio in this scene is therefore exPro = ProS(ts1) = 95, and the percentage of solid-state storage and power consumed is
Figure GDA0002947055260000111
(parameters of the target are shown in Table 1).
Table 1 target parameter examples
Figure GDA0002947055260000112
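The split into ts1 and ts2 and the two expected quantities can be sketched as follows. The target tuples and numbers in the demo are hypothetical and are not the values of Table 1; only the rule exPro = ProS(ts1) and exRCR = DurS(ts1)/(DurS(ts1) + DurS(ts2)) is taken from the description.

```python
from typing import Sequence, Tuple

# Hypothetical record: (visible-window midpoint in s, imaging duration in s, profit).
TargetInfo = Tuple[float, float, float]


def expected_at_progress(targets: Sequence[TargetInfo], t_pr: float) -> Tuple[float, float]:
    """Return (exPro, exRCR) at the moment t_pr that corresponds to progress ratio pr."""
    ts1 = [t for t in targets if t[0] <= t_pr]   # targets already passed
    ts2 = [t for t in targets if t[0] > t_pr]    # targets not yet passed
    ex_pro = sum(p for _, _, p in ts1)           # ProS(ts1)
    dur1 = sum(d for _, d, _ in ts1)             # DurS(ts1)
    dur2 = sum(d for _, d, _ in ts2)             # DurS(ts2)
    ex_rcr = dur1 / (dur1 + dur2)
    return ex_pro, ex_rcr


# Demo: a 2400 s (40 min) orbit at progress ratio 25%, i.e. t_pr = 600 s.
demo_scene = [(100.0, 5.0, 40.0), (500.0, 15.0, 50.0), (900.0, 5.0, 30.0), (1500.0, 15.0, 60.0)]
demo_ex_pro, demo_ex_rcr = expected_at_progress(demo_scene, t_pr=0.25 * 2400.0)
```

In the demo the first two targets fall in ts1, so the expected profit is 90 and half of the total imaging duration, and hence half of the storage and power budget, is expected to be consumed.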
After the expected observation profit and the expected percentage of solid-state storage and power consumption at progress ratio pr have been obtained for one imaging orbit, the values at progress ratio pr in all historical imaging orbits are analyzed statistically (assuming that each index at a given progress ratio follows a normal distribution), and the corresponding mean and standard deviation are computed. These statistics serve as the basis on which the satellite judges its own state at a decision moment. Taking the expected observation profit at progress ratio pr as an example:
Figure GDA0002947055260000113
Figure GDA0002947055260000114
wherein numSce represents the number of scenes;
Figure GDA0002947055260000115
indicating the expected observation gain of progress ratio pr in the ith scene;
mePro_pr represents the mean expected observation profit at progress ratio pr;
sdPro_pr represents the standard deviation of the expected observation profit at progress ratio pr under the normal-distribution assumption.
Once the mean expected observation profit mePro_pr and the standard deviation sdPro_pr at progress ratio pr are known, the relative relationship between the satellite's current observation profit and the expected observation profit is expressed by rePro in equation (29), where curPro_pr is the observation profit the satellite has already obtained at progress ratio pr. A rePro of zero indicates that the current observation profit matches the expectation, a value greater than zero indicates that it exceeds the expectation, and a value less than zero indicates that it falls short of the expectation. The observation profit weight adjustment coefficient acPro is an exponential function of rePro, as shown in equations (29) and (30), where aPro and bPro are the associated parameters to be learned by the ground learning module. The range of aPro is the negative real numbers
Figure GDA0002947055260000125
The range of bPro is all real numbers
Figure GDA0002947055260000126
According to the formula, acPro decreases as rePro increases, so the profit of the local observation scheme need not be emphasized too much in the satellite's decision; conversely, when rePro is smaller, acPro becomes larger, indicating that the weight of the observation profit should be increased appropriately when the local observation schemes are evaluated.
Figure GDA0002947055260000121
acPro = e^(aPro·rePro + bPro) (30)
w1·acPro·proP (31)
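The chain from historical statistics to the adjustment coefficient can be sketched in Python. Equation (29) is only available as an image here, so the sketch assumes that rePro is the standardized deviation (curPro_pr - mePro_pr) / sdPro_pr and that the population standard deviation is used; both are assumptions that merely agree with the stated behavior (zero when the current profit matches the expectation, positive when it exceeds it). The sample values are hypothetical.

```python
import math
import statistics
from typing import Sequence


def adjustment_coefficient(history: Sequence[float], current: float,
                           a: float, b: float) -> float:
    """Return acPro = exp(a * rePro + b) for one progress ratio.

    history: expected observation profit at this progress ratio in each
             historical scene (exPro for scene 1..numSce).
    current: profit already obtained at this progress ratio (curPro_pr).
    a, b:    learned parameters; a is negative according to the description.
    """
    me_pro = statistics.mean(history)
    sd_pro = statistics.pstdev(history)      # assumption: population std. dev.
    re_pro = (current - me_pro) / sd_pro     # assumption: standardized deviation
    return math.exp(a * re_pro + b)


demo_history = [200.0, 220.0, 240.0]
demo_on_track = adjustment_coefficient(demo_history, current=220.0, a=-1.0, b=0.0)
demo_ahead = adjustment_coefficient(demo_history, current=240.0, a=-1.0, b=0.0)
demo_behind = adjustment_coefficient(demo_history, current=200.0, a=-1.0, b=0.0)
```

With a negative `a`, a satellite that is ahead of the historical expectation gets a coefficient below e^b and one that is behind gets a coefficient above it, which matches the qualitative behavior described for acPro.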
In the same way, the mean meRCR_pr and the standard deviation sdRCR_pr of the percentage of solid-state storage and power consumed at progress ratio pr can be obtained, as shown in equations (31) and (32).
Figure GDA0002947055260000122
Figure GDA0002947055260000123
where
Figure GDA0002947055260000124
representing expected solid storage and electric quantity resource consumption corresponding to the progress ratio pr in the ith scene;
meRCR_pr represents the mean expected solid-state storage and power consumption at progress ratio pr;
sdRCR_pr represents the standard deviation of the expected solid-state storage and power consumption at progress ratio pr under the normal-distribution assumption.
The relative relationship reSDR between the satellite's current and expected solid-state storage consumption is shown in equation (33), where curSDR_pr represents the current percentage of solid-state storage consumed.
Figure GDA0002947055260000131
The relative relationship coefficient reEGR of the power resource can be obtained in the same way, where curEGR_pr represents the current percentage of power consumed.
Figure GDA0002947055260000132
The adjusted evaluation function of the scheme is:
cScore=w1·acPro·proP+w2·acSD·sdP+w3·acPsdR·psdR+w4·acEG·egP
+w5·acPegR·pegR+w6·acED·edP+w7·acPedR·pedR (35)
where
acPro = e^(aPro·rePro + bPro) (36)
acSD = e^(aSD·reSDR + bSD) (37)
acPsdR = e^(aPsdR·reSDR + bPsdR) (38)
acEG = e^(aEG·reEGR + bEG) (39)
acPegR = e^(aPegR·reEGR + bPegR) (40)
acED = e^(aED·reRCR + bED) (41)
acPedR = e^(aPedR·reRCR + bPedR) (42)
In the evaluation formula, every index other than the scheme profit is related to a satellite resource (solid-state storage, power or execution time), so the other adjustment coefficients all use a resource-consumption relative relationship (reSDR, reEGR or reRCR) to represent the relationship between the current satellite state and the historical statistics. Note, however, that although several adjustment coefficients share the same relative relationship, the associated parameters differ between coefficients; that is, different parameters are trained in the ground learning module, which lets the adjustment coefficients remain mutually independent to some extent.
As can be seen from the table, the observation scheme evaluation module has 21 parameters (listed at the end of Table 2) that need to be trained by the ground learning module: aPro, bPro, aSD, bSD, aPsdR, bPsdR, aEG, bEG, aPegR, bPegR, aED, bED, aPedR, bPedR, w1, w2, w3, w4, w5, w6, w7.
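Equations (35) to (42) can be read as one loop over the seven terms, as in the Python sketch below. The list-based interface and the demo numbers are assumptions for illustration; the pairing of each coefficient with rePro, reSDR, reEGR or reRCR follows equations (36) to (42).

```python
import math
from typing import Sequence


def adjusted_score(attrs: Sequence[float], w: Sequence[float],
                   a: Sequence[float], b: Sequence[float],
                   rel: Sequence[float]) -> float:
    """cScore = sum_k w_k * exp(a_k * rel_k + b_k) * attr_k over the seven terms.

    attrs: [proP, sdP, psdR, egP, pegR, edP, pedR]
    rel:   relative relationship used by each term, in the order of
           equations (36)-(42): [rePro, reSDR, reSDR, reEGR, reEGR, reRCR, reRCR]
    a, b:  per-term parameters learned on the ground (hypothetical values here).
    """
    total = 0.0
    for w_k, x_k, a_k, b_k, r_k in zip(w, attrs, a, b, rel):
        total += w_k * math.exp(a_k * r_k + b_k) * x_k
    return total


demo_attrs = [100.0, 50.0, 2.0, 20.0, 5.0, 40.0, 2.5]
demo_w = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0]
zeros = [0.0] * 7

# With a = b = 0 every coefficient is 1 and the plain weighted sum is recovered.
demo_plain = adjusted_score(demo_attrs, demo_w, zeros, zeros, zeros)

# The satellite is ahead on profit (rePro = 1) and aPro = -1: the profit term shrinks.
demo_adjusted = adjusted_score(demo_attrs, demo_w,
                               [-1.0] + [0.0] * 6, zeros, [1.0] + [0.0] * 6)
```

Keeping a separate `(a_k, b_k)` pair per term is what allows two coefficients that share the same relative relationship to react differently.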
TABLE 2 this section of relevant parameter information
Figure GDA0002947055260000141
Figure GDA0002947055260000151
Figure GDA0002947055260000161
Figure GDA0002947055260000171
The ground parameter learning method in Table 6 is given in detail below
The ground knowledge learning module improves the satellite's online decision and scheduling capability by learning, from the satellite's historical scene information, the parameters used in the decision-support knowledge, so that the satellite is more likely to obtain a higher global profit in a new scene.
According to the above analysis, the parameters to be trained by the ground learning module are the 3 threshold parameters (threPro, threDur, threPDR) in the target filter module and the 21 parameters in the scheme evaluation module (aPro, bPro, aSD, bSD, aPsdR, bPsdR, aEG, bEG, aPegR, bPegR, aED, bED, aPedR, bPedR, w1, w2, w3, w4, w5, w6, w7). Here, the filtering threshold parameter set tarFilter denotes the 3 threshold parameters used in the target filter module, and the scheme evaluation parameter set schEval denotes the 21 evaluation parameters used in the scheme evaluation module.
Learning these parameters improves the precision with which targets are filtered during autonomous onboard mission planning, so that the satellite spends its computing resources on the more valuable targets when making decisions, improving decision efficiency and quality. It also improves the satellite's global view when evaluating multiple local observation schemes: the evaluation parameters are adjusted adaptively according to the satellite's state, which mitigates short-sighted decisions and increases the global observation profit of the satellite.
When the global distribution of targets follows a certain regularity, the ground learning module adjusts the parameters so that the satellite obtains a higher global profit when it uses them for online scheduling; that is, the parameters are tuned so that online scheduling in the historical scenes yields a better global profit.
As can be seen from the foregoing description, each time the target-recognition satellite images the current target, it first extracts the look-ahead information of the targets (including target position, imaging duration and observation profit); it then deletes the targets with lower observation value from the look-ahead information according to the three threshold parameters in the target filtering knowledge tarFilter; next, it calls the heuristic algorithms in the scheduling algorithm set to generate several local observation schemes; it then evaluates each local observation scheme comprehensively according to the 21 evaluation parameters of the scheme evaluation knowledge schEval and the associated evaluation method; finally, it selects the local observation scheme with the highest score and locks the first target in that scheme as the next observation target. The global observation profit of the target-recognition satellite in a scene can therefore be regarded as a function of the threshold parameter set tarFilter and the scheme evaluation parameter set schEval, and parameter learning can be regarded as a parameter optimization process whose goal is to maximize the sum of the global profits of the target-recognition satellite over all scenes, i.e.
Figure GDA0002947055260000181
Wherein numSce is the number of historical scenes;
tarFilter is the set of filtering threshold parameters (comprising 3 threshold parameters);
schEval is the set of scheme evaluation parameters (comprising 21 evaluation parameters);
globalPro_i(…) is the global observation profit in the i-th scene under the given parameters.
The value ranges of the parameters to be learned on the ground are shown in Table 3.
TABLE 3 ground learning parameters
Figure GDA0002947055260000182
Figure GDA0002947055260000191
Based on the analysis above, the parameter learning problem is converted into an optimization problem, and the objective function value for each candidate parameter set has to be computed with a scene simulator. Many parameters need to be optimized, the problem may have multiple local peaks (local optima), and each evaluation of the objective function for a parameter set consumes considerable computing time. The invention therefore uses an estimation of distribution algorithm combined with a niche strategy to optimize the parameters.
The estimation of distribution algorithm (EDA) is a relatively new type of evolutionary algorithm. Unlike traditional evolutionary algorithms, it has no crossover or mutation operators and instead optimizes the objective function through a population evolution strategy based on probability distributions. Because an EDA evaluates the evolution of the population from a macroscopic perspective, it generally has better global search behavior and diversity and is less prone to stalling in a local optimum and converging prematurely. An EDA first estimates the distribution of the dominant individuals in the population, then builds a probability model of the dominant individuals, and finally obtains offspring by sampling from the model.
The niche is a concept from biology that refers to a living environment within a specific setting: in the course of evolution, organisms generally live and reproduce together with their own species, and they also live in a particular geographic area. The basic idea of the niche strategy (niching) is to apply this concept to evolutionary computation: the individuals of each generation are divided into several classes, and a number of individuals with high fitness are selected from each class as its representatives to form a group. The invention adopts a niche strategy based on neighbor clustering, in which each individual searches within its own neighborhood, in order to increase population diversity.
The estimation of distribution algorithm solves for specific values of the parameters from the historical scenes. As shown in Fig. 5, the algorithm framework consists of the following 7 steps:
Step71, population initialization: an initial population uniformly distributed over the value domain is generated by random sampling, and the fitness value of each individual is evaluated;
Step72, niche division: the population is divided into several sub-populations (niches) with a K-means clustering algorithm based on Euclidean distance. The number of sub-populations is a function of the iteration count; the larger the iteration count, the larger the number of sub-populations;
Step73, probability distribution estimation: the best individuals in each sub-population are selected, and, assuming all variables are mutually independent, a probability distribution model of the dominant individuals in each sub-population is built from them;
Step74, offspring sampling: each time, the algorithm selects a sub-population with a certain probability, and the selected sub-population samples an offspring individual from its probability distribution model; the step ends when the number of newly sampled individuals equals the current population size;
Step75, individual selection: each sub-population merges parent and offspring individuals, and a new generation is obtained with the near elite optimization strategy;
Step76, local search: dominant individuals in the population are optimized with a certain probability using a local search algorithm such as hill climbing, further improving solution quality;
Step77, termination check: if the termination condition is reached, the best individual found is returned; otherwise the algorithm jumps to Step72 and repeats Step72 to Step76.
In the algorithm framework, Step72 introduces a niche strategy based on neighbor clustering and divides the population into several sub-populations with a K-means algorithm based on Euclidean distance. The number of sub-populations is adjusted adaptively according to the number of generations: in early generations the number of sub-populations is small, which strengthens the exploration capability of the algorithm; in later generations the number of sub-populations increases, which strengthens the exploitation capability of the algorithm in its later stage. The mapping function for the number of sub-populations is shown in formula (44), where numIter is the generation number, numPop is the number of sub-populations, and round() denotes rounding to an integer.
Figure GDA0002947055260000211
In Step73, the algorithm first selects the top 10% of individuals in the whole population as dominant individuals, and then builds a Gaussian probability model in each sub-population from the dominant individuals it contains. Taking sub-population i as an example, the mean and variance of the fitted normal distribution are given in equations (45) and (46), where numExc_i is the number of dominant individuals in the i-th sub-population;
Figure GDA0002947055260000215
represents the j-th dominant individual in the i-th sub-population; μ_i is the mean of the dominant individuals in the i-th sub-population; and σ_i is the standard deviation of the dominant individuals in the i-th sub-population.
Figure GDA0002947055260000212
Figure GDA0002947055260000213
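A per-dimension Gaussian model of one sub-population can be fitted and sampled as in the Python sketch below. Equations (45) and (46) are only available as images, so the use of the population standard deviation is an assumption, as are the function names and the clipping of samples to [-100, 100] (the stated range of the weights; other parameters may have different ranges).

```python
import random
import statistics
from typing import List, Sequence, Tuple


def fit_gaussian(elites: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Fit independent normal distributions, one per variable, to the dominant individuals."""
    columns = list(zip(*elites))                      # one tuple of values per variable
    mu = [statistics.mean(c) for c in columns]
    sigma = [statistics.pstdev(c) for c in columns]   # assumption: population std. dev.
    return mu, sigma


def sample_child(mu: Sequence[float], sigma: Sequence[float], rng: random.Random,
                 lo: float = -100.0, hi: float = 100.0) -> List[float]:
    """Draw one offspring from the model and clip it to the assumed value range."""
    return [min(hi, max(lo, rng.gauss(m, s))) for m, s in zip(mu, sigma)]


demo_mu, demo_sigma = fit_gaussian([[1.0, 10.0], [3.0, 10.0]])
demo_child = sample_child(demo_mu, demo_sigma, random.Random(0))
```

Because the variables are modeled independently, a variable on which all dominant individuals agree (the second one in the demo) has zero spread and is copied unchanged into every offspring.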
In Step74, for each individual of the new generation the algorithm chooses the sub-population to sample from by roulette-wheel selection, and generates the new individual by sampling from that sub-population's probability distribution model of dominant individuals. The probability of each sub-population being selected is given in formula (47): it is the sum of the fitness values of the dominant individuals in the sub-population as a fraction of the sum of the fitness values of the dominant individuals in the whole population. Here selectP_i is the probability that the i-th sub-population is selected, and Fit() is the fitness value of an individual. Because each sub-population uses its own independent sampling model, the new individuals are spread over different regions of the solution space, which preserves the diversity of the whole population.
Figure GDA0002947055260000214
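The roulette-wheel rule described for formula (47) can be sketched as follows. The sketch assumes that all fitness values are positive (for example, global observation profits), which the description does not state explicitly; the function names and demo values are hypothetical.

```python
import random
from typing import List, Sequence


def selection_probabilities(elite_fitness: Sequence[Sequence[float]]) -> List[float]:
    """selectP_i = (sum of elite fitness in sub-population i) / (sum over all elites)."""
    sums = [sum(f) for f in elite_fitness]
    total = sum(sums)
    return [s / total for s in sums]


def pick_subpopulation(probs: Sequence[float], rng: random.Random) -> int:
    """Roulette-wheel draw: return the index of the sub-population to sample from."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Two sub-populations: elites with fitness 10 and 30 in the first, 60 in the second.
demo_probs = selection_probabilities([[10.0, 30.0], [60.0]])
demo_pick = pick_subpopulation(demo_probs, random.Random(0))
```

A sub-population that holds more of the total elite fitness is sampled more often, while weaker niches still receive some offspring.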
In Step75, the algorithm merges parent and offspring individuals with the near elite optimization strategy to obtain a new generation. Specifically, for each parent individual, the offspring individual closest to it (in Euclidean distance) is found and the fitness values of the two are compared. If the offspring has a higher fitness value than the parent, the offspring replaces the parent and is removed from the offspring list; otherwise the offspring's competition failure count (initially 0) is increased by 1, and the offspring is deleted once its failure count reaches 3.
In Step76, the dominant individuals in each population are further optimized with the hill climbing method. The probability of applying local search to a dominant individual is given in equation (48), where searchP_j is the probability that individual x_j undergoes local search and Fit_M is the fitness value of the best individual in the whole population. When optimizing one attribute of an individual, the hill climbing method fixes all other attributes and optimizes the single attribute with a variable step size. After all attributes have been optimized one by one, the algorithm ends if the fitness value of the individual has not improved; otherwise it starts again from the first attribute. The variable-step strategy tries the maximum step size first each time an attribute is optimized. If the current step size improves the attribute, the new attribute value replaces the original one and the step size is kept; otherwise the step size is halved and the optimization is tried again. When the current step size falls below the minimum step size, optimization of the attribute ends. The maximum step size is set to 10 and the minimum step size to 0.1.
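The parent and offspring competition of Step75 can be sketched in Python as below. The data layout (offspring stored as dictionaries with a failure counter) and the function name are assumptions for illustration; higher fitness is treated as better.

```python
import math
from typing import Callable, List, Sequence


def near_elite_select(parents: Sequence[Sequence[float]],
                      children: Sequence[Sequence[float]],
                      fit: Callable[[Sequence[float]], float]) -> List[List[float]]:
    """Each parent competes with its nearest remaining offspring (Euclidean distance)."""
    new_pop = [list(p) for p in parents]
    pool = [{"x": list(c), "fails": 0} for c in children]
    for i, parent in enumerate(new_pop):
        if not pool:
            break
        nearest = min(pool, key=lambda c: math.dist(c["x"], parent))
        if fit(nearest["x"]) > fit(parent):
            new_pop[i] = nearest["x"]      # offspring replaces the parent
            pool.remove(nearest)           # and leaves the offspring list
        else:
            nearest["fails"] += 1          # competition failure count
            if nearest["fails"] == 3:
                pool.remove(nearest)       # dropped after three failures
    return new_pop


def demo_fit(x: Sequence[float]) -> float:
    """Hypothetical fitness to maximize: closer to the origin is better."""
    return -sum(v * v for v in x)


demo_new_pop = near_elite_select([[2.0], [5.0]], [[1.0], [6.0]], demo_fit)
```

In the demo, the offspring at 1.0 beats its nearest parent at 2.0 and replaces it, while the offspring at 6.0 loses to the parent at 5.0 and only has its failure counter increased.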
Figure GDA0002947055260000221
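The variable-step hill climbing of Step76 can be sketched as follows. The description does not say in which direction a step is tried, so the sketch assumes both +step and -step are attempted; the [-100, 100] bounds, the function names and the quadratic demo fitness are also assumptions.

```python
from typing import Callable, List, Sequence, Tuple


def hill_climb(x: Sequence[float], fit: Callable[[Sequence[float]], float],
               lo: float = -100.0, hi: float = 100.0,
               max_step: float = 10.0, min_step: float = 0.1) -> Tuple[List[float], float]:
    """Coordinate-wise hill climbing (maximization) with step halving."""
    x = list(x)
    best = fit(x)
    improved = True
    while improved:                       # repeat passes until a pass brings no gain
        improved = False
        for k in range(len(x)):           # optimize one attribute, others fixed
            step = max_step               # always start from the maximum step
            while step >= min_step:
                moved = False
                for delta in (step, -step):   # assumption: try both directions
                    cand = x[:]
                    cand[k] = min(hi, max(lo, x[k] + delta))
                    f = fit(cand)
                    if f > best:
                        x, best = cand, f
                        moved = improved = True
                        break             # keep the step size after a success
                if not moved:
                    step /= 2.0           # halve the step after a failure
    return x, best


def demo_fit(v: Sequence[float]) -> float:
    """Hypothetical fitness with its maximum at (3, -4)."""
    return -((v[0] - 3.0) ** 2) - ((v[1] + 4.0) ** 2)


demo_x, demo_best = hill_climb([0.0, 0.0], demo_fit)
```

With a maximum step of 10 and a minimum of 0.1, the smallest step actually tried is 0.15625, so the demo ends within that resolution of the optimum.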
In Step77, the termination condition is that the set of dominant individuals in the population has not changed for 2 consecutive generations; that is, if no new dominant individual is generated for 2 consecutive generations, the algorithm terminates and returns the best individual found.
The following is an experimental analysis using the method provided by the present invention.
Unlike the traditional experimental design in which target points are picked manually, the experiment here uses an algorithm to directly generate the visible time windows and observation side-sway angles between the satellite and the targets. This design ensures that the targets follow a stable probability distribution, verifies the effectiveness of the approach more convincingly, and is more generally applicable. Most high-resolution imaging satellites orbit at an altitude of about 500 km, and one orbit at this altitude generally takes about 90 min. Because of imaging constraints such as illumination, cloud and fog, the time available for imaging in one orbit is about 31 min (1860 s). In the experimental design, therefore, each scene lasts about 1900 s, i.e., the visible time windows of all tasks lie between 0 s and 1900 s.
In the simulation scenes, the orbital altitude of the satellite is 500 km, the initial solid-state storage is 600 Gb, and the write rate to solid-state storage during imaging is 3 Gb/s. In terms of maneuverability, the maximum attitude maneuver rate is 1°/s, the acceleration during attitude maneuvers is 0.5°/s², and the acceleration during deceleration is 0.25°/s². The stabilization time is 5 s, the yaw angle range is [-30°, 30°], and the maximum composite tilt angle is 40°. The number of conventional observation targets for each satellite follows a Gaussian distribution with a mean of 48 and a standard deviation of 2, and the targets to be observed are drawn from one uniform-area distribution and two key-area distributions in the ratio 4:2:4. The look-ahead duration of the satellite is 90 s, that is, target information can be acquired 90 s before the overhead time. In terms of power, the initial energy of the satellite is 5 kWh, the power draw during imaging is 3 kW, and the attitude maneuver power is 15 kW during acceleration, 10 kW during deceleration, and 3 kW during uniform motion.
The observation target consists of a uniform area, an important area 1 and an important area 2, and accounts for 40%, 20% and 40% of the conventional targets of each satellite respectively. The combination mode represents a more complex target distribution situation, and the performance of the algorithm can be better tested.
The parameters of the uniform-area distribution are shown in Table 4. The targets consist of point targets and strip targets, with strip targets accounting for 50% of the targets in the area. The overhead time of the targets in the area follows a uniform distribution from 30 s to 1860 s after the scene starts; the observation side-sway angle follows a uniform distribution from -30° to 30°; the imaging duration of a point target is fixed at 5 s and its imaging profit follows a Gaussian distribution with a mean of 40 and a standard deviation of 10; the imaging duration of a strip target follows a Gaussian distribution with a mean of 15 s and a standard deviation of 3 s, and its imaging profit follows a Gaussian distribution with a mean of 60 and a standard deviation of 10.
Table 4. Target parameters for the uniform distribution
Figure GDA0002947055260000231
The relevant parameters of the key region 1 distribution are shown in Table 5. The targets are mainly point targets. The satellite overhead time follows a uniform distribution from 30 s to 385 s after the scenario starts, the imaging duration of each target is 5 s, the observation side-sway angle of the satellite follows a uniform distribution from -26° to 26°, and the imaging income follows a Gaussian distribution with a mean of 30 and a standard deviation of 10.
Table 5. Target parameters for key region 1
Figure GDA0002947055260000241
The relevant parameters of the key region 2 distribution are shown in Table 6. The targets are also point targets. The satellite overhead time follows a Gaussian distribution with a mean of 900 s and a standard deviation of 300 s, the observation side-sway angle of the satellite follows a Gaussian distribution with a mean of -20° and a standard deviation of 5°, the imaging duration of each target is 5 s, and the target observation income follows a Gaussian distribution with a mean of 40 and a standard deviation of 5. Compared with the uniform distribution, the targets in this region are more concentrated, which challenges the scheduling algorithm's handling of timing constraints.
Table 6. Target parameters for key region 2
Figure GDA0002947055260000242
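The three target distributions described in the text can be sampled as in the Python sketch below. It is only an illustration: the dictionary keys and the function name `sample_targets` are hypothetical, and the clamping of Gaussian draws (non-negative times and incomes, side-sway angles kept within [-30°, 30°]) is an assumption that the patent text does not specify.

```python
import random


def sample_targets(rng):
    """Draw one satellite's target list from the three described distributions."""
    n = max(1, round(rng.gauss(48, 2)))      # number of conventional targets
    n_key1 = round(0.2 * n)                  # key region 1: 20%
    n_key2 = round(0.4 * n)                  # key region 2: 40%
    n_uniform = n - n_key1 - n_key2          # uniform region: remaining ~40%
    targets = []
    for _ in range(n_uniform):
        strip = rng.random() < 0.5           # 50% strip targets in this region
        targets.append({
            "region": "uniform",
            "kind": "strip" if strip else "point",
            "overhead_s": rng.uniform(30, 1860),
            "sway_deg": rng.uniform(-30, 30),
            "duration_s": max(1.0, rng.gauss(15, 3)) if strip else 5.0,
            "income": max(0.0, rng.gauss(60 if strip else 40, 10)),
        })
    for _ in range(n_key1):
        targets.append({
            "region": "key1", "kind": "point",
            "overhead_s": rng.uniform(30, 385),
            "sway_deg": rng.uniform(-26, 26),
            "duration_s": 5.0,
            "income": max(0.0, rng.gauss(30, 10)),
        })
    for _ in range(n_key2):
        targets.append({
            "region": "key2", "kind": "point",
            "overhead_s": max(0.0, rng.gauss(900, 300)),
            "sway_deg": min(30.0, max(-30.0, rng.gauss(-20, 5))),
            "duration_s": 5.0,
            "income": max(0.0, rng.gauss(40, 5)),
        })
    # Targets become visible in order of their overhead time.
    return sorted(targets, key=lambda t: t["overhead_s"])
```

Passing a seeded generator, for example `sample_targets(random.Random(0))`, makes a scenario reproducible.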
Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present invention and do not limit them. Those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may be modified, or some of their technical features may be equivalently replaced; such modifications or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. A distribution estimation method, characterized in that a dual-satellite cluster comprises a target discovery satellite and a target identification satellite, and the distribution estimation method comprises the following steps:
Step1, the target identification satellite acquires auxiliary decision knowledge information and extracts the look-ahead information provided by the target discovery satellite, wherein the look-ahead information comprises target position information, imaging duration and observation income, and the auxiliary decision knowledge information comprises target filtering knowledge and scheme evaluation knowledge;
Step2, according to the filtering threshold threPDR of the target imaging duration, the filtering threshold threPro of the target observation income and the filtering threshold threPDR of the target observation income-to-duration ratio in the target filtering knowledge, preliminarily evaluating the observation solid-storage cost ratio and the observation power cost ratio of each target in the look-ahead time window, so as to delete targets of lower observation value from the look-ahead information, wherein the target information comprises target position information, imaging duration and observation income, and the usage constraints of the satellite are time constraints, attitude maneuver constraints, solid-storage constraints and power constraints;
Step3, calling a plurality of heuristic algorithms in the scheduling algorithm set to generate a plurality of local observation schemes, wherein a local observation scheme is an observation scheme generated by the satellite in its current state only from the target information in the look-ahead time window, each observation scheme is described by a sequence solution, and a scheduling solution generator is used to translate the sequence solution into a feasible scheduling solution;
Step4, comprehensively evaluating each local observation scheme with the evaluation function of the local observation scheme, according to the evaluation parameters and the evaluation method in the scheme evaluation knowledge;
Step5, selecting the local observation scheme with the highest score in Step4, and locking the first target in that scheme as the next observation target;
each parameter in the target filtering knowledge and the scheme evaluation knowledge is calculated by adopting a distribution estimation algorithm, wherein the distribution estimation algorithm specifically comprises the following steps:
Step71, population initialization: generating an initial population uniformly distributed over the value domain by random sampling, and evaluating the adaptive value of each individual;
Step72, niche division: dividing the population into a plurality of sub-populations with a K-means clustering algorithm based on Euclidean distance, wherein the number of sub-populations is a function of the number of iterations, and the more iterations, the more sub-populations;
Step73, probability distribution estimation: performing a selection operation on the individuals in each sub-population, and establishing a probability distribution model of the dominant individuals of each sub-population from its superior individuals, under the assumption that all variables are mutually independent;
Step74, offspring sampling: each time, the algorithm selects a sub-population with a certain probability for the sampling operation, and the selected sub-population samples offspring according to its own probability distribution model; this step ends when the number of newly sampled individuals equals the size of the current population;
Step75, individual selection: combining the parent individuals and offspring individuals in each sub-population, and obtaining a new generation of the population by a near elite optimization strategy;
Step76, local search: optimizing the dominant individuals in the population with a certain probability by a local search algorithm, so as to further improve the quality of the solution;
Step77, judging whether the termination condition of the algorithm is reached; if so, returning the best individual found; otherwise, jumping to Step72.
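The online decision loop of Step1 to Step5 can be sketched in Python as follows. This is a simplified illustration, not the claimed implementation: the comparison directions of the three thresholds are assumed, the storage and power cost-ratio checks of Step2 are omitted, the first threshold is given the hypothetical name `thre_dur`, and the helper names `filter_targets`, `top_k_by` and `next_target` as well as the toy scoring function are invented for the example.

```python
def filter_targets(targets, thre_dur, thre_pro, thre_pdr):
    """Step2 (assumed directions): drop long, low-income, low-ratio targets."""
    return [t for t in targets
            if t["duration"] <= thre_dur
            and t["income"] >= thre_pro
            and t["income"] / t["duration"] >= thre_pdr]


def top_k_by(key, k):
    """Toy heuristic factory: keep the k best targets by `key`, ordered by time."""
    def heuristic(targets):
        chosen = sorted(targets, key=key)[:k]
        return sorted(chosen, key=lambda t: t["start"])
    return heuristic


def next_target(lookahead, thresholds, heuristics, score):
    """Return the target to lock next, or None if no scheme can be built."""
    candidates = filter_targets(lookahead, *thresholds)      # Step2
    plans = [h(candidates) for h in heuristics]              # Step3
    plans = [p for p in plans if p]
    if not plans:
        return None
    best = max(plans, key=score)                             # Step4
    return best[0]                                           # Step5


lookahead = [
    {"name": "A", "start": 10.0, "duration": 5.0, "income": 40.0},
    {"name": "B", "start": 20.0, "duration": 5.0, "income": 5.0},
    {"name": "C", "start": 30.0, "duration": 15.0, "income": 60.0},
]
heuristics = [top_k_by(lambda t: t["start"], 1),
              top_k_by(lambda t: -t["income"], 1)]
chosen = next_target(lookahead, (20.0, 10.0, 1.0), heuristics,
                     lambda plan: sum(t["income"] for t in plan))
```

Only the first target of the winning scheme is locked; the rest of the scheme is discarded and rebuilt when the look-ahead window next changes, which is what lets the satellite react to newly discovered targets.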
2. The distribution estimation method of claim 1, wherein calling the plurality of heuristic algorithms in the scheduling algorithm set in Step3 to generate the plurality of local observation schemes comprises using a target selection index to guide the search direction of the heuristic algorithms, as follows:
the heuristic algorithm selects targets during the selection of the target subset, and each selected target is inserted into the existing target observation sequence in ascending time order to obtain a new target observation sequence, which is then converted into a scheduling solution by the scheduling solution generator; if the income of the scheduling scheme increases, the new target observation sequence is adopted; otherwise, the newly inserted target is abandoned, and the original target observation sequence and its corresponding scheduling solution are retained;
the target selection index comprises a time sequence index, a target observation income index, a target imaging duration index and an income-to-duration ratio index, wherein: under the time sequence index, the targets are arranged in ascending order of time window start time, and the target with the earliest start time is selected first; under the target observation income index, the targets are arranged in descending order of observation income, and the target with the highest observation income is selected first; under the target imaging duration index, the targets are arranged in ascending order of imaging duration, and the target with the shortest imaging duration is selected first; under the income-to-duration ratio index, the targets are arranged in descending order of observation income-to-duration ratio, and the target with the highest ratio is selected first.
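The four target selection indices amount to four sort keys. A minimal Python sketch, with hypothetical field names (`window_start`, `income`, `duration`) and a hypothetical helper `rank`:

```python
# Each index maps a target to a sort key; negative keys give descending order.
SELECTION_INDICES = {
    "time_order": lambda t: t["window_start"],                   # earliest first
    "income": lambda t: -t["income"],                            # highest first
    "duration": lambda t: t["duration"],                         # shortest first
    "income_per_duration": lambda t: -t["income"] / t["duration"],
}


def rank(targets, index):
    """Return the targets in the order the given index would select them."""
    return sorted(targets, key=SELECTION_INDICES[index])


targets = [
    {"name": "X", "window_start": 5.0, "income": 10.0, "duration": 10.0},
    {"name": "Y", "window_start": 1.0, "income": 30.0, "duration": 20.0},
    {"name": "Z", "window_start": 3.0, "income": 20.0, "duration": 5.0},
]
order_by_ratio = [t["name"] for t in rank(targets, "income_per_duration")]
```

Running several heuristics that differ only in this key is a cheap way to obtain a diverse set of local observation schemes for the later evaluation step.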
3. The distribution estimation method of claim 2, wherein the target selection index first selected by the heuristic algorithm is the income-to-duration ratio index, and the scheduling solution generator calculates an available observation scheme as follows, the targets being arranged in descending order of income-to-duration ratio as a first target, a second target, a third target, a fourth target, a fifth target, a sixth target and a seventh target:
firstly, according to the current attitude and resource usage state of the satellite, considering whether the first target can be observed, and if so, updating the current best sequence to {first target}; secondly, adding the second target to the current best solution and arranging the targets in ascending time order, that is, considering the scheduling solution converted from the sequence second target → first target, and if the income of this scheduling solution is higher than that of the original current best solution, updating the current best solution to {second target, first target}; and so on: each time a target is added to the current best solution, the tasks are arranged in ascending time order to obtain a new sequence solution, which is converted into a scheduling solution by the scheduling solution generator; if the income of the new scheduling solution is higher, it replaces the original current best solution; otherwise, the original current best solution is kept.
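The greedy insertion procedure of claims 2 and 3 can be sketched as below. The scheduling solution generator here is a deliberately crude stand-in (a target is scheduled only if it starts after the previous one has finished plus a fixed transition gap); the real generator would check attitude, storage and power constraints. All names and the `gap` parameter are hypothetical.

```python
def generate_schedule(sequence, gap=10.0):
    """Toy scheduling solution generator: returns (scheduled names, income)."""
    scheduled, free_at, income = [], 0.0, 0.0
    for t in sequence:
        if t["start"] >= free_at:            # satellite is free and settled
            scheduled.append(t["name"])
            free_at = t["start"] + t["duration"] + gap
            income += t["income"]
    return scheduled, income


def greedy_insert(targets):
    """Try targets by descending income/duration; keep insertions that help."""
    ranked = sorted(targets, key=lambda t: -t["income"] / t["duration"])
    best_seq, best_income = [], 0.0
    for t in ranked:
        trial = sorted(best_seq + [t], key=lambda x: x["start"])
        _, income = generate_schedule(trial)
        if income > best_income:             # accept only if income increases
            best_seq, best_income = trial, income
    return [t["name"] for t in best_seq], best_income


targets = [
    {"name": "T1", "start": 100.0, "duration": 5.0, "income": 50.0},
    {"name": "T2", "start": 95.0, "duration": 5.0, "income": 30.0},
    {"name": "T3", "start": 50.0, "duration": 5.0, "income": 20.0},
]
sequence, income = greedy_insert(targets)
```

In this example T2 is rejected because scheduling it would push out the more valuable T1, while T3 fits before T1 and is accepted, giving the sequence T3 → T1 with income 70.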
4. The distribution estimation method according to any one of claims 1 to 3, wherein Step4 specifically includes:
comprehensively evaluating each scheme according to the income, solid-storage consumption and power consumption indexes of each local observation scheme generated by the heuristic algorithm set, so as to select the local scheduling scheme that is more likely to improve the global observation income;
in the observation scheme evaluation module, the satellite evaluates seven attributes of the observation scheme, namely: scheme income, solid-storage consumption, income-to-solid-storage ratio, total power consumption, income-to-power ratio, execution duration and income-to-time-consumption ratio; wherein: the scheme income is the sum of the target observation incomes of all imaging targets in the local observation scheme; the solid-storage consumption is the amount of satellite solid-storage resources consumed by the local observation scheme; the income-to-solid-storage ratio is the scheme income of the local observation scheme divided by its solid-storage consumption; the total power consumption is the power consumed by the local observation scheme, comprising imaging power consumption and attitude maneuver power consumption; the income-to-power ratio is the scheme income of the local observation scheme divided by its total power consumption; the execution duration is the total satellite flight time consumed by the local observation scheme, mainly comprising the imaging duration and the attitude maneuver duration; the income-to-time-consumption ratio is the scheme income of the local observation scheme divided by its execution duration.
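A short Python sketch of the seven attributes, assuming each scheduled item already carries its own income, storage use, energy use and durations (the field names are hypothetical, and returning 0.0 for a ratio with a zero denominator is an assumption made only to keep the example total):

```python
def plan_attributes(plan):
    """Compute the seven evaluation attributes of a local observation scheme."""
    pro = sum(i["income"] for i in plan)                         # proP
    sd = sum(i["storage"] for i in plan)                         # sdP
    eg = sum(i["imaging_energy"] + i["maneuver_energy"] for i in plan)   # egP
    ed = sum(i["imaging_time"] + i["maneuver_time"] for i in plan)       # edP

    def ratio(num, den):
        return num / den if den > 0 else 0.0

    return {
        "proP": pro, "sdP": sd, "psdR": ratio(pro, sd),
        "egP": eg, "pegR": ratio(pro, eg),
        "edP": ed, "pedR": ratio(pro, ed),
    }


plan = [
    {"income": 40.0, "storage": 15.0, "imaging_energy": 1.0,
     "maneuver_energy": 2.0, "imaging_time": 5.0, "maneuver_time": 18.0},
    {"income": 60.0, "storage": 45.0, "imaging_energy": 0.5,
     "maneuver_energy": 0.5, "imaging_time": 15.0, "maneuver_time": 10.0},
]
attrs = plan_attributes(plan)
```

The three ratios make schemes of different lengths comparable: a short scheme with modest income can still score well if it is cheap in storage, power or time.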
5. The distribution estimation method according to claim 4, wherein the evaluation function of the local observation scheme is:
cScore=w1·acPro·proP+w2·acSD·sdP+w3·acPsdR·psdR+w4·acEG·egP+w5·acPegR·pegR+w6·acED·edP+w7·acPedR·pedR (35)
wherein,
acPro = e^(aPro·rePro + bPro) (36)
acSD = e^(aSD·reSDR + bSD) (37)
acPsdR = e^(aPsdR·reSDR + bPsdR) (38)
acEG = e^(aEG·reEGR + bEG) (39)
acPegR = e^(aPegR·reEGR + bPegR) (40)
acED = e^(aED·reRCR + bED) (41)
acPedR = e^(aPedR·reRCR + bPedR) (42)
Figure FDA0002947055250000041
in the formulas, w1 is the weight coefficient corresponding to the scheme income proP, w2 is the weight coefficient corresponding to the solid-storage consumption sdP, w3 is the weight coefficient corresponding to the income-to-solid-storage ratio psdR, w4 is the weight coefficient corresponding to the total power consumption egP, w5 is the weight coefficient corresponding to the income-to-power ratio pegR, w6 is the weight coefficient corresponding to the execution duration edP, and w7 is the weight coefficient corresponding to the income-to-time-consumption ratio pedR; the value range of each weight coefficient is [-100, 100]; pr is the progress ratio; rePro is the relative relation between the current observation income and the expected observation income; acPro is the observation income weight adjustment coefficient, acSD is the solid-storage consumption weight adjustment coefficient, acPsdR is the income-to-solid-storage ratio weight adjustment coefficient, acEG is the total power consumption weight adjustment coefficient, acPegR is the income-to-power ratio weight adjustment coefficient, acED is the execution duration weight adjustment coefficient, and acPedR is the income-to-time-consumption ratio weight adjustment coefficient; aPro and bPro are the correlation coefficients of the scheme income, aSD and bSD are the correlation coefficients of the solid-storage consumption, aPsdR and bPsdR are the correlation coefficients of the income-to-solid-storage ratio, aEG and bEG are the correlation coefficients of the total power consumption, aPegR and bPegR are the correlation coefficients of the income-to-power ratio, aED and bED are the correlation coefficients of the execution duration, and aPedR and bPedR are the correlation coefficients of the income-to-time-consumption ratio; reSDR is the relative relation between the current satellite solid-storage resource consumption and the expected solid-storage resource consumption; reEGR is the relative relation coefficient of the power resource; reRCR represents the relative relation between the current satellite state and the historical statistical state; curPro_pr is the observation income that the satellite has already obtained at the current progress ratio pr; mePro_pr is the average expected observation income; and sdPro_pr is the standard deviation of the expected observation income at progress ratio pr under the normal distribution assumption.
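Formulas (35) to (42) can be evaluated as in the Python sketch below. The relative-state quantities rePro, reSDR, reEGR and reRCR are taken as inputs, because their defining formula is only available as an image in the source; the function names and the dictionary layout are hypothetical.

```python
import math


def adjustment(a, b, rel):
    """Weight adjustment coefficient e^(a*rel + b), as in formulas (36)-(42)."""
    return math.exp(a * rel + b)


def c_score(attrs, w, coef, re_pro, re_sdr, re_egr, re_rcr):
    """Formula (35): weighted sum of the seven attributes with adjustments.

    attrs: dict with proP, sdP, psdR, egP, pegR, edP, pedR
    w:     list of the seven weights w1..w7
    coef:  dict mapping attribute name to its (a, b) pair
    """
    # Each attribute is paired with the relative-state term used in (36)-(42).
    terms = [
        ("proP", re_pro), ("sdP", re_sdr), ("psdR", re_sdr),
        ("egP", re_egr), ("pegR", re_egr),
        ("edP", re_rcr), ("pedR", re_rcr),
    ]
    score = 0.0
    for weight, (name, rel) in zip(w, terms):
        a, b = coef[name]
        score += weight * adjustment(a, b, rel) * attrs[name]
    return score


attrs = {"proP": 100.0, "sdP": 60.0, "psdR": 100.0 / 60.0, "egP": 4.0,
         "pegR": 25.0, "edP": 48.0, "pedR": 100.0 / 48.0}
neutral = {name: (0.0, 0.0) for name in attrs}     # every adjustment equals 1
score = c_score(attrs, [1, -1, 0, 0, 0, 0, 0], neutral, 0.0, 0.0, 0.0, 0.0)
```

With all a and b set to zero the adjustments are 1 and the score reduces to a plain weighted sum (here 100 - 60 = 40). Non-zero a values let the effective weight of an attribute grow or shrink exponentially as the satellite runs ahead of or behind its expected income or resource use.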
6. The distribution estimation method according to claim 5, wherein selecting the local observation scheme with the highest score in Step5 specifically comprises:
the global observation income of the target identification satellite in one scene is regarded as a function of the filtering threshold parameter set tarFilter and the scheme evaluation parameter set schEval; the parameter learning process is regarded as a parameter optimization process whose optimization objective is the sum of the global incomes of the target identification satellite over all learning scenes; the parameters for which this global observation income is highest are calculated as follows:
Figure FDA0002947055250000051
wherein numSce is the number of historical scenes;
the tarFilter is a filtering threshold parameter set, and comprises a filtering threshold threPDR, a filtering threshold threPro and a filtering threshold threPDR in Step 2;
schEval includes aPro, bPro, aSD, bSD, aPsdR, bPsdR, aEG, bEG, aPegR, bPegR, w1, w2, w3, w4, w5, w6, w7;
globalPro_i(…) is the global observation income in the i-th scene under the parameters used.
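The learning objective described in this claim, the sum of per-scene global incomes as a function of the two parameter sets, can be written as a small Python function. The simulator is passed in as a callable because the patent does not reduce it to a formula; `toy_global_pro` below is a made-up placeholder used only to make the example runnable.

```python
def learning_objective(tar_filter, sch_eval, scenes, global_pro):
    """Sum of global observation income over all historical scenes."""
    return sum(global_pro(scene, tar_filter, sch_eval) for scene in scenes)


def toy_global_pro(scene, tar_filter, sch_eval):
    """Placeholder simulator: income of targets passing the income threshold."""
    kept = [income for income in scene if income >= tar_filter["threPro"]]
    return sch_eval["w1"] * sum(kept)


scenes = [[10.0, 40.0], [25.0]]          # numSce = 2 toy scenes
candidates = [{"threPro": 20.0}, {"threPro": 30.0}]
sch_eval = {"w1": 1.0}
best = max(candidates,
           key=lambda tf: learning_objective(tf, sch_eval, scenes, toy_global_pro))
```

In the claimed method the search over parameters is carried out by the distribution estimation algorithm of Step71 to Step77 rather than by enumerating candidates as done here.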
7. The distribution estimation method according to claim 6,
in Step72, the mapping function used for the number of sub-populations is shown in formula (44), wherein numIter represents the number of optimization generations; numPop represents the number of sub-populations; round() represents rounding to the nearest integer;
Figure FDA0002947055250000052
in Step73, the dominant individuals in the top 10% of the whole population are selected first, and a Gaussian probability distribution model is then established in each sub-population from its dominant individuals; taking sub-population i as an example, the mean and the variance of the fitted normal distribution model are given by formulas (45) and (46), wherein numExc_i is the number of dominant individuals of the i-th sub-population;
Figure FDA0002947055250000054
represents the j-th dominant individual in the i-th sub-population; μ_i represents the mean of the dominant individuals in the i-th sub-population; σ_i represents the standard deviation of the dominant individuals in the i-th sub-population;
Figure FDA0002947055250000053
Figure FDA0002947055250000061
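Fitting and sampling the per-sub-population Gaussian model of Step73 and Step74 can be sketched as follows. Because formulas (45) and (46) are only available as images, the divisor used for the standard deviation (the number of dominant individuals, rather than that number minus one) is an assumption, as are the function names.

```python
import math
import random


def fit_gaussian(elites):
    """Per-variable mean and standard deviation of one sub-population's elites.

    Variables are treated as mutually independent, so the model is just one
    (mu, sigma) pair per dimension.
    """
    n = len(elites)
    dims = len(elites[0])
    mu = [sum(x[d] for x in elites) / n for d in range(dims)]
    sigma = [math.sqrt(sum((x[d] - mu[d]) ** 2 for x in elites) / n)
             for d in range(dims)]
    return mu, sigma


def sample_offspring(mu, sigma, rng):
    """Draw one new individual from the fitted model."""
    return [rng.gauss(m, s) for m, s in zip(mu, sigma)]


mu, sigma = fit_gaussian([[1.0, 2.0], [3.0, 6.0]])
child = sample_offspring(mu, sigma, random.Random(1))
```

In practice a sampled value may fall outside a parameter's value domain (for example the [-100, 100] range of the weights) and would need to be clipped or resampled; the claim does not say which.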
in Step74, for each individual of the new generation, the sub-population to sample from is determined by the roulette method, and the new individual is generated by a sampling operation on the corresponding dominant individual probability distribution model; the probability of each sub-population being selected is given by formula (47), namely the sum of the adaptive values of the dominant individuals in the sub-population as a proportion of the sum of the adaptive values of the dominant individuals in the whole population; wherein selectP_i represents the probability that the i-th sub-population is selected, and Fit() represents the adaptive value of an individual;
Figure FDA0002947055250000062
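The roulette selection described for Step74 follows directly from the verbal definition of formula (47). A minimal Python sketch, assuming non-negative adaptive values and using hypothetical function names:

```python
import random


def selection_probabilities(elite_fitness_by_subpop):
    """selectP_i = (sum of elite fitness in sub-population i) / (sum over all)."""
    sums = [sum(f) for f in elite_fitness_by_subpop]
    total = sum(sums)
    return [s / total for s in sums]


def pick_subpopulation(probs, rng):
    """Roulette-wheel draw of one sub-population index."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = selection_probabilities([[3.0, 1.0], [4.0], [2.0]])
index = pick_subpopulation(probs, random.Random(7))
```

Sub-populations whose dominant individuals are fitter therefore contribute more offspring, while weaker niches are still sampled occasionally, which preserves diversity.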
in Step76, the dominant individuals in each population are further optimized by the hill climbing method, and the local search probability of a dominant individual is given by formula (48), wherein searchP_j represents the probability that individual x_j performs the local search operation, and Fit_M represents the adaptive value of the best individual in the whole population; when the hill climbing method optimizes one attribute of an individual, all other attributes are fixed, and the single attribute is optimized with a variable step size; if the adaptive value of the individual has not improved after the hill climbing method has optimized all attributes one by one, the algorithm ends; otherwise, optimization starts again from the first attribute; the variable step size strategy is to try the maximum step size first each time an attribute is optimized; if the current step size improves the attribute, the original attribute value is replaced by the new attribute value and the step size is kept; otherwise, the step size is halved and the optimization is attempted again; if the current step size is smaller than the minimum step size, optimization of that attribute ends; the maximum step size is set to 10 and the minimum step size to 0.1;
Figure FDA0002947055250000063
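The variable-step hill climbing of Step76 can be sketched in Python as follows. The claim does not say in which direction a step is tried, so this sketch tries both the positive and the negative step (an assumption); it maximizes the adaptive value and leaves out the search-probability formula (48), which is only available as an image. The function name `hill_climb` is hypothetical.

```python
def hill_climb(x, fitness, max_step=10.0, min_step=0.1):
    """Coordinate-wise hill climbing with step halving (maximization)."""
    x = list(x)
    best = fitness(x)
    improved = True
    while improved:                      # repeat passes until one pass fails
        improved = False
        for k in range(len(x)):          # optimize one attribute, fix the rest
            step = max_step              # always start from the maximum step
            while step >= min_step:
                moved = False
                for delta in (step, -step):
                    trial = x[:k] + [x[k] + delta] + x[k + 1:]
                    f = fitness(trial)
                    if f > best:         # improvement: accept and keep step
                        x, best = trial, f
                        moved = improved = True
                        break
                if not moved:            # no improvement: halve the step
                    step /= 2.0
    return x, best


def toy_fitness(v):
    """Concave test function with its maximum at (3, -2)."""
    return -(v[0] - 3.0) ** 2 - (v[1] + 2.0) ** 2


solution, value = hill_climb([0.0, 0.0], toy_fitness)
```

Because every accepted move strictly increases the adaptive value, the loop terminates whenever the fitness function is bounded above; the final precision per attribute is on the order of the minimum step size.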
CN201711250676.6A 2017-12-01 2017-12-01 Distribution estimation method Active CN108022045B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711250676.6A CN108022045B (en) 2017-12-01 2017-12-01 Distribution estimation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711250676.6A CN108022045B (en) 2017-12-01 2017-12-01 Distribution estimation method

Publications (2)

Publication Number Publication Date
CN108022045A CN108022045A (en) 2018-05-11
CN108022045B true CN108022045B (en) 2021-05-14

Family

ID=62078159

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711250676.6A Active CN108022045B (en) 2017-12-01 2017-12-01 Distribution estimation method

Country Status (1)

Country Link
CN (1) CN108022045B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109002966B (en) * 2018-06-25 2019-08-09 湖南国科轩宇信息科技有限公司 A kind of more star mission planning methods based on K mean cluster
CN109767128B (en) * 2019-01-15 2021-06-11 中国人民解放军国防科技大学 Imaging satellite autonomous task planning method based on machine learning
CN110021177B (en) * 2019-05-06 2020-08-11 中国科学院自动化研究所 Heuristic random search traffic signal lamp timing optimization method and system
CN112612295B (en) * 2020-12-23 2022-07-12 长光卫星技术股份有限公司 Remote sensing satellite ground station measurement and control and automatic distribution method of data transmission resources
CN113157417A (en) * 2021-04-23 2021-07-23 复旦大学 Heuristic scheduling algorithm for multi-core data independent tasks

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101300501A (en) * 2005-11-08 2008-11-05 纳夫科姆技术公司 Sampling threshold and gain for satellite navigation receiver
CN107290961A (en) * 2017-06-29 2017-10-24 中国人民解放军国防科学技术大学 A kind of on-line scheduling method for quick satellite
WO2017200499A1 (en) * 2016-05-18 2017-11-23 Still Arser Is Makinalari Servis Ve Ticaret Anonim Sirketi Integrated software and hardware systems for lifelong cost optimization for storage equipment and healthy and safe environment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
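Improved generalized pattern search algorithm for optimizing multi-satellite cooperative observation; Chen Yingguo et al.; Journal of National University of Defense Technology; 2012-02-28; Vol. 34, No. 1; pp. 88-93 *
Mission planning model and algorithm for agile satellites; Yao Feng et al.; Computer Integrated Manufacturing Systems; 2013-05-15; Vol. 19, No. 5; pp. 1035-1040 *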

Also Published As

Publication number Publication date
CN108022045A (en) 2018-05-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant