CN110086570A

CN110086570A - Modulation threshold adjustment method, device, and computer equipment

Info

Publication number: CN110086570A
Application number: CN201910274286.5A
Authority: CN
Inventors: 张存; 张冰; 张奭; 段明明
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2019-04-08
Filing date: 2019-04-08
Publication date: 2019-08-02
Anticipated expiration: 2039-04-08
Also published as: CN110086570B

Abstract

The present invention belongs to the field of computer technology and provides a modulation threshold adjustment method, device, and computer equipment. The method comprises: obtaining multiple pieces of state space information, action space information, and the actual packet error rate of a target device; obtaining multiple rate increments from the state space information; calculating multiple reward values and a first cumulative reward value with a reward algorithm from a preset tolerable packet error rate, the rate increments, and the actual packet error rate; obtaining a state-action function from these parameters; selecting a threshold change action based on the current threshold to obtain a changed threshold, and sending it to the target device for channel maintenance; and, after a preset time, obtaining the relevant parameters of the target device and updating the state-action function according to a predetermined update method to obtain an optimized threshold for channel maintenance of the target device. The method uses the Q-Learning algorithm to adjust the adaptive modulation threshold and obtain the optimal threshold.

Description

Modulation threshold adjustment method, device, and computer equipment

Technical field

The invention belongs to the computer field and in particular relates to a modulation threshold adjustment method, device, and computer equipment.

Background

Existing Hinoc equipment uses OFDM (orthogonal frequency division multiplexing) and QAM (quadrature amplitude modulation) as its physical-layer modulation techniques. The 128 MHz physical channel is divided into 2048 orthogonal subchannels, and every 16 adjacent subchannels form one group, giving 128 groups in total. All subchannels in a group use the same modulation format: the highest signal-to-noise ratio among the subchannels of the group is taken as the reference and compared with the modulation thresholds, and the highest modulation format whose threshold does not exceed that signal-to-noise ratio is selected. The modulation thresholds, however, are currently default values taken directly from simulation, which makes it difficult to meet the needs of different environments.
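The group-wise format selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Hinoc implementation: the threshold values, the SNR profile, and the function names `pick_format` and `formats_per_group` are hypothetical, and formats are represented simply as indices into an ascending threshold list.

```python
from bisect import bisect_right
from typing import List, Sequence

GROUP_SIZE = 16  # adjacent subchannels per group, as stated in the text


def pick_format(reference_snr_db: float, thresholds_db: Sequence[float]) -> int:
    """Index of the highest format whose threshold does not exceed the SNR.

    thresholds_db must be ascending; returns -1 if the SNR is below all of them.
    """
    return bisect_right(thresholds_db, reference_snr_db) - 1


def formats_per_group(snr_db: Sequence[float], thresholds_db: Sequence[float]) -> List[int]:
    """One format index per group, using the group's highest SNR as the reference."""
    groups = [snr_db[i:i + GROUP_SIZE] for i in range(0, len(snr_db), GROUP_SIZE)]
    return [pick_format(max(group), thresholds_db) for group in groups]


# Hypothetical thresholds (dB) and a synthetic 2048-subchannel SNR profile.
thresholds = [3.0, 6.0, 9.0, 12.0]
snrs = [10.0] * 1024 + [5.0] * 1024
formats = formats_per_group(snrs, thresholds)
```

With 2048 subchannels and groups of 16, the result has 128 entries, one per group. Shifting any entry of `thresholds` changes which format a group receives, which is the quantity the method sets out to tune.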

In an actual Hinoc network, channel conditions differ between environments. Over time, changes in channel conditions caused by environmental changes, or changes in the accuracy of channel estimation caused by device aging and accumulated calculation error, may mean that the default threshold vector is no longer optimal, which reduces bandwidth utilization under some channel conditions. An adaptive modulation threshold adjustment algorithm is therefore particularly important.

Summary of the invention

An object of the embodiments of the present invention is to provide a modulation threshold adjustment method, intended to solve the technical problem that, in existing modulation threshold adjustment methods, the default modulation thresholds cannot meet modulation requirements when the channel environment changes.

The embodiments of the present invention are implemented as a modulation threshold adjustment method, the method comprising:

obtaining multiple pieces of state space information, action space information, and actual packet error rates of a target device within a first preset time range, the state space information comprising a threshold, a rate, and a bit error rate;

obtaining multiple rate increments from the multiple rates;

calculating multiple reward values according to a predetermined reward algorithm from a preset tolerable packet error rate, the multiple rate increments, and the multiple actual packet error rates;

calculating a first cumulative reward value from the multiple reward values according to a predetermined cumulative reward algorithm;

obtaining a state-action function from the action space information, the multiple thresholds, and the first cumulative reward value, the state-action function being a matrix with actions on the horizontal axis, thresholds on the vertical axis, and the first cumulative reward value as the stored value;

obtaining the current threshold of the target device, selecting a threshold change action according to a predetermined action selection method, obtaining a changed threshold, and sending the changed threshold to the target device for channel maintenance;

obtaining the state space information, action space information, and actual packet error rate of the target device after a second preset time, and calculating the corresponding reward value according to the predetermined reward algorithm;

calculating a second cumulative reward value from the multiple reward values and the corresponding reward value according to the predetermined cumulative reward algorithm;

updating the state-action function according to a predetermined update method from the changed threshold, the action space information, and the corresponding reward value, to obtain an optimized threshold for channel maintenance of the target device.

Another object of the embodiments of the present invention is to provide a modulation threshold adjustment device, comprising:

a first information acquisition module, for obtaining multiple pieces of state space information, action space information, and actual packet error rates of a target device within a first preset time range, the state space information comprising a threshold, a rate, and a bit error rate;

a rate increment calculation module, for obtaining multiple rate increments from the multiple rates;

a first reward value calculation module, for calculating multiple reward values according to a predetermined reward algorithm from a preset tolerable packet error rate, the multiple rate increments, and the multiple actual packet error rates;

a first cumulative reward value calculation module, for calculating a first cumulative reward value from the multiple reward values according to a predetermined cumulative reward algorithm;

a state-action function acquisition module, for obtaining a state-action function from the action space information, the multiple thresholds, and the first cumulative reward value, the state-action function being a matrix with actions on the horizontal axis, thresholds on the vertical axis, and the first cumulative reward value as the stored value;

a changed-threshold selection and sending module, for obtaining the current threshold of the target device, selecting a threshold change action according to a predetermined action selection method, obtaining a changed threshold, and sending the changed threshold to the target device for channel maintenance;

a second reward value calculation module, for obtaining the state space information, action space information, and actual packet error rate of the target device after a second preset time, and calculating the corresponding reward value according to the predetermined reward algorithm;

a second cumulative reward value calculation module, for calculating a second cumulative reward value from the multiple reward values and the corresponding reward value according to the predetermined cumulative reward algorithm;

a state function update module, for updating the state-action function according to a predetermined update method from the changed threshold, the action space information, and the corresponding reward value.

Another object of the embodiments of the present invention is to provide computer equipment comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 8.

The modulation threshold adjustment method provided by the embodiments of the present invention obtains the state space information, action space information, and actual packet error rate of the target device within a preset time, from which the cumulative reward value for that period is obtained and a state-action function is established. The current threshold is modified by an action selected from the current threshold with a preset method, giving a changed threshold that is sent to the target device. After a predetermined time, parameters such as the state space information are obtained again, the cumulative reward value is calculated, and the state-action function is updated, finally yielding the optimized threshold. The method uses the Q-Learning algorithm to adjust the adaptive modulation threshold, keeping the bit error rate within acceptable limits while increasing the total bandwidth.

Brief description of the drawings

Fig. 1 is a flowchart of the modulation threshold adjustment method provided by Embodiment 1 of the present invention;

Fig. 2 is a flowchart of the modulation threshold adjustment method provided by Embodiment 2 of the present invention;

Fig. 3 is the modulation format distribution of the Hinoc equipment provided by Embodiment 3 of the present invention;

Fig. 4 is a schematic diagram of the modulation threshold adjustment device provided by Embodiment 4 of the present invention.

Detailed description

To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.

It will be understood that the terms "first", "second", and the like used in this application may be used to describe various elements, but unless otherwise stated these elements are not limited by these terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of this application, a first xx script could be called a second xx script, and similarly a second xx script could be called a first xx script.

As shown in Fig. 1, one embodiment provides a modulation threshold adjustment method, which may specifically include the following steps:

Step S101: obtain multiple pieces of state space information, action space information, and actual packet error rates of the target device within a first preset time range, the state space information comprising a threshold, a rate, and a bit error rate.

In an embodiment of the present invention, the first preset time is the period that starts when the target device is set to the initial threshold and ends at the current threshold. The multiple pieces of state space information obtained from the target device are the rates and bit error rates corresponding to the different thresholds, and the action space information comprises two actions: decrease by one unit and increase by one unit.

Step S102: obtain multiple rate increments from the multiple rates.

In an embodiment of the present invention, the obtained rates are sorted in order of the target device's running time, and the absolute difference between each pair of adjacent rate values is calculated. This difference is the rate increment, which yields the multiple rate increments produced while the threshold was being adjusted.
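Step S102 can be sketched as below. The function name `rate_increments` and the sample values are hypothetical; the input is assumed to be already ordered by device running time.

```python
from typing import List


def rate_increments(rates: List[float]) -> List[float]:
    """Absolute difference between each pair of adjacent rate samples.

    `rates` is assumed to be sorted by the target device's running time.
    """
    return [abs(later - earlier) for earlier, later in zip(rates, rates[1:])]


# Hypothetical rate samples observed while the threshold was being adjusted.
samples = [310.0, 318.5, 316.0, 325.0]
increments = rate_increments(samples)
```

For n rate samples the function returns n - 1 increments, and a single sample yields an empty list.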

Step S103: calculate multiple reward values according to a predetermined reward algorithm from a preset tolerable packet error rate, the multiple rate increments, and the multiple actual packet error rates.

In an embodiment of the present invention, the tolerable packet error rate $\mathrm{Ratio}_{bare}$ is the packet error rate that the service currently running on the target device can tolerate; it may be set by the user from experience or according to the requirements of the service. From the multiple rate increments $\mathrm{Speed}_{delta}$ and the multiple actual packet error rates $\mathrm{Ratio}_{real}$, the reward value $R$ is calculated according to formula (1), giving the reward values corresponding to the different thresholds.

$$R = \alpha \log\bigl(\log(\mathrm{Ratio}_{bare}) - \log(\mathrm{Ratio}_{real})\bigr) + \beta\,\mathrm{Speed}_{delta} \qquad (1)$$

Here α and β are constants that serve as tuning parameters. Formula (1) shows that when the packet error rate is well within the service tolerance, the first term changes very slowly with the packet error rate and can be treated almost as a constant, so the reward value is determined mainly by the rate increment. When the packet error rate approaches or exceeds the tolerable packet error rate, the first term drops sharply and imposes a heavy penalty on the total reward, and the reward contributed by the rate increment becomes negligible. The modulation threshold can thus be adjusted flexibly according to the packet error rate requirements of different services. In a practical implementation the observation time is limited, so when the user sets a small $\mathrm{Ratio}_{bare}$ the observed number of erroneous packets may be 0. The calculated $\mathrm{Ratio}_{real}$ is then 0, which may cause a floating-point exception in the program. Therefore, when the calculated $\mathrm{Ratio}_{real}$ is 0, it is set to $\mathrm{Ratio}_{real} = 10^{-1}\,\mathrm{Ratio}_{bare}$; with base-10 logarithms the first term of the formula is then 0, and the reward value is determined by $\mathrm{Speed}_{delta}$ alone. Similarly, when $\mathrm{Ratio}_{real} > \mathrm{Ratio}_{bare}$, the first term is set directly to -100 to avoid a logarithm error.
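A minimal sketch of formula (1) with its two special cases follows. Several points are assumptions rather than statements from the source: base-10 logarithms are used because they make the first term exactly 0 in the zero-error case; the default values of `alpha` and `beta` are placeholders; and the boundary case where the actual rate equals the tolerable rate is treated like the over-tolerance case, since the inner logarithm is undefined there.

```python
import math

PENALTY = -100.0  # value the text assigns to the first term when tolerance is exceeded


def reward(per_tolerable: float, per_real: float, speed_delta: float,
           alpha: float = 1.0, beta: float = 1.0) -> float:
    """Reward per formula (1); alpha and beta defaults are placeholders."""
    if per_real == 0:
        # No erroneous packets observed: substitute 0.1 * tolerable rate.
        per_real = 0.1 * per_tolerable
    if per_real >= per_tolerable:
        # Assumption: equality is penalised too, because log(0) is undefined.
        first_term = PENALTY
    else:
        inner = math.log10(per_tolerable) - math.log10(per_real)
        first_term = alpha * math.log10(inner)
    return first_term + beta * speed_delta
```

For example, `reward(1e-3, 5e-3, 2.0)` falls in the penalty branch and returns -98.0, whereas `reward(1e-3, 0.0, 2.0)` is driven by the rate increment alone.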

Step S104: calculate a first cumulative reward value from the multiple reward values according to a predetermined cumulative reward algorithm.

In an embodiment of the present invention, the multiple reward values r obtained in step S103 are taken, and the first cumulative reward value is calculated with the weight coefficient γ according to the discounted cumulative reward algorithm of formula (2), where γ is a preset parameter.

It should be understood that the cumulative reward may also be computed with the T-step cumulative reward algorithm, that is, by averaging the reward values obtained from each threshold adjustment. The cumulative reward value can be used to assess the quality of a policy: cumulative reward values are calculated repeatedly and compared to obtain the optimal policy.
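The two cumulative reward variants named above can be sketched as follows. The sketch assumes the common textbook definitions (a sum of rewards weighted by powers of γ, and a plain average over T steps); the exact form of formula (2) in the patent may differ, and the function names are hypothetical.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * r_t over the observed rewards, with t starting at 0."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


def t_step_average(rewards: Sequence[float]) -> float:
    """Average of the rewards obtained over T threshold adjustments."""
    return sum(rewards) / len(rewards)
```

With `gamma` below 1 the discounted form weights early rewards more heavily than later ones, while the T-step form treats every adjustment equally.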

Step S105: obtain a state-action function from the action space information, the multiple thresholds, and the first cumulative reward value. The state-action function is a matrix with actions on the horizontal axis, thresholds on the vertical axis, and the first cumulative reward value as the stored value.

In an embodiment of the present invention, the state-action function is represented as a matrix whose horizontal axis is the action, whose vertical axis is the threshold, and whose stored value is the cumulative reward value. Letting the action space information be a and the threshold be x, the state-action function is obtained from the first cumulative reward value, as shown in formula (3).

It should be understood that when the cumulative reward value is obtained with the T-step cumulative reward algorithm, the state-action function is expressed in the corresponding T-step form.
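One possible in-memory layout for the state-action matrix is a nested mapping from threshold to action to stored value. The sketch below is illustrative only: the name `build_state_action_table`, the integer thresholds, and the choice to initialise unobserved pairs to 0.0 are all assumptions.

```python
from typing import Dict, Iterable, Tuple

ACTIONS = ("Dec", "Inc")  # decrease / increase the threshold by one unit


def build_state_action_table(
    thresholds: Iterable[int],
    first_cumulative_reward: Dict[Tuple[int, str], float],
) -> Dict[int, Dict[str, float]]:
    """Rows are thresholds, columns are actions; unseen pairs start at 0.0."""
    return {
        x: {a: first_cumulative_reward.get((x, a), 0.0) for a in ACTIONS}
        for x in thresholds
    }


# Hypothetical cumulative rewards observed during the first preset time range.
observed = {(20, "Inc"): 1.5, (21, "Dec"): -0.5}
table = build_state_action_table(range(19, 23), observed)
```

Looking up `table[20]["Inc"]` then returns the value stored for increasing the threshold from 20.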

Step S106: obtain the current threshold of the target device, select a threshold change action according to a predetermined action selection method, obtain a changed threshold, and send the changed threshold to the target device for channel maintenance.

In an embodiment of the present application, the predetermined action selection method is the ε-greedy algorithm shown in formula (4), which gives the probability of each candidate action; the action with the highest probability is selected.

Here ε depends on k, the number of threshold adjustments, so that the algorithm explores with high probability in its early phase and, once enough exploration has accumulated, enters a state in which exploration and greedy selection coexist.

In an embodiment of the present invention, the current threshold is increased or decreased according to the highest-probability action to obtain the changed threshold, and the changed threshold is sent to the target device so that the target device performs channel maintenance under that threshold.
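The action selection of step S106 can be sketched with the usual ε-greedy rule: explore with probability ε, otherwise take the action with the larger stored value. Formula (4) and the expression for ε are not reproduced in this text, so the decay schedule `1 / sqrt(k)` below is purely a hypothetical choice, as are the function names and the one-unit step size.

```python
import math
import random

ACTIONS = ("Dec", "Inc")
STEP = {"Dec": -1, "Inc": +1}


def epsilon_for(k: int) -> float:
    """Hypothetical schedule: exploration probability shrinks as k grows."""
    return 1.0 / math.sqrt(max(k, 1))


def choose_action(q_row: dict, epsilon: float, rng: random.Random) -> str:
    """Explore with probability epsilon, otherwise pick the best-valued action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_row[a])


def changed_threshold(current: int, action: str, unit: int = 1) -> int:
    """Apply the selected action to the current threshold."""
    return current + STEP[action] * unit
```

Passing a seeded `random.Random` instance keeps runs reproducible, which is convenient when comparing tuning parameters.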

Step S107: obtain the state space information, action space information, and actual packet error rate of the target device after a second preset time, and calculate the corresponding reward value according to the predetermined reward algorithm.

In an embodiment of the present invention, the second preset time is the time after which the modulation threshold takes effect once the target device has performed channel maintenance for a period. It is set to 45 s or more to guarantee the accuracy of the bit error rate measurement and to give the equipment enough time to collect statistics. After the state space information, action space information, and actual packet error rate are obtained, the corresponding rate increment is calculated, and the corresponding reward value is calculated from the tolerable packet error rate according to formula (1).

Step S108: calculate a second cumulative reward value from the multiple reward values and the corresponding reward value according to the predetermined cumulative reward algorithm.

In an embodiment of the present invention, the second cumulative reward value is calculated from the multiple reward values obtained in step S103 and the reward value obtained in step S107, with the weight coefficient γ, according to the discounted cumulative reward algorithm of formula (2). It should be understood that the cumulative reward may also be computed with the T-step cumulative reward algorithm, by averaging the reward values obtained from each threshold adjustment. The cumulative reward value can be used to assess the quality of a policy: cumulative reward values are calculated repeatedly and compared to obtain the optimal policy.

Step S109: update the state-action function according to a predetermined update method from the changed threshold, the action space information, and the corresponding reward value, to obtain an optimized threshold for channel maintenance of the target device.

In an embodiment of the present application, let x be the threshold before adjustment and x′ the changed threshold obtained after action a. The next threshold change action a′ is obtained with the ε-greedy algorithm of step S106, and with the corresponding reward value denoted r, the state-action function is updated according to formula (5) to obtain a new state-action function.

Here $\alpha_{k+1}$ is the increment coefficient, γ is the weight coefficient, and k is the number of adjustments.
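Formula (5) is not reproduced in this text. As an assumption, the sketch below uses the standard temporal-difference update, which is built from the same quantities the paragraph names (x, a, r, x′, a′, the increment coefficient, and γ). The dictionary layout and the name `update_q` are hypothetical, and `alpha` stands in for $\alpha_{k+1}$.

```python
from typing import Dict, Tuple

QTable = Dict[Tuple[int, str], float]


def update_q(q: QTable, x: int, a: str, r: float,
             x_next: int, a_next: str, alpha: float, gamma: float) -> float:
    """Q(x, a) <- Q(x, a) + alpha * (r + gamma * Q(x', a') - Q(x, a)).

    Unseen state-action pairs are treated as 0.0 (an assumption).
    """
    old = q.get((x, a), 0.0)
    target = r + gamma * q.get((x_next, a_next), 0.0)
    q[(x, a)] = old + alpha * (target - old)
    return q[(x, a)]
```

A larger `alpha` moves the stored value further toward the new target on each adjustment; a schedule that shrinks with k makes later updates smaller.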

It should be understood that the method also counts the adjustments: each time the threshold is changed, the adjustment count is incremented by 1. After the new state-action function is obtained, threshold adjustment continues by repeating steps S106 to S109 until the termination condition is met. The condition may be an adjustment count; for example, with the count set to 500, adjustment ends when 500 adjustments have been made, and the optimized threshold is obtained for channel maintenance of the target device.
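The repeated cycle of steps S106 to S109 with a count-based stop can be sketched against a simulated device. Everything device-related here is a stand-in: `simulated_reward` replaces the real measurement and reward calculation, the threshold bounds, the ε schedule, and the coefficient `1 / (k + 1)` are hypothetical, and the update is the standard temporal-difference form rather than the patent's formula (5).

```python
import random

ACTIONS = ("Dec", "Inc")
STEP = {"Dec": -1, "Inc": 1}


def simulated_reward(threshold: int) -> float:
    """Stand-in for step S107: reward peaks at a hypothetical threshold of 24."""
    return -abs(threshold - 24)


def run_adjustment(start, max_adjustments=500, gamma=0.9, seed=0, low=0, high=40):
    rng = random.Random(seed)
    q = {}  # (threshold, action) -> stored value

    def pick(x, k):
        eps = 1.0 / (k ** 0.5)  # assumed decay of the exploration probability
        if rng.random() < eps:
            return rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda act: q.get((x, act), 0.0))

    x, k = start, 1
    a = pick(x, k)
    while k <= max_adjustments:                      # count-based termination
        x_next = min(high, max(low, x + STEP[a]))    # S106: changed threshold
        r = simulated_reward(x_next)                 # S107: reward after waiting
        a_next = pick(x_next, k + 1)                 # next action by epsilon-greedy
        alpha = 1.0 / (k + 1)                        # assumed increment coefficient
        old = q.get((x, a), 0.0)
        q[(x, a)] = old + alpha * (r + gamma * q.get((x_next, a_next), 0.0) - old)
        x, a, k = x_next, a_next, k + 1              # count one more adjustment
    return x, k - 1, q


final_threshold, adjustments, q_table = run_adjustment(20)
```

In a real deployment the call to `simulated_reward` would be replaced by sending the changed threshold to the device, waiting the second preset time, and computing the reward from the measured packet error rate and rate increment.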

In an embodiment of the present invention, the termination condition may also be something else, for example a preset cumulative reward value or bit error rate: when the cumulative reward value or bit error rate obtained after adjustment reaches the set value, the loop ends and the optimized threshold is obtained. It should be understood that the preset cumulative reward value and bit error rate may be derived from user experience or from big-data analysis.

In an embodiment of the present invention, the adjustment process is a Q-Learning process. On the one hand, the state-action function is updated iteratively, and a finite number of iterations yields an estimate of it. On the other hand, the state-action function produced by the learning process is applied with the ε-greedy algorithm: starting from the initial state, for a given state x, if $Q^{\pi}(x, \mathrm{Inc}) \ge Q^{\pi}(x, \mathrm{Dec})$ the threshold is increased by one unit, and this continues until the condition no longer holds, where Inc denotes increasing the threshold by one unit and Dec denotes decreasing it by one unit.
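Reading the optimized threshold out of a learned state-action function, as described above, can be sketched as follows. The upper bound `upper` is an added safeguard against walking past the valid threshold range and is not mentioned in the source; the function name and the default of 0.0 for unseen entries are likewise assumptions.

```python
from typing import Dict, Tuple


def optimized_threshold(q: Dict[Tuple[int, str], float], start: int, upper: int) -> int:
    """Increase the threshold while Q(x, Inc) >= Q(x, Dec), up to `upper`."""
    x = start
    while x < upper and q.get((x, "Inc"), 0.0) >= q.get((x, "Dec"), 0.0):
        x += 1
    return x


# Hypothetical learned values: increasing pays off at 20 and 21 but not at 22.
learned = {
    (20, "Inc"): 1.0, (20, "Dec"): 0.0,
    (21, "Inc"): 2.0, (21, "Dec"): 1.0,
    (22, "Inc"): 0.0, (22, "Dec"): 3.0,
}
best = optimized_threshold(learned, start=20, upper=30)
```

The walk stops at the first threshold where decreasing is valued more highly than increasing.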

It should be understood that the above describes the threshold adjustment method for one target device. The loop may take a long time, the modulation threshold settings of different devices are relatively independent, and for most of each iteration the algorithm process is dormant and does not occupy the CPU. In practice, multiple processes can therefore be started at the same time to run the automatic adjustment algorithm on multiple devices.
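Running the adjustment for several devices at once can be sketched as below. Note the substitution: the text speaks of multiple processes, while this sketch uses a thread pool from the standard library, which suits work that spends most of its time sleeping. The function `adjust_device` is a placeholder that sleeps briefly instead of waiting the 45 s measurement interval, and the device identifiers are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Sequence, Tuple


def adjust_device(device_id: str, wait_seconds: float = 0.01, rounds: int = 3) -> Tuple[str, int]:
    """Placeholder for one device's adjustment loop; returns (id, rounds done)."""
    for _ in range(rounds):
        time.sleep(wait_seconds)  # stands in for the measurement interval
    return device_id, rounds


def adjust_all(device_ids: Sequence[str]) -> Dict[str, int]:
    """Run the placeholder loop for every device concurrently."""
    with ThreadPoolExecutor(max_workers=len(device_ids)) as pool:
        return dict(pool.map(adjust_device, device_ids))


results = adjust_all(["dev-a", "dev-b", "dev-c"])
```

Because each worker is idle while it waits, the total wall-clock time is close to that of a single device rather than the sum over all devices.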

This threshold adjustment method obtains the state space information, action space information, and actual packet error rate of the target device within a preset time, from which the cumulative reward value for that period is obtained and a state-action function is established. The current threshold is modified by an action selected from the current threshold with a preset method, giving a changed threshold that is sent to the target device. After a predetermined time, parameters such as the state space information are obtained again, the cumulative reward value is calculated, and the state-action function is updated, finally yielding the optimized threshold. The method uses the Q-Learning algorithm to adjust the adaptive modulation threshold, keeping the bit error rate within acceptable limits while increasing the total bandwidth.

In one embodiment, as shown in Fig. 2, step S106 may specifically include the following steps:

Step S201: calculate the probability of each threshold change action according to the ε-greedy algorithm, and select the action with the highest probability.

In an embodiment of the present invention, the ε-greedy algorithm is as shown in formula (4); the probabilities of the different actions in the action space are calculated to obtain the action with the highest probability.

Step S202: obtain the current threshold of the target device, and increase or decrease it according to the highest-probability action to obtain the changed threshold.

In an embodiment of the present invention, the threshold currently set on the target device is taken from the state space information obtained in step S101 and changed according to the highest-probability action calculated above, either increasing it by one unit or decreasing it by one unit, to obtain the changed threshold.

Step S203: send the changed threshold to the target device, so that the target device performs channel maintenance under the changed threshold.

In an embodiment of the present invention, after the threshold has been adjusted it must be sent to the target device so that the device runs under the adjusted conditions. The parameters obtained after adjustment are then used to assess the channel maintenance of the adjusted target device.

By using the ε-greedy algorithm to calculate the probabilities of the different threshold change actions and selecting the more probable action, the method obtains a better threshold for channel maintenance of the target device, accurately identifies the direction in which the threshold should change, and allows the target device to reach the optimized threshold as quickly as possible.

In an embodiment of the present invention, the distribution of the target device's modulation formats is queried before threshold adjustment. For example, an OAM (Operation, Administration and Maintenance) query is performed on the Hinoc (High Performance Network Over Coax) equipment on the network to obtain the distribution of modulation formats, as shown in Fig. 3. Fig. 3 shows that the modulation formats are concentrated between 6 and 8, that is, between 64-QAM and 256-QAM; the algorithm therefore only needs to optimize the parameters T₅ to T₉.

By obtaining the distribution of modulation formats, the method selects only the threshold parameters that need adjusting, which greatly reduces the adjustment space.
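The narrowing step can be sketched as below. The rule used here, spanning the formats actually in use and widening by one index on each side, is an assumption inferred from the single example in the text (formats 6 to 8 leading to T₅ to T₉); the `min_share` cutoff for ignoring rare formats and the function name are also hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def thresholds_to_optimize(formats: Sequence[int], min_share: float = 0.01) -> List[int]:
    """Threshold indices covering the formats in use, widened by one on each side."""
    counts = Counter(formats)
    total = len(formats)
    used = [fmt for fmt, n in counts.items() if n / total >= min_share]
    return list(range(min(used) - 1, max(used) + 2))


# Hypothetical per-group format indices, as might be returned by an OAM query.
distribution = [6] * 40 + [7] * 60 + [8] * 28
selected = thresholds_to_optimize(distribution)
```

Restricting the search to these indices shrinks the state space that the learning loop has to explore.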

As shown in Fig. 4, one embodiment provides a schematic diagram of a modulation threshold adjustment device, which may specifically include a first information acquisition module 410, a rate increment calculation module 420, a first reward value calculation module 430, a first cumulative reward value calculation module 440, a state-action function acquisition module 450, a changed-threshold selection and sending module 460, a second reward value calculation module 470, a second cumulative reward value calculation module 480, and a state function update module 490.

The first information acquisition module 410 is for obtaining multiple pieces of state space information, action space information, and actual packet error rates of the target device within a first preset time range, the state space information comprising a threshold, a rate, and a bit error rate.

The rate increment calculation module 420 is for obtaining multiple rate increments from the multiple rates.

The first reward value calculation module 430 is for calculating multiple reward values according to a predetermined reward algorithm from a preset tolerable packet error rate, the multiple rate increments, and the multiple actual packet error rates.

$$R = \alpha \log\bigl(\log(\mathrm{Ratio}_{bare}) - \log(\mathrm{Ratio}_{real})\bigr) + \beta\,\mathrm{Speed}_{delta} \qquad (1)$$

In an embodiment of the present invention, α and β are constants that serve as tuning parameters.

The first cumulative reward value calculation module 440 is for calculating a first cumulative reward value from the multiple reward values according to a predetermined cumulative reward algorithm.

In an embodiment of the present invention, the multiple reward values r obtained in step S103 are taken, and the first cumulative reward value is calculated with the weight coefficient γ according to the discounted cumulative reward algorithm of formula (2).

The state-action function acquisition module 450 is for obtaining a state-action function from the action space information, the multiple thresholds, and the first cumulative reward value. The state-action function is a matrix with actions on the horizontal axis, thresholds on the vertical axis, and the first cumulative reward value as the stored value.

The changed-threshold selection and sending module 460 is for obtaining the current threshold of the target device, selecting a threshold change action according to a predetermined action selection method, obtaining a changed threshold, and sending the changed threshold to the target device for channel maintenance.

The second reward value calculation module 470 is for obtaining the state space information, action space information, and actual packet error rate of the target device after a second preset time, and calculating the corresponding reward value according to the predetermined reward algorithm.

The second cumulative reward value calculation module 480 is for calculating a second cumulative reward value from the multiple reward values and the corresponding reward value according to the predetermined cumulative reward algorithm.

The state function update module 490 is for updating the state-action function according to a predetermined update method from the changed threshold, the action space information, and the corresponding reward value.

In an embodiment of the present application, let x be the threshold before adjustment and x′ the changed threshold obtained after action a. The next threshold change action a′ is obtained with the ε-greedy algorithm of step S106, and with the corresponding reward value denoted r, the state-action function is updated according to formula (5) to obtain a new state-action function.

In an embodiment of the present invention, the present invention also provides a kind of computer equipment, including memory and processor, institutes It states memory and is stored with computer program, which is characterized in that the processor realizes that radar is dry when executing the computer program The step of disturbing effect evaluation method.

Although should be understood that various embodiments of the present invention flow chart in each step according to arrow instruction successively It has been shown that, but these steps are not that the inevitable sequence according to arrow instruction successively executes.Unless expressly state otherwise herein, There is no stringent sequences to limit for the execution of these steps, these steps can execute in other order.Moreover, each embodiment In at least part step may include that perhaps these sub-steps of multiple stages or stage are not necessarily multiple sub-steps Completion is executed in synchronization, but can be executed at different times, the execution in these sub-steps or stage sequence is not yet Necessarily successively carry out, but can be at least part of the sub-step or stage of other steps or other steps in turn Or it alternately executes.

Those of ordinary skill in the art will appreciate that realizing all or part of the process in above-described embodiment method, being can be with Relevant hardware is instructed to complete by computer program, the program can be stored in a non-volatile computer and can be read In storage medium, the program is when being executed, it may include such as the process of the embodiment of above-mentioned each method.Wherein, provided herein Each embodiment used in any reference to memory, storage, database or other media, may each comprise non-volatile And/or volatile memory.Nonvolatile memory may include that read-only memory (ROM), programming ROM (PROM), electricity can be compiled Journey ROM (EPROM), electrically erasable ROM (EEPROM) or flash memory.Volatile memory may include random access memory (RAM) or external cache.By way of illustration and not limitation, RAM is available in many forms, such as static state RAM (SRAM), dynamic ram (DRAM), synchronous dram (SDRAM), double data rate sdram (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronization link (Synchlink) DRAM (SLDRAM), memory bus (Rambus) directly RAM (RDRAM), straight Connect memory bus dynamic ram (DRDRAM) and memory bus dynamic ram (RDRAM) etc..

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.

The above embodiments express only several implementations of the present invention. Their description is relatively specific and detailed, but it should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims

1. A modulation threshold adjustment method, characterized in that the method comprises:

obtaining multiple pieces of state space information, action space information, and actual packet error rates of a target device within a first preset time range, the state space information including a threshold value, a rate, and a bit error rate;

obtaining multiple rate increments from the multiple rates;

calculating multiple reward values according to a predetermined reward algorithm, based on a preset tolerable packet error rate, the multiple rate increments, and the multiple actual packet error rates;

obtaining a state-action function according to the action space information, the multiple threshold values, and a first cumulative reward value, the state-action function being a matrix with actions as the abscissa, threshold values as the ordinate, and the first cumulative reward value as the stored value;

obtaining the current threshold value of the target device, selecting a change action for the threshold value according to a predetermined action-change selection method to obtain a changed threshold value, and sending the changed threshold value to the target device for channel maintenance;

obtaining the state space information, action space information, and actual packet error rate of the target device after a second preset time, and calculating a corresponding reward value according to the predetermined reward algorithm;

calculating a second cumulative reward value according to a predetermined cumulative reward algorithm, based on the multiple reward values and the corresponding reward value;

updating the state-action function according to a predetermined update method, based on the changed threshold value, the action space information, and the corresponding reward value, to obtain an optimized threshold value for channel maintenance of the target device.

2. The modulation threshold adjustment method according to claim 1, characterized in that the step of obtaining multiple rate increments from the multiple rates specifically comprises:

arranging the multiple rates in order of the target device's running time, and calculating the absolute difference between each pair of adjacent rates, the absolute difference being a rate increment, thereby obtaining the multiple rate increments.
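The rate-increment step of claim 2 can be sketched as follows. This is only an illustrative sketch: the function name `rate_increments` and the sample values are hypothetical, and the rates are assumed to be already sorted by the target device's running time.

```python
from typing import List, Sequence


def rate_increments(rates: Sequence[float]) -> List[float]:
    """Return the absolute difference of each pair of adjacent rates.

    `rates` is assumed to be ordered by the device's running time, oldest first.
    """
    return [abs(later - earlier) for earlier, later in zip(rates, rates[1:])]


# Hypothetical rate samples (units are illustrative only), oldest first.
samples = [100.0, 120.0, 110.0, 110.0]
increments = rate_increments(samples)
```

N rates yield N-1 increments, so a single sample produces an empty list.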

3. The modulation threshold adjustment method according to claim 1, characterized in that the step of calculating multiple reward values according to a predetermined reward algorithm, based on the preset tolerable packet error rate, the multiple rate increments, and the multiple actual packet error rates, specifically comprises:

obtaining the preset tolerable packet error rate Ratio_bare, the rate increment Speed_delta, and the actual packet error rate Ratio_real;

calculating a reward value according to the formula $\alpha \log\bigl(\log(\mathrm{Ratio\_bare}) - \log(\mathrm{Ratio\_real})\bigr) + \beta\,\mathrm{Speed\_delta}$, thereby obtaining the multiple reward values, where α and β are preset parameters.
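A minimal sketch of the reward formula of claim 3 is given below, under several assumptions that the claim does not state: the logarithms are taken as natural logarithms, the last term is read as the product of β and Speed_delta, and the case where the actual packet error rate is not below the tolerable one (which makes the outer logarithm undefined) is handled by raising an error. The function name `reward` and the default parameter values are hypothetical.

```python
import math


def reward(ratio_bare: float, ratio_real: float, speed_delta: float,
           alpha: float = 1.0, beta: float = 1.0) -> float:
    """alpha * log(log(ratio_bare) - log(ratio_real)) + beta * speed_delta.

    Natural logarithms are an assumption; the claim does not name the base.
    """
    gap = math.log(ratio_bare) - math.log(ratio_real)
    if gap <= 0.0:
        # The outer log is undefined here. The claim does not say how this
        # case is treated, so raising is purely an assumption of this sketch.
        raise ValueError("actual packet error rate must be below the tolerable rate")
    return alpha * math.log(gap) + beta * speed_delta


# Hypothetical values: tolerable rate 1e-2, measured rate 1e-4, increment 0.5.
example_reward = reward(1e-2, 1e-4, 0.5)
```

With this reading, the reward grows as the measured packet error rate falls further below the tolerable rate and as the rate increment grows.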

4. The modulation threshold adjustment method according to claim 1, characterized in that the step of calculating the first cumulative reward value from the multiple reward values according to a predetermined cumulative reward algorithm specifically comprises:

obtaining the multiple reward values r, and calculating the first cumulative reward value according to the γ-discounted cumulative reward algorithm, where γ is a weight coefficient.
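The formula for claim 4 is not reproduced in this text, so the sketch below uses the textbook γ-discounted sum, in which the t-th reward is weighted by γ to the power t. The patent's own formula may differ, for example by taking an expectation or by shifting the index. The function name `discounted_return` is hypothetical.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Textbook gamma-discounted sum: r[0] + gamma*r[1] + gamma**2 * r[2] + ...

    Accumulating from the last reward backwards avoids computing powers.
    """
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical reward sequence, oldest first.
first_cumulative = discounted_return([1.0, 2.0, 3.0], gamma=0.5)
```

A γ closer to 0 emphasizes the earliest rewards, while a γ closer to 1 weights all rewards almost equally.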

5. The modulation threshold adjustment method according to claim 1, characterized in that the step of obtaining the state-action function according to the action space information, the multiple threshold values, and the first cumulative reward value specifically comprises:

obtaining the state-action function according to a formula [formula omitted in source], based on the action space information a, the multiple threshold values x, and the first cumulative reward value, where γ is a weight coefficient.
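Because the formula of claim 5 is missing from this text, the sketch below illustrates only the storage layout that claim 1 describes: a matrix with threshold values along one axis, actions along the other, and a cumulative reward in each cell. The action encoding (-1 for decrease, 0 for hold, +1 for increase), the default value of 0.0 for unobserved pairs, and the name `build_q_table` are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def build_q_table(thresholds: Sequence[float],
                  actions: Sequence[int],
                  returns: Dict[Tuple[float, int], float],
                  default: float = 0.0) -> List[List[float]]:
    """Rows follow `thresholds` (ordinate), columns follow `actions` (abscissa).

    `returns` maps an observed (threshold, action) pair to its cumulative
    reward; pairs that were never observed get `default`.
    """
    return [[returns.get((x, a), default) for a in actions] for x in thresholds]


# Hypothetical thresholds and one observed cumulative reward.
table = build_q_table([10.0, 11.0], [-1, 0, 1], {(10.0, 1): 2.75})
```

Here `table[i][j]` holds the value for the i-th threshold and the j-th action.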

6. The modulation threshold adjustment method according to claim 1, characterized in that the step of obtaining the current threshold value of the target device, selecting a change action for the threshold value according to a predetermined action-change selection method to obtain a changed threshold value, and sending the changed threshold value to the target device for channel maintenance specifically comprises:

calculating the probability of each change action for the threshold value according to the ε-greedy algorithm, and selecting the action with the maximum probability;

obtaining the current target threshold value, and increasing or decreasing it according to the action with the maximum probability to obtain the changed threshold value;

sending the changed threshold value to the target device, so that the target device performs channel maintenance under the changed threshold value.
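The selection step of claim 6 can be sketched as follows, assuming the usual ε-greedy probabilities: the action with the highest stored value receives probability 1 - ε + ε/n and every other action receives ε/n, where n is the number of actions. The sketch then picks the most probable action, as the claim words it; a conventional ε-greedy policy would instead sample from these probabilities. The action encoding, the step size, and both function names are hypothetical.

```python
from typing import List, Sequence


def epsilon_greedy_probs(q_row: Sequence[float], epsilon: float) -> List[float]:
    """Probability of each action under epsilon-greedy for one threshold's row."""
    n = len(q_row)
    best = max(range(n), key=lambda i: q_row[i])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs


def next_threshold(current: float, q_row: Sequence[float],
                   actions: Sequence[int], epsilon: float,
                   step: float = 1.0) -> float:
    """Apply the most probable action (-1, 0 or +1, scaled by `step`)."""
    probs = epsilon_greedy_probs(q_row, epsilon)
    chosen = max(range(len(actions)), key=lambda i: probs[i])
    return current + actions[chosen] * step


# Hypothetical row of stored values for the current threshold of 20.0.
changed = next_threshold(20.0, [0.1, 0.2, 0.9], [-1, 0, 1], epsilon=0.3)
```

Sending the changed threshold to the target device is outside the scope of this sketch.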

7. The modulation threshold adjustment method according to claim 1, characterized in that the step of updating the state-action function according to a predetermined update method, based on the changed threshold value, the action space information, and the corresponding reward value, to obtain an optimized threshold value for channel maintenance of the target device specifically comprises:

updating the state-action function according to a formula [formula omitted in source], based on the changed threshold value x', the actions a and a' included in the action space information, and the corresponding reward value r, where α_{k+1} is an increment coefficient and γ is a weight coefficient;

obtaining the optimized threshold value for channel maintenance of the target device.
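The update formula of claim 7 is not reproduced in this text. Since the abstract names the Q-Learning algorithm, the sketch below uses the textbook Q-learning update, Q(x, a) ← Q(x, a) + α·(r + γ·max over a' of Q(x', a') − Q(x, a)), which may differ in detail from the patent's formula. The table layout (a dictionary from threshold value to a list with one entry per action index) and the name `q_learning_update` are assumptions.

```python
from typing import Dict, List

QTable = Dict[float, List[float]]  # threshold value -> one entry per action index


def q_learning_update(q: QTable, x: float, a: int, r: float, x_next: float,
                      alpha: float, gamma: float) -> float:
    """Apply one textbook Q-learning update in place and return the new value.

    x is the threshold before the change, a the index of the action taken,
    r the reward observed afterwards, and x_next the changed threshold.
    """
    target = r + gamma * max(q[x_next])
    q[x][a] += alpha * (target - q[x][a])
    return q[x][a]


# Hypothetical table with two thresholds and three actions each.
q = {20.0: [0.0, 0.0, 0.0], 21.0: [1.0, 2.0, 0.0]}
new_value = q_learning_update(q, x=20.0, a=2, r=1.0, x_next=21.0,
                              alpha=0.5, gamma=0.5)
```

After enough updates, the action with the largest stored value for each threshold indicates how that threshold should be adjusted.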

8. The modulation threshold adjustment method according to claim 1, characterized in that, before obtaining the multiple pieces of state space information, action space information, and actual packet error rates of the target device within the first preset time range, the method further comprises:

obtaining distribution information of the modulation formats of the target device, selecting the threshold parameter on which the distribution is concentrated, and taking that threshold parameter as the modulation target.
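One possible reading of claim 8 is sketched below: count how often each modulation format is in use and take the most frequent one as the place where the distribution is concentrated. Treating "concentrated" as the mode of the distribution is an assumption, as are the format labels and the name `most_concentrated_format`. The mapping from a format to its threshold parameter is not given in the claim and is left out.

```python
from collections import Counter
from typing import Hashable, Sequence


def most_concentrated_format(formats: Sequence[Hashable]) -> Hashable:
    """Return the modulation format that occurs most often in `formats`."""
    if not formats:
        raise ValueError("no modulation formats observed")
    return Counter(formats).most_common(1)[0][0]


# Hypothetical per-group modulation formats reported by the target device.
observed = ["64QAM", "256QAM", "64QAM", "16QAM", "64QAM"]
target_format = most_concentrated_format(observed)
```

The threshold parameter of the selected format would then become the one the method adjusts.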

9. A modulation threshold adjustment device, characterized by comprising:

a first information acquisition module, configured to obtain multiple pieces of state space information, action space information, and actual packet error rates of a target device within a first preset time range, the state space information including a threshold value, a rate, and a bit error rate;

a first reward value calculation module, configured to calculate multiple reward values according to a predetermined reward algorithm, based on a preset tolerable packet error rate, the multiple rate increments, and the multiple actual packet error rates;

a first cumulative reward value calculation module, configured to calculate a first cumulative reward value from the multiple reward values according to a predetermined cumulative reward algorithm;

a state-action function acquisition module, configured to obtain a state-action function according to the action space information, the multiple threshold values, and the first cumulative reward value, the state-action function being a matrix with actions as the abscissa, threshold values as the ordinate, and the first cumulative reward value as the stored value;

a changed-threshold selection and sending module, configured to obtain the current threshold value of the target device, select a change action for the threshold value according to a predetermined action-change selection method to obtain a changed threshold value, and send the changed threshold value to the target device for channel maintenance;

a second reward value calculation module, configured to obtain the state space information, action space information, and actual packet error rate of the target device after a second preset time, and to calculate a corresponding reward value according to the predetermined reward algorithm;

a second cumulative reward value calculation module, configured to calculate a second cumulative reward value according to the predetermined cumulative reward algorithm, based on the multiple reward values and the corresponding reward value;

a state function update module, configured to update the state-action function according to a predetermined update method, based on the changed threshold value, the action space information, and the corresponding reward value.

10. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 8.