CN110519816A

CN110519816A - Wireless roaming control method, device, storage medium and terminal device

Info

Publication number: CN110519816A
Application number: CN201910793482.3A
Authority: CN
Inventors: 黄泽淳; 程文强; 陈建平
Original assignee: TP Link Technologies Co Ltd
Current assignee: TP Link Technologies Co Ltd
Priority date: 2019-08-22
Filing date: 2019-08-22
Publication date: 2019-11-29
Anticipated expiration: 2039-08-22
Also published as: CN110519816B

Abstract

The invention discloses a wireless roaming control method, comprising: sampling a state vector of a client at a preset time interval, the state vector including the RSSI value, channel utilization and noise of the client with respect to each wireless access point; obtaining a result vector from the state vector and a preset trained neural network model, the result vector including a roaming valuation for each of the wireless access points; and selecting a target wireless access point from the wireless access points according to a preset random number, the current exploration coefficient of the neural network model and the result vector, the target wireless access point serving as the roaming target of the client. Correspondingly, the invention also discloses a wireless roaming control device, a computer-readable storage medium and a terminal device. With the technical solution of the invention, the roaming target can be adjusted dynamically according to environmental changes and the internal operating state of the network, which improves the client's user experience.

Description

Wireless roaming control method, device, storage medium and terminal device

Technical field

The present invention relates to the field of wireless communication technology, and in particular to a wireless roaming control method, a device, a computer-readable storage medium and a terminal device.

Background

Roaming refers to the process in which a client switches from one wireless access point to another (routers, wireless range extenders and similar products are collectively referred to here as wireless access points). The essential problems to be solved are when to trigger roaming and how to determine the target wireless access point of the roaming switch.

Most seamless roaming strategies supported by existing wireless access points are based on the 802.11k/v/r protocols and use signal strength (RSSI) as the threshold and decision metric. The main method can be summarized as follows. The wireless access point periodically monitors the RSSI of the client it receives and compares it with a preset signal strength threshold. If the RSSI is below the threshold, the access point sends an 802.11k request to the client; the client then queries the RSSI of the candidate wireless access points and returns the results to the current access point. From the resulting access point list, the target wireless access point is determined by RSSI-based decision logic, so that the client switches seamlessly from one wireless access point to another.

Existing wireless roaming strategies are fixed-threshold strategies that decide on RSSI alone. However, a wireless network is easily affected by environmental factors such as multipath effects, location, altitude, temperature and humidity, and obstructions between the access point and the client, so the wireless channel usually varies over time. Deciding whether to roam from RSSI strength alone also ignores the internal operating state of the network formed by the access points. A fixed-threshold, RSSI-based roaming decision therefore cannot account for complex environments together with the network's internal operating state, which degrades the client's user experience.

Summary of the invention

The technical problem to be solved by the embodiments of the present invention is to provide a wireless roaming control method, a device, a computer-readable storage medium and a terminal device that can dynamically adjust the roaming target according to environmental changes and the internal operating state of the network, improving the client's user experience.

To solve the above technical problem, an embodiment of the present invention provides a wireless roaming control method suitable for a network composed of several wireless access points. The method includes:

sampling a state vector of the client at a preset time interval, wherein the state vector includes the RSSI value, channel utilization and noise of the client with respect to each wireless access point;

obtaining a result vector from the state vector and a preset trained neural network model, wherein the result vector includes the roaming valuation corresponding to each of the wireless access points;

selecting a target wireless access point from the wireless access points according to a preset random number, the current exploration coefficient of the neural network model and the result vector, and taking the target wireless access point as the roaming target of the client.

Further, selecting the target wireless access point from the wireless access points according to the preset random number, the current exploration coefficient of the neural network model and the result vector specifically includes:

judging whether the current exploration coefficient of the neural network model is greater than the preset random number;

if the current exploration coefficient is greater than the preset random number, selecting any one of the wireless access points as the target wireless access point;

if the current exploration coefficient is not greater than the preset random number, selecting the wireless access point corresponding to the maximum roaming valuation in the result vector as the target wireless access point.

Further, after the target wireless access point is determined, the method also includes:

updating the current exploration coefficient according to a formula in which ε_t denotes the exploration coefficient at the t-th sampling, ε_{t+1} denotes the exploration coefficient at the (t+1)-th sampling, ε_start denotes the initial exploration coefficient, ε_end denotes the final exploration coefficient, and ε_decay denotes the number of exploration coefficient decay iterations.

Further, the method also includes:

obtaining the client's current actual rate and link delay;

calculating, from the actual rate and the link delay, the reward parameter R_{t+1} and the weight parameter W_t corresponding to the sample data (S_t, A_t, R_{t+1}, S_{t+1}), wherein t denotes the sampling index, S_t denotes the state vector obtained at the t-th sampling, S_{t+1} denotes the state vector obtained at the (t+1)-th sampling, and A_t denotes the target wireless access point selected at the t-th sampling;

storing the sample data (S_t, A_t, R_{t+1}, S_{t+1}) and the corresponding weight parameter W_t in a preset experience replay pool;

selecting several sample data from the experience replay pool as a sample set according to the weight probability distribution;

optimizing the neural network model according to the sample set and the back-propagation algorithm.

Further, calculating the reward parameter R_{t+1} and the weight parameter W_t corresponding to the sample data (S_t, A_t, R_{t+1}, S_{t+1}) from the actual rate and the link delay specifically includes:

calculating the reward parameter R_{t+1} according to the formula R_{t+1} = (log₁₀ S - δ_delay × D) × (1 - δ_handoff × 1{A_t ≠ A_{t-1}}), wherein S denotes the actual rate, D denotes the link delay, δ_delay denotes the link delay weight, and δ_handoff denotes the roaming switch penalty. The indicator function 1{A_t ≠ A_{t-1}} equals 1 when the target wireless access point A_t selected at the t-th sampling differs from the target wireless access point A_{t-1} selected at the (t-1)-th sampling, and equals 0 when the two are the same;

calculating the weight parameter W_t corresponding to the sample data (S_t, A_t, R_{t+1}, S_{t+1}) according to a formula in which S denotes the actual rate and S_theor denotes the theoretical rate obtained from the specification parameters of the wireless access point to which the client is connected.

Further, selecting several sample data from the experience replay pool as the sample set according to the weight probability distribution specifically includes:

calculating the weight probability of each sample data in the experience replay pool from the weight parameters, wherein E denotes the number of sample data in the experience replay pool, E ≥ 1, P_j denotes the weight probability of the j-th sample data, j = 1, 2, ..., E, and W_i denotes the weight parameter corresponding to the i-th sample data, i = 1, 2, ..., E;

selecting several sample data whose weight probability meets a preset condition as the sample set.

Further, the neural network model includes an input layer, base layers, valuation layers, decision layers and an aggregation layer.

Optimizing the neural network model according to the sample set and the back-propagation algorithm then specifically includes:

optimizing the parameters of the base layers, valuation layers and decision layers of the neural network model according to the sample set and the back-propagation algorithm.

To solve the above technical problem, an embodiment of the present invention also provides a wireless roaming control device suitable for a network composed of several wireless access points. The device includes:

a state vector obtaining module, configured to sample a state vector of the client at a preset time interval, wherein the state vector includes the RSSI value, channel utilization and noise of the client with respect to each wireless access point;

a result vector obtaining module, configured to obtain a result vector from the state vector and a preset trained neural network model, wherein the result vector includes the roaming valuation corresponding to each of the wireless access points;

a roaming target selecting module, configured to select a target wireless access point from the wireless access points according to a preset random number, the current exploration coefficient of the neural network model and the result vector, and to take the target wireless access point as the roaming target of the client.

An embodiment of the present invention also provides a computer-readable storage medium that includes a stored computer program, wherein the computer program, when run, controls the device on which the computer-readable storage medium resides to execute the wireless roaming control method described in any of the above.

An embodiment of the present invention also provides a terminal device including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor implements the wireless roaming control method described in any of the above when executing the computer program.

Compared with the prior art, the embodiments of the present invention provide a wireless roaming control method, a device, a computer-readable storage medium and a terminal device. A state vector of the client is sampled at a preset time interval; the state vector includes the RSSI value, channel utilization and noise of the client with respect to each wireless access point. A result vector is obtained from the state vector and a preset trained neural network model; the result vector includes the roaming valuation corresponding to each wireless access point. A target wireless access point is then selected from the wireless access points according to a preset random number, the current exploration coefficient of the neural network model and the result vector, and serves as the roaming target of the client. The roaming target can thus be adjusted dynamically according to environmental changes and the internal operating state of the network, improving the client's user experience.

Brief description of the drawings

Fig. 1 is a flow chart of a preferred embodiment of the wireless roaming control method provided by the invention;

Fig. 2 is a structural block diagram of a preferred embodiment of the neural network model provided by the invention;

Fig. 3 is a structural block diagram of a preferred embodiment of the wireless roaming control device provided by the invention;

Fig. 4 is a structural block diagram of a preferred embodiment of the terminal device provided by the invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

An embodiment of the present invention provides a wireless roaming control method. Fig. 1 is a flow chart of a preferred embodiment of the method, which is suitable for a network composed of several wireless access points. The method includes steps S11 to S13:

Step S11: sample a state vector of the client at a preset time interval, wherein the state vector includes the RSSI value, channel utilization and noise of the client with respect to each wireless access point;

Step S12: obtain a result vector from the state vector and a preset trained neural network model, wherein the result vector includes the roaming valuation corresponding to each of the wireless access points;

Step S13: select a target wireless access point from the wireless access points according to a preset random number, the current exploration coefficient of the neural network model and the result vector, and take the target wireless access point as the roaming target of the client.

Specifically, the network composed of several (assume M) wireless access points is sampled at a preset time interval (the interval can be configured as needed) to obtain, for the client to be roamed, the uplink RSSI value, channel utilization U and noise N with respect to each wireless access point. The state vector of the client is then S_t = [RSSI_1, U_1, N_1, ..., RSSI_M, U_M, N_M], where t denotes the sampling index, S_t denotes the state vector obtained at the t-th (i.e., current) sampling, and M denotes the number of wireless access points, M > 1, so that S_t has length 3M. The state vector S_t is input to the preset trained neural network model (pre-trained with deep reinforcement learning), which outputs a result vector Q of length M. Q contains one roaming valuation per wireless access point; a roaming valuation represents how likely the client is to roam to the corresponding access point, and the larger the valuation, the greater the likelihood. A random number is then generated (0 ≤ random number ≤ 1), and the target wireless access point is selected from the M access points according to this random number, the current exploration coefficient of the neural network model and the result vector. The selected target wireless access point serves as the roaming target of the client.
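The layout of the state vector can be illustrated with a minimal Python sketch. The type name `ApMeasurement`, the function `build_state_vector`, the dBm units and the sample values are hypothetical and chosen only for illustration; the source specifies only the ordering [RSSI_1, U_1, N_1, ..., RSSI_M, U_M, N_M] and the length 3M.

```python
from typing import List, NamedTuple, Sequence


class ApMeasurement(NamedTuple):
    """Hypothetical per-AP sample for one client (units are assumptions)."""
    rssi_dbm: float             # uplink RSSI of the client at this AP
    channel_utilization: float  # U, assumed here to be a fraction in [0, 1]
    noise_dbm: float            # N


def build_state_vector(samples: Sequence[ApMeasurement]) -> List[float]:
    """Flatten M samples into S_t = [RSSI_1, U_1, N_1, ..., RSSI_M, U_M, N_M]."""
    if len(samples) < 2:
        raise ValueError("the text assumes M > 1 wireless access points")
    state: List[float] = []
    for s in samples:
        state.extend((s.rssi_dbm, s.channel_utilization, s.noise_dbm))
    return state


samples = [ApMeasurement(-52.0, 0.35, -95.0), ApMeasurement(-71.0, 0.10, -92.0)]
state = build_state_vector(samples)  # length 3M = 6
```

Keeping the three measurements of each access point adjacent means that position 3k to 3k+2 of the vector always refers to access point k, which matches the ordering given in the text.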

It should be noted that after the client has connected to a wireless access point, environmental changes or changes in the network's internal operating state may leave the currently connected access point unable to meet the client's needs. The state vector of the client is therefore sampled at a regular interval to judge whether the client needs to switch access points. After the target wireless access point has been selected, unnecessary roaming switches can be avoided by further checking whether the target wireless access point is the same as the access point to which the client is currently connected. If they differ, a message is sent to the client so that the client performs the roaming switch to the target wireless access point; if they are the same, no message needs to be sent to the client.

In the wireless roaming control method provided by this embodiment, the RSSI value, channel utilization and noise of the client are sampled at a preset time interval to obtain a state vector, the state vector is input to the trained neural network model to obtain a result vector, and the target wireless access point is selected from all wireless access points according to the result vector as the roaming target of the client. The method takes into account the effects of environmental changes and changes in the network's internal operating state on the client and processes them with the trained neural network model. The roaming target can therefore be adjusted dynamically according to both, which improves the accuracy of the roaming target and the client's user experience.

In another preferred embodiment, selecting the target wireless access point from the wireless access points according to the preset random number, the current exploration coefficient of the neural network model and the result vector proceeds as follows.

Specifically, in combination with the above embodiment, when the target wireless access point is selected from the M wireless access points, the current exploration coefficient of the neural network model is first compared with the generated random number. If the current exploration coefficient is greater than the generated random number, any one of the M wireless access points is selected as the target wireless access point. If the current exploration coefficient is not greater than the generated random number, the wireless access point corresponding to the maximum roaming valuation in the result vector is selected as the target wireless access point.
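The comparison described above is an epsilon-greedy selection and can be sketched in Python as follows. The function names are hypothetical, and choosing the exploratory access point uniformly at random is an assumption; the text only says "any one" of the M access points. Note that `random.Random.random()` draws from [0, 1), whereas the text allows 0 ≤ random number ≤ 1.

```python
import random
from typing import Optional, Sequence


def select_target_ap(q_values: Sequence[float], epsilon: float,
                     rng: Optional[random.Random] = None) -> int:
    """Return the index of the target AP for result vector Q."""
    rng = rng or random.Random()
    r = rng.random()  # uniform in [0, 1)
    if epsilon > r:
        # Explore: any of the M access points (uniform choice is an assumption).
        return rng.randrange(len(q_values))
    # Exploit: the access point with the largest roaming valuation.
    return max(range(len(q_values)), key=lambda i: q_values[i])


def needs_roaming(target_ap: int, current_ap: int) -> bool:
    """A switch is only required when the target differs from the current AP."""
    return target_ap != current_ap
```

With `epsilon = 0.0` the function always exploits, and with `epsilon = 1.0` it always explores, which makes the two branches easy to check separately.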

In another preferred embodiment, the method also includes the following after the target wireless access point is determined.

Specifically, in combination with the above embodiment, after each sampling and determination of the target wireless access point, the current exploration coefficient is updated by linear decay according to the update formula. The exploration coefficient is largest at the first sampling and decreases gradually as the number of samplings increases.

It should be noted that the exploration coefficient has a preset initial value ε_start and final value ε_end. Once the exploration coefficient has decayed to ε_end, it remains at ε_end and is no longer updated.

It should be understood that when the embodiment is run for the first time, that is, at the first sampling, as many network states as possible need to be explored, so the exploration coefficient of the neural network model is at its maximum. As the number of samplings increases, the network state becomes better understood and the target wireless access point determined by the trained neural network model becomes more and more accurate. Extensive exploration is then no longer needed, the exploration coefficient decreases gradually, and the neural network model shifts from random exploration to executing the best known choice.

In another preferred embodiment, the method also includes:

obtaining the client's current actual rate and link delay.

This embodiment is a method for optimizing and updating the above neural network model. Specifically, in combination with the above embodiment, the client's current actual rate (in Mbps) and link delay (in ms) are first obtained by measuring the throughput between the client and the currently connected wireless access point. The reward parameter R_{t+1} and the weight parameter W_t corresponding to the sample data (S_t, A_t, R_{t+1}, S_{t+1}) are calculated from the measured actual rate and link delay. The sample data (S_t, A_t, R_{t+1}, S_{t+1}) and its weight parameter W_t are then stored in a preset experience replay pool. Several sample data are selected as a sample set according to the weight probability distribution of all sample data in the experience replay pool, and the neural network model is optimized according to the selected sample set and the back-propagation algorithm.

It should be noted that the experience replay pool is initially empty and has a capacity threshold. Each time sample data (S_t, A_t, R_{t+1}, S_{t+1}) and its weight parameter W_t are stored in the experience replay pool, it is further checked whether the number of stored sample data has reached the threshold. If it has, the earliest stored sample data and its weight parameter are deleted from the experience replay pool on a first-in, first-out basis.
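The first-in, first-out behavior of the experience replay pool can be sketched with a bounded deque. The class name `ReplayPool` and its methods are hypothetical. One simplification is labeled here: `collections.deque` with `maxlen` evicts the oldest entry when a new one would exceed the capacity, whereas the text checks the threshold after each store.

```python
from collections import deque
from typing import Any, Deque, List, Tuple

Transition = Tuple[Any, int, float, Any]  # (S_t, A_t, R_{t+1}, S_{t+1})


class ReplayPool:
    """Bounded FIFO store of (transition, weight W_t) pairs."""

    def __init__(self, capacity: int = 1800) -> None:
        # A deque with maxlen drops the oldest item when a new one is appended
        # to a full pool, which gives first-in, first-out eviction.
        self._items: Deque[Tuple[Transition, float]] = deque(maxlen=capacity)

    def store(self, transition: Transition, weight: float) -> None:
        self._items.append((transition, weight))

    def __len__(self) -> int:
        return len(self._items)

    def transitions(self) -> List[Transition]:
        return [t for t, _ in self._items]

    def weights(self) -> List[float]:
        return [w for _, w in self._items]


pool = ReplayPool(capacity=3)
for step in range(4):
    pool.store(([float(step)], 0, 1.0, [float(step + 1)]), weight=0.5)
# The pool now holds the three most recent transitions; step 0 was evicted.
```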

As an improvement of the above scheme, calculating the reward parameter R_{t+1} and the weight parameter W_t corresponding to the sample data (S_t, A_t, R_{t+1}, S_{t+1}) from the actual rate and the link delay specifically includes:

calculating the reward parameter R_{t+1} according to the formula R_{t+1} = (log₁₀ S - δ_delay × D) × (1 - δ_handoff × 1{A_t ≠ A_{t-1}}), wherein S denotes the actual rate, D denotes the link delay, δ_delay denotes the link delay weight, and δ_handoff denotes the roaming switch penalty. The indicator function 1{A_t ≠ A_{t-1}} equals 1 when the target wireless access point A_t selected at the t-th sampling differs from the target wireless access point A_{t-1} selected at the (t-1)-th sampling, and equals 0 when the two are the same;

calculating the weight parameter W_t corresponding to the sample data (S_t, A_t, R_{t+1}, S_{t+1}) according to a formula in which S denotes the actual rate and S_theor denotes the theoretical rate obtained from the specification parameters of the wireless access point to which the client is connected.

It should be noted that the indicator function 1{A_t ≠ A_{t-1}} takes the value 1 when the client performs a roaming switch and 0 when it does not. The factor (1 - δ_handoff × 1{A_t ≠ A_{t-1}}) thus penalizes the reward parameter R_{t+1} for a roaming switch, which discourages frequent roaming.
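The reward formula can be evaluated directly, as in the following sketch. The function name `reward` and its argument names are hypothetical; the default values of δ_delay and δ_handoff follow Table 1, and the rate and delay are taken in Mbps and ms as stated for the measurement.

```python
import math


def reward(rate_mbps: float, delay_ms: float, action: int, prev_action: int,
           delta_delay: float = 0.1, delta_handoff: float = 0.1) -> float:
    """R_{t+1} = (log10(S) - delta_delay * D) * (1 - delta_handoff * 1{A_t != A_{t-1}})."""
    if rate_mbps <= 0:
        raise ValueError("log10 requires a positive actual rate")
    switched = 1 if action != prev_action else 0  # indicator 1{A_t != A_{t-1}}
    return (math.log10(rate_mbps) - delta_delay * delay_ms) * (1 - delta_handoff * switched)


stay = reward(100.0, 5.0, action=0, prev_action=0)    # no roaming switch
switch = reward(100.0, 5.0, action=1, prev_action=0)  # roaming switch penalized
```

For a rate of 100 Mbps and a delay of 5 ms the base term is log₁₀(100) - 0.1 × 5 = 1.5, and a roaming switch scales it by 0.9.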

As an improvement of the above scheme, selecting several sample data from the experience replay pool as the sample set according to the weight probability distribution proceeds as follows.

Specifically, in combination with the above embodiment, E sample data are stored in the experience replay pool, each with a corresponding weight parameter. The weight probability of each sample data in the experience replay pool is calculated from these weight parameters. Several sample data whose weight probability meets a preset condition (for example, sample data whose weight probability is greater than a certain threshold) are then selected from the E sample data as the sample set used to optimize the neural network model.
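A minimal sketch of the weight probability and the two selection styles mentioned in the text follows. The normalization P_j = W_j / (W_1 + ... + W_E) is an assumption inferred from the symbol definitions, since the formula itself is not reproduced in this text; the function names are hypothetical. `random.choices` samples with replacement.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def weight_probabilities(weights: Sequence[float]) -> List[float]:
    """Assumed normalization: P_j = W_j / sum_i W_i."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [w / total for w in weights]


def select_above_threshold(samples: Sequence[T], weights: Sequence[float],
                           threshold: float) -> List[T]:
    """Keep samples whose weight probability exceeds a preset threshold."""
    probs = weight_probabilities(weights)
    return [s for s, p in zip(samples, probs) if p > threshold]


def sample_batch(samples: Sequence[T], weights: Sequence[float], batch_size: int,
                 rng: Optional[random.Random] = None) -> List[T]:
    """Draw a batch at random according to the weight probability distribution."""
    rng = rng or random.Random()
    return rng.choices(samples, weights=weights, k=batch_size)


probs = weight_probabilities([1.0, 1.0, 2.0])
batch = sample_batch(["a", "b", "c"], [1.0, 1.0, 2.0], 4, random.Random(0))
```

The threshold variant corresponds to the example condition given above, while the random variant corresponds to drawing each training batch from the weight probability distribution.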

As an improvement of the above scheme, the neural network model includes an input layer, base layers, valuation layers, decision layers and an aggregation layer.

With reference to Fig. 2, which is a structural block diagram of a preferred embodiment of the neural network model provided by the invention, the neural network model includes input layer 1, base layers 2 and 3 (parameters denoted θ), valuation layers 4 and 5 (parameters denoted α), decision layers 6 and 7 (parameters denoted β), and aggregation layer 8. Each layer has one or more neurons, and each neuron consists of a linear combination followed by a nonlinear activation function, so that stacking multiple layers can fit a wide variety of complex functional relationships. Layers 2 to 5 are fully connected layers with ReLU activation. Layers 6 and 7 are fully connected layers that output the state value function V(S; θ, β) and the advantage function G(S, A; θ, α), respectively. Layer 8 combines the two according to the aggregation formula, where S denotes the state vector and A denotes the selected target wireless access point.

The state value function evaluates the value of the current state S_t, and the advantage function evaluates the value of selecting the target wireless access point A_t in the current state S_t. Separating the two mainly reduces the mutual influence between the two estimates. Roaming is special in that the network state at the next sampling time is not directly tied to the current action. If the network is in a state far from the roaming switch boundary, the state value function dominates. When the network is close to the roaming switch boundary, the advantage function dominates, because choosing a suitable roaming target then directly affects the subsequent reward parameter.
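How an aggregation layer can combine a scalar state value with per-access-point advantages is sketched below. The mean-subtracted form Q(S, A) = V(S) + G(S, A) - mean over A' of G(S, A') is the aggregation commonly used in dueling network architectures and is an assumption here; the aggregation formula of this document is not reproduced in the text. The function name `aggregate` is hypothetical.

```python
from typing import List, Sequence


def aggregate(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Assumed dueling aggregation: Q_a = V + G_a - mean(G)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + g - mean_adv for g in advantages]


# V(S) from decision layer 6 (one neuron), G(S, A) from decision layer 7 (M neurons).
q = aggregate(1.0, [0.5, -0.5, 0.0])
```

Subtracting the mean advantage keeps the split between V and G identifiable; other variants, such as subtracting the maximum advantage, are also in use.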

Specifically, in combination with the above embodiment, the selected sample set and the back-propagation algorithm are used to update, according to the update formula, the parameter θ of the base layers, the parameter α of the valuation layers and the parameter β of the decision layers, where α_learn denotes the learning rate of the neural network and γ denotes the discount coefficient. The reference parameters θ_t^-, α_t^-, β_t^- are updated with θ_t, α_t, β_t once every τ training rounds; that is, every τ rounds the trained parameters are copied into the neural network model used for the roaming decisions above. These parameter updates make the neural network model's prediction of the target wireless access point more accurate.
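The periodic copy of the trained parameters into the reference parameters can be sketched as follows. The class name `ReferenceParams`, the dictionary representation of the parameters and the method name are hypothetical; only the rule "copy once every τ training rounds" comes from the text, with τ = 20 in Table 1.

```python
import copy
from typing import Any, Dict


class ReferenceParams:
    """Holds trained parameters and a reference copy refreshed every tau rounds."""

    def __init__(self, online: Dict[str, Any], tau: int = 20) -> None:
        self.online = online                    # theta_t, alpha_t, beta_t (being trained)
        self.reference = copy.deepcopy(online)  # theta_t^-, alpha_t^-, beta_t^-
        self.tau = tau
        self.rounds = 0

    def after_training_round(self) -> bool:
        """Count one training round; copy online into reference every tau rounds."""
        self.rounds += 1
        if self.rounds % self.tau == 0:
            self.reference = copy.deepcopy(self.online)
            return True
        return False
```

A deep copy is used so that later training steps on the online parameters do not silently change the reference parameters between synchronizations.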

It should be noted that the number of neurons in each layer of the neural network can be set according to actual needs. For example, if the network consists of M wireless access points, then in the neural network model shown in Fig. 2 input layer 1 has 3M neurons, base layers 2 and 3 each have 2M neurons, valuation layers 4 and 5 each have M neurons, decision layer 6 has 1 neuron, decision layer 7 has M neurons, and the result vector Q output by aggregation layer 8 has dimension M.

It should be added that before the neural network model is trained with deep reinforcement learning, the relevant deep reinforcement learning parameters must be initialized. Table 1 shows the initial settings. M denotes the number of nodes in the network topology and affects the size of the other deep reinforcement learning parameters and of the neural network model. τ denotes the update period of the neural network reference parameters: every 20 training rounds the neural network parameters are copied into the neural network model used for roaming decisions, which increases the training stability of the deep reinforcement learning system and speeds up convergence. The discount coefficient γ is part of the reward function design; it expresses that roaming decisions focus mainly on the current network state while future network states also have some influence. The learning rate α_learn determines how fast the parameters are updated in each training round. The training sample set size N is the number of sample data used in each training round; drawing each round's samples at random according to the weight probability distribution improves training stability. The larger the link delay weight δ_delay, the larger the share of delay in the reward. The larger the roaming switch penalty δ_handoff, the more the reward is discounted when a roaming switch occurs.

Table 1. Deep reinforcement learning parameter settings

Parameter	Value
Number of wireless access points M	Given by the network topology
Initial exploration coefficient ε_start	1
Final exploration coefficient ε_end	0.001
Exploration coefficient decay iterations ε_decay	500×M
Number of training rounds K	max[(ε_decay+1200),(ε_decay+1500)]
Neural network reference parameter update period τ	20
Discount coefficient γ	0.9
Learning rate α_learn	0.005
Experience replay pool size E	1800
Training sample set size N	64
Link delay weight δ_delay	0.1 Kb/ms²
Roaming switch penalty δ_handoff	0.1

It should be understood that the parameter settings in Table 1 are only one preferred initialization; many other combinations of values are possible, and the embodiments of the present invention place no specific limit on them.
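The settings in Table 1 can be collected in a single configuration object, as in this sketch. The class and field names are hypothetical; the default values are those listed in the table, and the two derived quantities follow the table's expressions in terms of M and ε_decay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoamingDrlConfig:
    """Initial deep reinforcement learning settings taken from Table 1."""
    num_aps: int                      # M, given by the network topology
    eps_start: float = 1.0            # initial exploration coefficient
    eps_end: float = 0.001            # final exploration coefficient
    reference_update_period: int = 20 # tau
    discount: float = 0.9             # gamma
    learning_rate: float = 0.005      # alpha_learn
    replay_pool_size: int = 1800      # E
    batch_size: int = 64              # N
    delta_delay: float = 0.1          # link delay weight
    delta_handoff: float = 0.1        # roaming switch penalty

    @property
    def eps_decay_iterations(self) -> int:
        """eps_decay = 500 x M."""
        return 500 * self.num_aps

    @property
    def training_rounds(self) -> int:
        """K = max[(eps_decay + 1200), (eps_decay + 1500)], as written in Table 1."""
        d = self.eps_decay_iterations
        return max(d + 1200, d + 1500)


cfg = RoamingDrlConfig(num_aps=3)
```

Making the dataclass frozen prevents the settings from being changed by accident during training; a different network size only requires constructing a new object with another `num_aps`.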

An embodiment of the present invention also provides a wireless roaming control device that can implement all steps of the wireless roaming control method described in any of the above embodiments. The functions and technical effects of the modules and units in the device are the same as those of the corresponding steps of the wireless roaming control method described above and are not repeated here.

Referring to Fig. 3, which is a structural block diagram of a preferred embodiment of the radio roaming control device provided by the present invention, the device is applicable to a network composed of several wireless access points. The device includes:

A state vector acquisition module 11, configured to sample and obtain the state vector of the client at every preset time period; wherein the state vector includes the RSSI value, channel utilization and noise of the client corresponding to each wireless access point;

A result vector acquisition module 12, configured to obtain a result vector according to the state vector and a preset trained neural network model; wherein the result vector includes the roaming valuations corresponding to the several wireless access points;

A roaming candidate selection module 13, configured to select a target wireless access point from the several wireless access points according to a preset random number, the current exploration coefficient of the neural network model and the result vector, and to take the target wireless access point as the roaming candidate of the client.
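A minimal sketch of how the state vector handled by module 11 might be assembled is given below. The source states only that the vector contains the RSSI value, channel utilization and noise for each wireless access point; the names `ApMeasurement` and `build_state_vector`, the units, and the per-access-point interleaved layout are assumptions made for illustration.

```python
from typing import List, NamedTuple, Sequence


class ApMeasurement(NamedTuple):
    """One sampling of the client as seen through a single access point."""
    rssi_dbm: float             # RSSI value (unit assumed: dBm)
    channel_utilization: float  # channel utilization (assumed range 0..1)
    noise_dbm: float            # noise (unit assumed: dBm)


def build_state_vector(measurements: Sequence[ApMeasurement]) -> List[float]:
    """Flatten M per-AP measurements into a state vector of length 3*M."""
    state: List[float] = []
    for m in measurements:
        state.extend([m.rssi_dbm, m.channel_utilization, m.noise_dbm])
    return state


# Hypothetical sampling for a two-AP network
state = build_state_vector([
    ApMeasurement(-50.0, 0.3, -95.0),
    ApMeasurement(-70.0, 0.6, -90.0),
])
```

With this layout the input size of the neural network grows linearly with the number of access points M, which matches the remark that M influences the size of the model.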

Preferably, the roaming candidate selection module 13 specifically includes:

A parameter judging unit, configured to judge whether the current exploration coefficient of the neural network model is greater than the preset random number;

A first roaming candidate selection unit, configured to select, if the current exploration coefficient is greater than the preset random number, any wireless access point among the several wireless access points as the target wireless access point;

A second roaming candidate selection unit, configured to select, if the current exploration coefficient is not greater than the preset random number, the wireless access point corresponding to the maximum roaming valuation in the result vector as the target wireless access point.
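The three units above amount to an ε-greedy style choice, which can be sketched as follows. The function name `select_target_ap` and the assumption that the preset random number is drawn uniformly from [0, 1) are illustrative; the source says only that a preset random number is compared with the current exploration coefficient.

```python
import random
from typing import Optional, Sequence


def select_target_ap(valuations: Sequence[float], epsilon: float,
                     rng: Optional[random.Random] = None) -> int:
    """Return the index of the target access point.

    valuations: result vector, one roaming valuation per access point.
    epsilon:    current exploration coefficient.
    """
    rng = rng or random.Random()
    # Assumption: the "preset random number" is uniform on [0, 1).
    if epsilon > rng.random():
        # Explore: pick any access point.
        return rng.randrange(len(valuations))
    # Exploit: pick the access point with the maximum roaming valuation.
    return max(range(len(valuations)), key=lambda i: valuations[i])


# With epsilon = 0 the comparison can never succeed, so the best AP is chosen.
best = select_target_ap([0.1, 0.9, 0.4], epsilon=0.0)
```

A large exploration coefficient early in training makes random choices frequent; as the coefficient decays toward its final value, the choice is driven almost entirely by the roaming valuations.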

Preferably, the device further includes:

An exploration coefficient update module, configured to update the current exploration coefficient according to a formula in which ε_t denotes the exploration coefficient at the t-th sampling, ε_t+1 denotes the exploration coefficient at the (t+1)-th sampling, ε_start denotes the initial exploration coefficient, ε_end denotes the final exploration coefficient, and ε_decay denotes the number of exploration coefficient decay iterations.

Preferably, the device further includes:

A network data acquisition module, configured to obtain the current actual rate and link delay of the client;

A reward and weight calculation module, configured to calculate, according to the actual rate and the link delay, a reward parameter R_t+1 and a weight parameter W_t corresponding to the sample data (S_t, A_t, R_t+1, S_t+1); wherein t denotes the sampling index, S_t denotes the state vector obtained at the t-th sampling, S_t+1 denotes the state vector obtained at the (t+1)-th sampling, and A_t denotes the target wireless access point selected at the t-th sampling;

A sample data storage module, configured to store the sample data (S_t, A_t, R_t+1, S_t+1) and the corresponding weight parameter W_t in a preset experience replay pool;

A sample set selection module, configured to select several sample data as a sample set according to the weight probability distribution in the experience replay pool;

A model optimization module, configured to optimize the neural network model according to the sample set and a back-propagation algorithm.
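The storage step performed by the sample data storage module can be sketched as a bounded pool of weighted transitions. The names `Sample` and `ReplayPool` are hypothetical, and discarding the oldest sample once the pool is full is an assumption; the source specifies only the stored tuple, its weight parameter, and a pool size E (1800 in Table 1).

```python
from collections import deque
from typing import Deque, List, NamedTuple


class Sample(NamedTuple):
    state: List[float]       # S_t
    action: int              # A_t, index of the selected target access point
    reward: float            # R_t+1
    next_state: List[float]  # S_t+1
    weight: float            # W_t


class ReplayPool:
    """Bounded experience replay pool (oldest-first eviction is assumed)."""

    def __init__(self, capacity: int = 1800) -> None:
        self._samples: Deque[Sample] = deque(maxlen=capacity)

    def store(self, sample: Sample) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._samples.append(sample)

    def samples(self) -> List[Sample]:
        return list(self._samples)

    def __len__(self) -> int:
        return len(self._samples)


# Tiny pool to show the eviction behaviour
pool = ReplayPool(capacity=2)
for a in range(3):
    pool.store(Sample([0.0], a, 1.0, [0.0], 0.5))
```

Keeping W_t next to each transition lets the later selection step compute weight probabilities without recomputing anything from the raw measurements.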

Preferably, the reward and weight calculation module specifically includes:

A reward parameter calculation unit, configured to calculate the reward parameter R_t+1 according to the formula R_t+1=(log₁₀S-δ_delay×D)×(1-δ_handoff×1{A_t≠A_t-1}); wherein S denotes the actual rate, D denotes the link delay, δ_delay denotes the link delay weight, δ_handoff denotes the roaming handoff penalty, and the indicator function 1{A_t≠A_t-1} equals 1 when the target wireless access point A_t selected at the t-th sampling differs from the target wireless access point A_t-1 selected at the (t-1)-th sampling, and equals 0 when the two are the same;

A weight parameter calculation unit, configured to calculate, according to a formula, the weight parameter W_t corresponding to the sample data (S_t, A_t, R_t+1, S_t+1); wherein S denotes the actual rate, and S_theor denotes the theoretical rate obtained from the specification parameters of the wireless access point to which the client is connected.
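The reward formula stated for the reward parameter calculation unit can be evaluated directly, as in the sketch below. It covers only the reward parameter R_t+1. The function name `reward`, the argument names and the example values are hypothetical, and the units of rate and delay are assumed to be those implied by Table 1; the default weights are the Table 1 values.

```python
import math


def reward(rate: float, delay: float, action: int, prev_action: int,
           delta_delay: float = 0.1, delta_handoff: float = 0.1) -> float:
    """R_t+1 = (log10(S) - delta_delay * D) * (1 - delta_handoff * 1{A_t != A_t-1}).

    rate (S) must be positive; action and prev_action are A_t and A_t-1.
    """
    handoff = 1 if action != prev_action else 0
    return (math.log10(rate) - delta_delay * delay) * (1 - delta_handoff * handoff)


# Same access point as before: no handoff discount is applied.
r_stay = reward(rate=100.0, delay=5.0, action=1, prev_action=1)
# Different access point: the reward is discounted by delta_handoff.
r_move = reward(rate=100.0, delay=5.0, action=2, prev_action=1)
```

Because the handoff term multiplies the whole bracket, a handoff scales the reward by (1 - δ_handoff) rather than subtracting a fixed cost, so with a positive base reward a switch pays off only when the new access point improves rate or delay enough to offset the discount.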

Preferably, the sample set selection module specifically includes:

A weight probability calculation unit, configured to calculate, according to a formula, the weight probability of each sample data in the experience replay pool; wherein E denotes the number of sample data in the experience replay pool, E≥1, P_j denotes the weight probability of the j-th sample data, j=1, 2, ..., E, and W_i denotes the weight parameter corresponding to the i-th sample data, i=1, 2, ..., E;

A sample set selection unit, configured to select several sample data whose weight probabilities meet a preset condition as the sample set.
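One possible reading of the two units above is sketched below. The normalization P_j = W_j / (W_1 + ... + W_E) is an assumption inferred from the symbol definitions, not a formula taken from this text, and drawing indices at random with replacement stands in for the unspecified "preset condition". The function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def weight_probabilities(weights: Sequence[float]) -> List[float]:
    """Assumed normalization: each weight divided by the sum of all weights."""
    total = sum(weights)
    return [w / total for w in weights]


def draw_sample_indices(weights: Sequence[float], n: int,
                        rng: Optional[random.Random] = None) -> List[int]:
    """Draw n pool indices, each chosen with probability P_j (with replacement)."""
    rng = rng or random.Random()
    probs = weight_probabilities(weights)
    return rng.choices(range(len(weights)), weights=probs, k=n)


# Two stored samples, the second three times as heavily weighted as the first
probs = weight_probabilities([1.0, 3.0])
indices = draw_sample_indices([1.0, 3.0], n=5, rng=random.Random(0))
```

Under this assumption, samples with a larger weight parameter W_t are drawn more often, which is the behaviour the description attributes to weighted sampling.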

Preferably, the neural network model includes an input layer, a base layer, a valuation layer, a decision layer and an aggregation layer;

Then, the model optimization module specifically includes:

A model optimization unit, configured to optimize the parameters of the base layer, the valuation layer and the decision layer of the neural network model according to the sample set and the back-propagation algorithm.
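The layer names resemble a dueling-style value network, in which one branch estimates a state value, another estimates a per-action term, and a parameter-free layer combines them. The sketch below shows such a combination step purely as an assumption; this text does not state how the aggregation layer works, and the function name `aggregate` and the mean-subtraction rule are illustrative.

```python
from typing import List, Sequence


def aggregate(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Combine a scalar state value with per-AP advantages (assumed rule).

    Returns one roaming valuation per access point:
    valuation_i = state_value + advantage_i - mean(advantages)
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


# Hypothetical outputs of the valuation layer and the decision layer
valuations = aggregate(1.0, [0.0, 2.0, 4.0])
```

A combination step of this kind has no trainable parameters, which would be consistent with the optimization being limited to the base layer, the valuation layer and the decision layer.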

An embodiment of the present invention further provides a computer-readable storage medium that includes a stored computer program; wherein, when run, the computer program controls the device on which the computer-readable storage medium resides to execute the radio roaming control method described in any of the above embodiments.

An embodiment of the present invention further provides a terminal device. Referring to Fig. 4, which is a structural block diagram of a preferred embodiment of the terminal device provided by the present invention, the terminal device includes a processor 10, a memory 20, and a computer program stored in the memory 20 and configured to be executed by the processor 10; the processor 10 implements the radio roaming control method described in any of the above embodiments when executing the computer program.

Preferably, the computer program may be divided into one or more modules/units (such as computer program 1, computer program 2), which are stored in the memory 20 and executed by the processor 10 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments being used to describe the execution process of the computer program in the terminal device.

The processor 10 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or the processor 10 may be any conventional processor. The processor 10 is the control center of the terminal device and connects the various parts of the terminal device using various interfaces and lines.

The memory 20 mainly includes a program storage area and a data storage area, wherein the program storage area may store the operating system, the application programs required by at least one function, and so on, and the data storage area may store related data and so on. In addition, the memory 20 may be a high-speed random access memory, or a non-volatile memory such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card or a flash card (Flash Card), or the memory 20 may be another volatile solid-state storage device.

It should be noted that the above terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the structural block diagram of Fig. 4 is only an example of the terminal device and does not constitute a limitation on it; the terminal device may include more or fewer components than shown, combine certain components, or use different components.

In summary, the radio roaming control method, device, computer-readable storage medium and terminal device provided by the embodiments of the present invention have the following beneficial effects:

(1) The internal operating state of the network may change over time; the roaming handoff target of the client can be adjusted in real time according to that state, improving the user experience of the client;

(2) Environmental changes can be captured and the roaming policy adjusted dynamically to adapt to different environments;

(3) Compared with the traditional roaming method based on an RSSI threshold, a better network rate and lower link delay can be obtained in the above complex environments;

(4) Compared with the traditional roaming method based on an RSSI threshold, the influence of noise and channel utilization on the channel environment is taken into account, and features are further extracted by the neural network, which helps to find the optimal roaming handoff target accurately;

(5) By combining a neural network with experience replay, the history of the environment can be remembered, which helps to perceive the behavioral characteristics of the client in advance and thereby optimize the roaming timing;

(6) A weight parameter W_t is assigned to the sample data (S_t, A_t, R_t+1, S_t+1) according to the current actual rate of the client, so that sample data closer to the real rate have a higher probability of being sampled, which improves the accuracy of the neural network model's valuations.

The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the technical principles of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims

1. A radio roaming control method, characterized in that the method is applicable to a network composed of several wireless access points; the method includes:

Sampling and obtaining the state vector of the client at every preset time period; wherein the state vector includes the RSSI value, channel utilization and noise of the client corresponding to each wireless access point;

Obtaining a result vector according to the state vector and a preset trained neural network model; wherein the result vector includes the roaming valuations corresponding to the several wireless access points;

Selecting a target wireless access point from the several wireless access points according to a preset random number, the current exploration coefficient of the neural network model and the result vector, and taking the target wireless access point as the roaming candidate of the client.

2. The radio roaming control method according to claim 1, characterized in that said selecting a target wireless access point from the several wireless access points according to a preset random number, the current exploration coefficient of the neural network model and the result vector specifically includes:

If the current exploration coefficient is greater than the preset random number, selecting any wireless access point among the several wireless access points as the target wireless access point;

If the current exploration coefficient is not greater than the preset random number, selecting the wireless access point corresponding to the maximum roaming valuation in the result vector as the target wireless access point.

3. The radio roaming control method according to claim 1, characterized in that, after the target wireless access point is determined, the method further includes:

Updating the current exploration coefficient according to a formula in which ε_t denotes the exploration coefficient at the t-th sampling, ε_t+1 denotes the exploration coefficient at the (t+1)-th sampling, ε_start denotes the initial exploration coefficient, ε_end denotes the final exploration coefficient, and ε_decay denotes the number of exploration coefficient decay iterations.

4. The radio roaming control method according to claim 1, characterized in that the method further includes:

Obtaining the current actual rate and link delay of the client;

Calculating, according to the actual rate and the link delay, a reward parameter R_t+1 and a weight parameter W_t corresponding to the sample data (S_t, A_t, R_t+1, S_t+1); wherein t denotes the sampling index, S_t denotes the state vector obtained at the t-th sampling, S_t+1 denotes the state vector obtained at the (t+1)-th sampling, and A_t denotes the target wireless access point selected at the t-th sampling;

Storing the sample data (S_t, A_t, R_t+1, S_t+1) and the corresponding weight parameter W_t in a preset experience replay pool;

5. The radio roaming control method according to claim 4, characterized in that said calculating, according to the actual rate and the link delay, a reward parameter R_t+1 and a weight parameter W_t corresponding to the sample data (S_t, A_t, R_t+1, S_t+1) specifically includes:

Calculating the reward parameter R_t+1 according to the formula R_t+1=(log₁₀S-δ_delay×D)×(1-δ_handoff×1{A_t≠A_t-1}); wherein S denotes the actual rate, D denotes the link delay, δ_delay denotes the link delay weight, δ_handoff denotes the roaming handoff penalty, and the indicator function 1{A_t≠A_t-1} equals 1 when the target wireless access point A_t selected at the t-th sampling differs from the target wireless access point A_t-1 selected at the (t-1)-th sampling, and equals 0 when the two are the same;

Calculating, according to a formula, the weight parameter W_t corresponding to the sample data (S_t, A_t, R_t+1, S_t+1); wherein S denotes the actual rate, and S_theor denotes the theoretical rate obtained from the specification parameters of the wireless access point to which the client is connected.

6. The radio roaming control method according to claim 4, characterized in that said selecting several sample data as a sample set according to the weight probability distribution in the experience replay pool specifically includes:

Calculating, according to a formula, the weight probability of each sample data in the experience replay pool; wherein E denotes the number of sample data in the experience replay pool, E≥1, P_j denotes the weight probability of the j-th sample data, j=1, 2, ..., E, and W_i denotes the weight parameter corresponding to the i-th sample data, i=1, 2, ..., E;

7. The radio roaming control method according to claim 4, characterized in that the neural network model includes an input layer, a base layer, a valuation layer, a decision layer and an aggregation layer;

Then, said optimizing the neural network model according to the sample set and a back-propagation algorithm specifically includes:

Optimizing the parameters of the base layer, the valuation layer and the decision layer of the neural network model according to the sample set and the back-propagation algorithm.

8. A radio roaming control device, characterized in that the device is applicable to a network composed of several wireless access points; the device includes:

A state vector acquisition module, configured to sample and obtain the state vector of the client at every preset time period; wherein the state vector includes the RSSI value, channel utilization and noise of the client corresponding to each wireless access point;

A result vector acquisition module, configured to obtain a result vector according to the state vector and a preset trained neural network model; wherein the result vector includes the roaming valuations corresponding to the several wireless access points;

A roaming candidate selection module, configured to select a target wireless access point from the several wireless access points according to a preset random number, the current exploration coefficient of the neural network model and the result vector, and to take the target wireless access point as the roaming candidate of the client.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium includes a stored computer program; wherein, when run, the computer program controls the device on which the computer-readable storage medium resides to execute the radio roaming control method according to any one of claims 1 to 7.

10. A terminal device, characterized by including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the radio roaming control method according to any one of claims 1 to 7 when executing the computer program.