CN108347744A

CN108347744A - Device access method, apparatus, and access control device

Info

Publication number: CN108347744A
Application number: CN201810053320.1A
Authority: CN
Inventors: 赵毅峰; 刘凯; 杨华裕; 黄联芬; 廖礼宇; 李馨; 张远见; 胡应添
Original assignee: Xiamen University; Comba Telecom Systems Guangzhou Co Ltd
Current assignee: Xiamen University; Comba Network Systems Co Ltd
Priority date: 2018-01-19
Filing date: 2018-01-19
Publication date: 2018-07-31
Anticipated expiration: 2038-01-19
Also published as: CN108347744B

Abstract

An embodiment of the invention discloses a device access method, an apparatus, and an access control device. The method includes: obtaining the device access state of a base station; if the device access state is determined to be a congestion state, executing an adjustment process for the access restriction parameter P, and sending the P value obtained after each adjustment to each device under the base station, until the device access state of the base station becomes a non-congestion state. In this way, when the base station is in a congested device access state, dynamically adjusting the P value through the adjustment process can effectively reduce the degree of congestion at the base station. Moreover, because the reinforcement learning Q-value matrix contains the experience of previous P value adjustments, determining the action used in each adjustment according to the Q values in that matrix can effectively improve the convergence speed of the P value, so that the base station quickly reaches its best access state.

Description

Device access method, apparatus, and access control device

Technical field

The present invention relates to the field of wireless communication technology, and in particular to a device access method, an apparatus, and an access control device.

Background art

As modern communication technology develops toward the Internet of Things, the number of MTC (machine type communication) devices has grown sharply and now exceeds the number of H2H (Human-to-Human) devices. Communication among these MTC devices underpins the development of smart cities and smart grids, but it also places a heavy load on existing wireless communication networks. The access of MTC devices increases the delay experienced by existing H2H devices and by M2M (Machine-to-Machine) devices with stricter delay requirements. Especially when the access volume is very large, the base station control unit may, in serious cases, be unable to process the mass of data in a timely and effective manner, leaving the base station unable to work normally for a long time.

To solve the random access congestion problem of MTC devices, the 3GPP (3rd Generation Partnership Project) organization proposed the ACB (Access Class Barring) algorithm in TS 22.011 V11.0.0 (section 4.3.4). ACB defines 16 access classes, some of which are reserved for high-priority applications. When the network load is heavy, the base station can broadcast ACB parameters to the devices in the cell as part of the system information; these parameters include the access probability and backoff time of the different access classes. The principle of ACB is that the base station sets an access class barring parameter P (0≤P≤1) according to the current network load. Before each random access, a device generates a random number between 0 and 1. If the random number is less than P, the device performs random access; if it is greater than P, the device generates a new random number at the next opportunity and tries again. Thus, when a large number of devices attempt access at once, the network load is relieved and the access success rate and throughput are improved. However, the existing access class barring algorithm processes data in a relatively complex way during classified feedback and cannot guarantee that the access network is in an optimal state when a large number of devices arrive. In addition, because the network has little prior knowledge when adjusting the parameter P, the adjustment of the P value tends not to be optimal and converges slowly, so the overall access state of the base station is poor and the delay requirements of the devices cannot be met.
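The barring check described above can be sketched as follows. This is a minimal illustration, not the 3GPP procedure in full: the function names `acb_admits` and `count_attempts` are hypothetical, and backoff timing is left out.

```python
import random


def acb_admits(p: float, rng: random.Random) -> bool:
    """Return True if a device passes the barring check for parameter p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # The device draws a uniform number in [0, 1) and may attempt
    # random access only when the draw is below p.
    return rng.random() < p


def count_attempts(num_devices: int, p: float, seed: int = 0) -> int:
    """Count how many of num_devices pass the check in one access opportunity."""
    rng = random.Random(seed)
    return sum(acb_admits(p, rng) for _ in range(num_devices))
```

Lowering P thins out the number of devices that contend in a given opportunity, which is the lever the rest of this document adjusts.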

In summary, a device access method is urgently needed to solve the technical problem that existing random access technology cannot dynamically adjust the P value, which leaves the base station in a poor access state and prone to congestion.

Summary of the invention

The present invention provides a device access method, an apparatus, and an access control device, to solve the technical problem that existing random access technology cannot dynamically adjust the P value, which leaves the base station in a poor access state and prone to congestion.

A device access method provided in an embodiment of the present invention includes:

obtaining the device access state of a base station;

if the device access state is determined to be a congestion state, executing an adjustment process for the access restriction parameter P, and sending the P value obtained after each P value adjustment to each device under the base station, until the device access state of the base station is a non-congestion state;

wherein the action used in each P value adjustment is determined according to the Q values in a reinforcement learning Q-value matrix.

Optionally, obtaining the device access state of the base station includes:

determining the device access state of the base station according to the current number of devices requesting access to the base station and the optimal number of access devices.

Optionally, executing the P value adjustment process and sending the P value obtained after each P value adjustment to each device under the base station includes:

determining the action Y^k(l) used in the k-th P value adjustment according to the device access state S^k of the base station before the k-th P value adjustment and the Q values in the reinforcement learning Q-value matrix, where k is a positive integer;

adjusting the P value using the action Y^k(l), and sending the adjusted P value to each device;

obtaining the device access state S^{k+1} of the base station after the k-th P value adjustment, and, according to the device access state S^k and the device access state S^{k+1}, updating the Q value in the reinforcement learning Q-value matrix that corresponds to using the action Y^k(l) in the device access state S^k.

Optionally, determining the action Y^k(l) used in the k-th P value adjustment according to the device access state S^k of the base station before the k-th P value adjustment and the Q values in the reinforcement learning Q-value matrix includes:

determining the probability of selecting, from the preset optional actions, the action with the maximum Q value for the device access state S^k, where the probability is positively correlated with k;

determining the action Y^k(l) according to the probability and the preset optional actions.
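The selection rule above can be sketched as a greedy choice whose probability grows with k. The text only requires that the probability be positively correlated with k; the schedule `k / (k + k_half)` and the names `greedy_probability` and `select_action` are assumptions made for illustration.

```python
import random
from typing import Sequence


def greedy_probability(k: int, k_half: float = 10.0) -> float:
    """Hypothetical schedule: the chance of the greedy choice rises with k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return k / (k + k_half)


def select_action(q_row: Sequence[float], k: int, rng: random.Random) -> int:
    """Pick an action index for the current state from its row of Q values."""
    # Index of the action with the largest Q value in this state.
    best = max(range(len(q_row)), key=lambda a: q_row[a])
    if rng.random() < greedy_probability(k):
        return best
    # Otherwise explore: choose uniformly among all preset actions.
    return rng.randrange(len(q_row))
```

Early adjustments therefore explore more, while later ones lean on the experience already stored in the Q-value matrix.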

Optionally, updating, according to the device access state S^k and the device access state S^{k+1}, the Q value in the reinforcement learning Q-value matrix that corresponds to using the action Y^k(l) in the device access state S^k includes:

determining the transfer gain corresponding to using the action Y^k(l) in the device access state S^k;

obtaining the updated Q value corresponding to using the action Y^k(l) in the device access state S^k according to the transfer gain, the maximum Q value corresponding to the device access state S^{k+1} in the reinforcement learning Q-value matrix, and the current Q value corresponding to using the action Y^k(l) in the device access state S^k.

Optionally, the updated Q value corresponding to using the action Y^k(l) in the device access state S^k, obtained according to the transfer gain, the maximum Q value corresponding to the device access state S^{k+1} in the reinforcement learning Q-value matrix, and the current Q value corresponding to using the action Y^k(l) in the device access state S^k, satisfies the following formula:

where Q(S^k, Y^k(l)) is the Q value corresponding to using the action Y^k(l) in the device access state S^k; α is the learning factor, with 0<α<1; R_s(S^k, Y^k(l)) is the transfer gain corresponding to using the action Y^k(l) in the device access state S^k; γ is the discount factor, with 0<γ<1; Y(l) is the action, among the preset optional actions, with the maximum Q value for the device access state S^{k+1}; and max Q(S^{k+1}, Y(l)) is the maximum Q value corresponding to the device access state S^{k+1}.
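The formula itself is not reproduced in this text. Given the symbols defined above, the standard Q-learning update would take the following form; this is a reconstruction under that assumption, not the verbatim formula of the publication.

```latex
Q\bigl(S^{k}, Y^{k}(l)\bigr) \leftarrow (1-\alpha)\, Q\bigl(S^{k}, Y^{k}(l)\bigr)
  + \alpha \Bigl[ R_s\bigl(S^{k}, Y^{k}(l)\bigr)
  + \gamma \max_{Y(l)} Q\bigl(S^{k+1}, Y(l)\bigr) \Bigr]
```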

Optionally, after updating the Q value in the reinforcement learning Q-value matrix that corresponds to using the action Y^k(l) in the device access state S^k, the method further includes:

if it is determined that the reinforcement learning Q-value matrix has converged with respect to the device access state S¹, then in any P value adjustment after the k-th P value adjustment, selecting from the preset optional actions the action with the maximum Q value for the device access state before that adjustment, where the device access state S¹ is the device access state of the base station before the 1st P value adjustment.

Optionally, among the first k P value adjustments, the three P value adjustments closest to the k-th P value adjustment for which the device access state of the base station before the adjustment is the same as the device access state S¹ are determined;

if the actions used in those three P value adjustments satisfy a preset convergence condition, the reinforcement learning Q-value matrix is determined to have converged with respect to the device access state S¹.

Optionally, the convergence condition specifically includes:

where Y^n(l) is the action used in the P value adjustment, among the three, that is closest in time to the k-th P value adjustment; Y^{n-1}(l) is the action used in the second closest P value adjustment; Y^{n-2}(l) is the action used in the third closest P value adjustment; and ε is a preset comparison threshold, with ε>0.
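The convergence condition itself is not reproduced in this text. As a minimal sketch, assume that each action is represented by a number (for instance the increment it applies to P) and that convergence means consecutive actions among the three differ by less than ε. Both points, and the name `actions_converged`, are assumptions rather than the condition stated in the publication.

```python
def actions_converged(y_n: float, y_n1: float, y_n2: float, eps: float) -> bool:
    """Assumed check: the three most recent actions taken from state S1 agree within eps."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    # Compare the most recent action with the second most recent,
    # and the second most recent with the third most recent.
    return abs(y_n - y_n1) < eps and abs(y_n1 - y_n2) < eps
```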

Based on the same inventive concept, the present invention also provides a device access apparatus, including:

an acquisition module, configured to obtain the device access state of a base station;

a processing module, configured to: if the device access state is determined to be a congestion state, execute an adjustment process for the access restriction parameter P, and send the P value obtained after each P value adjustment to each device under the base station through a transceiver module, until the device access state of the base station is a non-congestion state; wherein the action used in each P value adjustment is determined according to the Q values in a reinforcement learning Q-value matrix.

Optionally, the acquisition module is specifically configured to:

determine the device access state of the base station according to the current number of devices requesting access to the base station and the optimal number of access devices.

Optionally, the processing module is specifically configured to:

adjust the P value using the action Y^k(l), and send the adjusted P value to each device through the transceiver module;

obtain, through the acquisition module, the device access state S^{k+1} of the base station after the k-th P value adjustment, and, according to the device access state S^k and the device access state S^{k+1}, update the Q value in the reinforcement learning Q-value matrix that corresponds to using the action Y^k(l) in the device access state S^k.

Optionally, the processing module is further specifically configured to:

Optionally, the processing module is further specifically configured to obtain, by the following formula, the updated Q value corresponding to using the action Y^k(l) in the device access state S^k:

Optionally, the processing module is further configured to:

Optionally, the processing module is further specifically configured to:

determine, among the first k P value adjustments, the three P value adjustments closest to the k-th P value adjustment for which the device access state of the base station before the adjustment is the same as the device access state S¹;

Optionally, the convergence condition specifically includes:

Another embodiment of the present invention provides an access control device comprising a memory and a processor, wherein the memory is configured to store program instructions, and the processor is configured to call the program instructions stored in the memory and execute any of the above methods according to the obtained program.

Another embodiment of the present invention provides a computer storage medium storing computer-executable instructions, the computer-executable instructions being used to cause a computer to execute any of the above methods.

The device access method provided in an embodiment of the present invention includes obtaining the device access state of a base station; if the device access state is determined to be a congestion state, executing an adjustment process for the access restriction parameter P, and sending the P value obtained after each P value adjustment to each device under the base station, until the device access state of the base station becomes a non-congestion state. In this way, when the base station is in a congested access state, dynamically adjusting the P value through the adjustment process can effectively reduce the degree of congestion at the base station. Moreover, because the reinforcement learning Q-value matrix contains the experience of previous P value adjustments, determining the action used in each adjustment according to the Q values in that matrix can effectively improve the convergence speed of the P value, so that the base station quickly reaches its best device access state.

Description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is an architecture diagram of a system to which the device access method provided in an embodiment of the present invention is applicable;

Fig. 2 is a schematic flowchart of a device access method provided in an embodiment of the present invention;

Fig. 3 is a schematic flowchart of the method for determining the access state of the base station provided in an embodiment of the present invention;

Fig. 4 is a schematic diagram of the resource parameters of the base station provided in an embodiment of the present invention;

Fig. 5 is a graph of the relationship between the number of contending devices and the access success probability in an embodiment of the present invention;

Fig. 6 is a graph of the relationship between the number of contending devices and the number of successfully accessed devices in an embodiment of the present invention;

Fig. 7 is a schematic flowchart of the P value adjustment process provided in an embodiment of the present invention;

Fig. 8 is a schematic diagram of a state transfer gain matrix provided in an embodiment of the present invention;

Fig. 9 is a schematic structural diagram of a device access apparatus provided in an embodiment of the present invention;

Fig. 10 is a schematic structural diagram of an access control device provided in an embodiment of the present invention.

Detailed description

To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

The embodiments of the present invention are described in further detail below with reference to the drawings.

The device access method provided in the embodiments of the present invention is applied to the random access of machine type communication devices. Fig. 1 is an architecture diagram of a system to which the device access method provided in an embodiment of the present invention is applicable. As shown in Fig. 1, the system includes a device access apparatus 101, a base station 102, and multiple types of machine type communication devices (103 to 110 in the figure).

The base station may be a base station in any of various communication systems or Internet of Things systems, such as 2G, 3G, 4G, LTE-M (Long Term Evolution Machine to Machine, an Internet of Things technology based on Long Term Evolution), or NB-IoT (Narrow Band Internet of Things); the present invention does not specifically limit this.

The device access apparatus may be a smart device set up independently of the base station. In that case the apparatus establishes a communication connection with a wireless communication unit of the base station (such as an RRU (Radio Remote Unit)) and obtains the resource and access situation of the base station through that connection. Alternatively, the device access apparatus may be embedded in the RRU inside the base station and obtain the resource and access situation of the base station directly; the present invention does not specifically limit this.

The machine type communication device may be a communication device such as a smartphone or a tablet computer, or any of various Internet of Things devices such as a smart water meter, an electricity meter, or a parking management module; the present invention does not specifically limit this. Each of these machine type communication devices is provided with a wireless communication module and communicates with the base station over a wireless network.

Fig. 2 shows a schematic flowchart of a device access method provided in an embodiment of the present invention. The method is executed by the device access apparatus in the above system architecture. As shown in Fig. 2, the method includes:

Step S201: Obtain the device access state of the base station;

Step S202: If the device access state is determined to be a congestion state, execute an adjustment process for the access restriction parameter P, and send the P value obtained after each P value adjustment to each device under the base station, until the access state of the base station becomes a non-congestion state; wherein the action used in each P value adjustment is determined according to the Q values in a reinforcement learning Q-value matrix.

In this way, when the base station is in a congested device access state, dynamically adjusting the P value through the adjustment process can effectively reduce the degree of congestion at the base station. Moreover, because the reinforcement learning Q-value matrix contains the experience of previous P value adjustments, determining the action used in each adjustment according to the Q values in that matrix can effectively improve the convergence speed of the P value, so that the base station quickly reaches its best access state with higher overall efficiency.

In step S201, the device access apparatus can obtain the current access situation and access capability of the base station through the communication connection with the wireless communication unit of the base station, and thereby determine the device access state.

Specifically, the device access state of the base station is determined according to the current number of devices requesting access and the optimal number of access devices, where the number of devices requesting access is the actual number of contending devices in the current system, and the optimal number of access devices is the optimal number of contending devices in the current system. As shown in Fig. 3, determining the device access state may include the following steps S301 to S303:

Step S301: Obtain the current number of devices requesting access and the optimal number of access devices;

The optimal number of access devices, that is, the optimal number of contending devices, is the number of contending devices at which the base station actually admits the maximum number of devices; it characterizes the current access capability of the base station. It is determined by the device access apparatus according to the current resource situation of the base station. Fig. 4 is a schematic diagram of the resource parameters of the base station provided in an embodiment of the present invention. As shown in Fig. 4, the resource situation of the base station includes any one or more of the following resource parameters:

the number of available preambles, the maximum number of preamble retransmissions, the access request arrival rate of each device, the random access slot allocation period, the backoff parameter, and the random access response window length.

In the embodiment of the present invention, the device access apparatus can determine the current actual number of contending devices and the current resource situation of the base station through the communication connection with the base station. It then obtains the current optimal number of contending devices by running a simulation based on the current resource situation of the base station.

Fig. 5 shows an example of the relationship between the actual number of contending devices and the access success probability in an embodiment of the present invention. As shown in Fig. 5, when the number of devices requesting access is less than the maximum number of contending devices, the access success probability stays at 1. As the actual number of contending devices increases and reaches or exceeds the maximum number of contending devices, the access success probability drops rapidly. The rate of change of the access success probability is greatest at the turning point where the actual number of contending devices equals the maximum number of contending devices; as the actual number increases further, the rate of change gradually decreases and finally approaches 0.

Fig. 6 shows an example of the relationship between the actual number of contending devices and the number of successfully accessed devices in an embodiment of the present invention. As shown in Fig. 6, when the actual number of contending devices is less than the maximum number of contending devices, the number of successfully accessed devices increases linearly with the actual number of contending devices. Once the actual number of contending devices reaches or exceeds the maximum number of contending devices, the number of successfully accessed devices decreases as the actual number of contending devices increases, and the decrease is steepest near the turning point at the maximum number of contending devices.

Taking Fig. 5 and Fig. 6 together, the optimal number of contending devices in the embodiment of the present invention is greater than the maximum number of contending devices: it is the actual number of contending devices at which the base station attains a set access success rate, where the set access success rate is slightly less than 1. That is, the optimal number of contending devices is numerically very close to the maximum number of contending devices; it is a specific value near the turning point, on the descending part of the curves in Fig. 5 and Fig. 6 where the number of contending devices exceeds the maximum number of contending devices.

In the embodiment of the present invention, the set access success rate can be chosen by those skilled in the art, and the present invention does not specifically limit it. Optionally, the set access success rate may be 98.9% or another nearby value.
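One way to read the selection described above is as a lookup on a simulated success-rate curve: take the largest number of contending devices whose success rate still meets the set rate. The sketch below assumes such a curve is already available as a list of (device count, success rate) pairs; the function name `optimal_contention_count` and the sample curve are hypothetical and are not taken from the figures.

```python
from typing import List, Tuple


def optimal_contention_count(curve: List[Tuple[int, float]], target_rate: float) -> int:
    """Return the largest contending-device count whose success rate meets target_rate."""
    best = None
    for devices, rate in sorted(curve):
        if rate >= target_rate:
            best = devices
    if best is None:
        raise ValueError("no point on the curve meets the target success rate")
    return best


# Hypothetical simulated curve, for illustration only.
sample_curve = [(100, 1.0), (200, 1.0), (300, 1.0), (310, 0.995),
                (320, 0.989), (330, 0.95), (400, 0.6)]
n0 = optimal_contention_count(sample_curve, 0.989)
```

With this sample curve the lookup lands just past the turning point, on the descending part of the curve, as the text describes.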

Step S302: Calculate the congestion degree value according to the actual number of contending devices and the optimal number of contending devices.

In the embodiment of the present invention, the congestion degree value can be calculated by the following formula one:

where P is the congestion degree value (a quantity distinct from the access restriction parameter P, which lies in [0, 1]), N is the actual number of contending devices, and N₀ is the optimal number of contending devices.

According to formula one, the congestion degree value can be regarded as a weighted measure of how far the actual number of contending devices deviates from the optimal number of contending devices. When the actual number of contending devices is less than the optimal number, the base station is not congested, the access success probability is 1, and every device requesting access can successfully access the base station; the congestion degree value calculated by formula one is therefore negative. When the actual number of contending devices is greater than the optimal number, the base station is congested to some degree and the access success probability is less than or equal to 1. Because the number of devices requesting access exceeds the access capability of the base station, a certain number of devices cannot access successfully, and the congestion degree value calculated by formula one is positive.
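Formula one itself is not reproduced in this text. A minimal sketch, assuming it is the relative deviation (N − N₀)/N₀, which matches the sign behaviour described above; the actual formula may weight the deviation differently, and the name `congestion_degree` is hypothetical.

```python
def congestion_degree(actual: int, optimal: int) -> float:
    """Assumed form of formula one: relative deviation of N from N0."""
    if optimal <= 0:
        raise ValueError("optimal contending-device count must be positive")
    # Negative when fewer devices contend than the optimum, positive when more.
    return (actual - optimal) / optimal
```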

Step S303: Determine the congestion degree value interval to which the congestion degree value belongs, and determine the device access state of the base station according to that interval and the correspondence between congestion degree value intervals and device access states.

In the embodiment of the present invention, all possible device access states of the base station are preset, and each device access state corresponds to one congestion degree value interval. Because the congestion degree values within one interval differ little from one another, they can be regarded as belonging to the same congestion level, and one device access state of the base station corresponds to one congestion level. Therefore, the current congestion level of the base station can be judged from the congestion degree value calculated by formula one, and the device access state of the base station determined accordingly.

For example, suppose the possible congestion levels of the base station are [0, L], with l ∈ [0, L], where L denotes the highest congestion level of the base station, l is one of the congestion levels, L is a positive integer greater than or equal to 1, and l is an integer. Each congestion level corresponds to one device access state of the base station. A congestion level of 0 indicates that the base station is not congested, so the corresponding device access state is a non-congestion state; a congestion level greater than 0 indicates that the base station is congested, so the corresponding device access state is a congestion state.

Formula one also shows that the calculated congestion degree value is continuous, whereas each congestion level is an integer. Therefore, the calculated congestion degree value can be quantized using the preset correspondence between congestion degree values and device access states (that is, congestion levels) of the base station, and the current device access state of the base station determined from the result.

In this way, the base station has L+1 congestion levels in total, that is, L+1 possible device access states. In the embodiment of the present invention, those skilled in the art can set the number of device access states (that is, the number of congestion levels) according to actual needs; the present invention does not specifically limit this.

In addition, in the embodiment of the present invention, when the congestion level is 0 the corresponding congestion degree value interval is (-∞, 0]. The ranges of the intervals for the other congestion levels can be set by those skilled in the art, and the present invention does not specifically limit them. For example, the intervals for the nonzero congestion levels may be of equal size, or of unequal size that shrinks as the congestion level increases; that is, the higher the congestion level, the smaller the range of the corresponding congestion degree value interval.
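The quantization described above can be sketched with a sorted list of interval edges. The edge values, the choice of closed upper bounds, the open-ended top level, and the name `congestion_level` are illustrative assumptions; only the interval (-∞, 0] for level 0 comes from the text.

```python
from bisect import bisect_left
from typing import Sequence


def congestion_level(value: float, edges: Sequence[float]) -> int:
    """Map a congestion degree value to a congestion level in [0, L].

    edges holds the sorted positive upper bounds b1 < b2 < ... of levels
    1 .. L-1, so level 1 covers (0, b1], level 2 covers (b1, b2], and the
    highest level L = len(edges) + 1 covers everything above the last edge.
    """
    if value <= 0:
        return 0  # level 0: interval (-inf, 0], the non-congestion state
    # Count the edges strictly below value, then shift by one for level 0.
    return 1 + bisect_left(edges, value)


# Hypothetical edges whose intervals shrink as the level rises: 0.5, 0.3, 0.2.
EDGES = [0.5, 0.8, 1.0]
```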

In the embodiment of the present invention, the device access apparatus adjusts the P value only when the base station is in a congested device access state. Therefore, in step S202, if the device access state is determined to be a congestion state, the adjustment process for the access restriction parameter P is triggered; if the device access state is determined to be a non-congestion state, the current P value is not adjusted. Here, a congestion state is a device access state whose congestion level belongs to [1, L], and a non-congestion state is the device access state whose congestion level is 0.

Because the number of devices requesting access to the base station may differ from moment to moment, the access state of the base station may change in real time. Therefore, the device access apparatus in the embodiment of the present invention may periodically obtain the access state of the base station and trigger the P value adjustment process when it determines that the access state is congested.

In the embodiment of the present invention, the device access apparatus may obtain the current P value of the base station only after determining that the current device access state is a congestion state, or it may obtain the P value through the communication connection with the base station at the same time as it determines the current device access state; the present invention does not specifically limit this.

Specifically, the P value adjustment process includes multiple P value adjustments, each of which can be regarded as one cycle of the whole process. In each P value adjustment, the P value obtained is sent to each device accessing the base station, until the device access state of the base station becomes a non-congestion state.

Fig. 7 is an exemplary flow diagram of the P value adjustment process in the embodiments of the present invention. As shown in Fig. 7, the P value adjustment process includes the following steps S701 to S704:

Step S701: Determine the action Y^k(l) used in the k-th P value adjustment according to the device access state S^k of the base station before the k-th P value adjustment and the Q values in the reinforcement learning Q-value matrix, where k is a positive integer;

Step S702: Adjust the P value using the action Y^k(l), and send the adjusted P value to each device;

Step S703: Obtain the device access state S^k+1 of the base station after the k-th P value adjustment and, according to the device access states S^k and S^k+1, update the Q value in the reinforcement learning Q-value matrix corresponding to the action Y^k(l) used in the device access state S^k;

Step S704: If the device access state S^k+1 is determined to be the non-congested state, end the P value adjustment process; otherwise, perform the (k+1)-th P value adjustment.

The embodiments of the present invention adjust the P value using the idea of Q-learning, and the P value adjustment process is an iterative process comprising multiple cycles. Each cycle is one P value adjustment, in which the action used to change the P value is selected according to the experience learned in the preceding P value adjustments.

A reinforcement learning Q-value matrix and a set of actions for adjusting the P value are preset in the device access apparatus. The reinforcement learning Q-value matrix records the experience learned in previous P value adjustment cycles. Each row of the matrix identifies one device access state of the base station, and each column identifies one action in the set of actions for adjusting the P value; the Q value in row i, column j is the Q value of selecting the j-th action Y_j of the action set to adjust the P value in state S_i. A higher Q value indicates that selecting action Y_j in state S_i is more likely to reach the final goal (the non-congested state), that is, the benefit to the whole system after selecting action Y_j is higher. The reinforcement learning Q-value matrix is initialized to 0 when the P value adjustment process starts (i.e., before the first P value adjustment), and the Q values in it are adjusted accordingly after each P value adjustment.

The set of actions for adjusting the P value may take the form {Y(-H), ..., Y(-1), Y(0), Y(1), ..., Y(H)}, where each action corresponds to a P value adjustment amount that may be positive, negative, or 0. Selecting an action that increases the P value allows more devices to access after the adjustment; conversely, selecting an action that decreases the P value reduces the number of accessing devices.

In the embodiments of the present invention, those skilled in the art may set the P value adjustment amount corresponding to each action in the action set, and the present invention does not specifically limit this. For example, the unit of P value adjustment may be defined as 0.01, so that action Y(1) increases the P value by 0.01, action Y(-1) decreases the P value by 0.01, and so on.
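As a minimal sketch of the action set just described, the code below builds {Y(-H), ..., Y(H)} with a unit of 0.01 and applies one action to a P value. The helper names `build_actions` and `apply_action` are hypothetical, and clamping the result to [0, 1] is an assumption that the text does not state.

```python
def build_actions(h: int, unit: float = 0.01) -> dict[int, float]:
    """Return {l: adjustment amount of Y(l)} for l in -h..h."""
    return {l: l * unit for l in range(-h, h + 1)}


def apply_action(p: float, delta: float) -> float:
    """Add the adjustment amount to P and keep the result in [0, 1] (assumed)."""
    return min(1.0, max(0.0, p + delta))


actions = build_actions(2)                      # Y(-2) .. Y(2)
p_after = apply_action(0.50, actions[-1])       # Y(-1) lowers P by one unit
```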

For clarity and ease of description, the entire P value adjustment process is described below using an arbitrary P value adjustment (the k-th) as an example.

In each P value adjustment, the device access apparatus adjusts the P value once and delivers the adjusted P value to each device under the base station. After receiving the adjusted P value, each device decides whether to apply for access, so the device access state of the base station changes accordingly after each cycle.

In step S701, in the k-th P value adjustment, the device access apparatus determines the action Y^k(l) used in the k-th P value adjustment according to the device access state S^k of the base station before the k-th P value adjustment and the Q values in the reinforcement learning Q-value matrix. When k=1, the device access state S¹ is the device access state obtained in step S201; it is both the state that triggered the P value adjustment process and the state of the base station before the first P value adjustment.

Specifically: if the reinforcement learning Q-value matrix has not converged with respect to the device access state S¹, the probability of selecting, from the preset optional actions, the action with the largest Q value for the device access state S^k is determined according to k. The action Y^k(l) is then determined according to this probability and the preset optional actions.

Here, k is the ordinal number of the current P value adjustment within the entire P value adjustment process.

In the embodiments of the present invention, the optional actions are the same in all possible device access states of the base station. The optional actions of the device access state S^k are therefore the actions in the set of actions with which the base station adjusts the P value. Of course, those skilled in the art may also set the optional actions for each possible device access state individually, so that the optional actions differ with the congestion level; the present invention does not specifically limit this.

In the embodiments of the present invention, because the reinforcement learning Q-value matrix is initialized to 0 when the P value adjustment process starts, a freshly initialized matrix gives no guidance on which action to select in the 1st P value adjustment. In the 1st P value adjustment, the device access apparatus therefore randomly selects an action from the action set as the starting point of the P value adjustment process.

Then, to gain more experience in adjusting the P value, the device access apparatus tries as many of the actions in the action set as possible. In later P value adjustments, it therefore either selects an action at random with a certain probability, or selects with a certain probability the action with the largest Q value in the device access state S^k before the adjustment. As k increases, the reinforcement learning Q-value matrix holds more and more P value adjustment experience, so the probability of selecting the action with the largest Q value is gradually increased and the probability of random selection is reduced correspondingly.

It can be seen that, when selecting the action Y^k(l), the probability of taking the action with the largest Q value among the optional actions of the device access state S^k as Y^k(l) is positively correlated with the index k of the k-th P value adjustment. After determining the probability of selecting the action with the largest Q value, the device access apparatus finally determines the action Y^k(l) in combination with the optional actions of the device access state S^k.
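The selection rule described above can be sketched as follows. The text only requires that the probability of choosing the largest-Q action grows with k; the specific form k / (k + k_scale) and the names `greedy_probability` and `select_action` are assumptions made for illustration.

```python
import random


def greedy_probability(k: int, k_scale: float = 20.0) -> float:
    """Probability of picking the largest-Q action; increases with k (assumed form)."""
    return k / (k + k_scale)


def select_action(q_row, k, rng):
    """Return the index of the action used in the k-th P value adjustment.

    q_row: Q values of the optional actions in the current state S^k.
    rng:   a random.Random-like object.
    """
    # The 1st adjustment is always random, since the Q matrix is still all zeros.
    if k > 1 and rng.random() < greedy_probability(k):
        return max(range(len(q_row)), key=q_row.__getitem__)
    return rng.randrange(len(q_row))


choice = select_action([0.1, 0.9, 0.3], k=5, rng=random.Random(0))
```

Once the matrix is deemed converged, the random branch would simply be skipped and the largest-Q action taken every time.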

If the reinforcement learning Q-value matrix has converged with respect to the device access state S¹, the optimal policy for adjusting the device access state S¹ to the non-congested state has been learned in the preceding P value adjustments. In every subsequent P value adjustment, the action selected from the preset optional actions is then the one with the largest Q value for the device access state before that adjustment.

In step S702, the device access apparatus adjusts the P value based on the action Y^k(l) selected in the k-th P value adjustment. The adjusted P value equals the current P value of the base station plus the P value adjustment amount of the selected action Y^k(l). The adjusted P value is then sent to each device under the base station.

After a device receives the adjusted P value from the base station, it may use a prior-art method to decide whether to apply to access the base station: the device randomly generates a P value of its own and compares it with the P value issued by the base station. If the self-generated P value is less than or equal to the issued P value, the device sends an access request to the base station; otherwise it does not send an access request, waits for a period of time, and then decides again whether to apply.
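The device-side decision above amounts to a single comparison. The sketch below assumes the device draws its own value uniformly from [0, 1); the function name `should_request_access` is hypothetical.

```python
import random


def should_request_access(p_issued: float, rng: random.Random) -> bool:
    """Return True if the device should send an access request now.

    The device draws its own value (assumed uniform on [0, 1)) and applies
    only if that value is less than or equal to the P value issued by the
    base station.
    """
    return rng.random() <= p_issued


# With P = 0.3, roughly 30% of the devices apply in a given round.
rng = random.Random(42)
applied = sum(should_request_access(0.3, rng) for _ in range(10000))
```

This makes the effect of the P value concrete: lowering P lowers the expected share of devices that apply in each round.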

In step S703, the device access apparatus may obtain the device access state S^k+1 of the base station using the method for obtaining the base station's device access state described in step S201 above. The device access state S^k+1 is the new state the base station reaches after the P value is adjusted with action Y^k(l) in the k-th P value adjustment. If S^k+1 is still a congested state, the P value adjustment process continues, and S^k+1 is then also the device access state of the base station before the (k+1)-th P value adjustment.

After obtaining the device access state S^k+1 of the base station, the device access apparatus updates, according to the device access states S^k and S^k+1, the Q value in the reinforcement learning Q-value matrix corresponding to using action Y^k(l) in the device access state S^k. Specifically:

First, determine the transfer gain corresponding to selecting action Y^k(l) in the device access state S^k. In the embodiments of the present invention, the device access apparatus may determine the transfer gain according to a preset transfer gain function, or according to the transfer gain matrix shown in Fig. 8; the present invention does not specifically limit this.

Taking the transfer gain matrix in Fig. 8 as an example, each row of the matrix identifies the device access state S^k of the base station before the k-th P value adjustment, each column identifies the device access state S^k+1 the base station reaches after the P value is adjusted with action Y^k(l), and the value in row i, column j is the transfer gain for the base station's device access state moving from S_i to S_j.

As can be seen from Fig. 8, when the device access state of the base station moves from a lower congestion level to a higher one, the transfer gain is negative, and the higher the congestion level of the state reached, the smaller the gain. When the device access state moves from a higher congestion level to a lower one, the transfer gain is positive, and the lower the congestion level of the state reached, the larger the gain. If the device access state of the base station is unchanged after the P value adjustment, the transfer gain is zero.
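The numerical values of Fig. 8 are not reproduced in this text. One matrix that satisfies all three properties stated above is gain[i][j] = i - j, used in the sketch below purely as an assumed example; the function name `build_gain_matrix` is hypothetical.

```python
def build_gain_matrix(num_levels: int) -> list[list[float]]:
    """Build an (L+1) x (L+1) transfer gain matrix with gain[i][j] = i - j.

    Row i is the congestion level before the adjustment, column j the level
    after it. This choice is an assumption that merely matches the sign and
    ordering properties described for Fig. 8.
    """
    n = num_levels + 1
    return [[float(i - j) for j in range(n)] for i in range(n)]


gain = build_gain_matrix(3)
```

A preset transfer gain function, the other option mentioned in the text, would replace the table lookup with a call taking the two levels as arguments.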

Then, according to the transfer gain brought by selecting action Y^k(l), the maximum Q value corresponding to the device access state S^k+1 in the reinforcement learning Q-value matrix, and the Q value corresponding to using action Y^k(l) in the device access state S^k, the updated Q value of action Y^k(l) in the device access state S^k is calculated by the following formula:

Here, S^k is the device access state of the base station before the k-th P value adjustment, and Y^k(l) is the action selected in the k-th P value adjustment. Q(S^k, Y^k(l)) is the Q value corresponding to using action Y^k(l) in the device access state S^k: on the right-hand side of the equals sign it is the value before the matrix update, and on the left-hand side it is the value after the update. α is the learning factor, a real number in (0, 1). The larger α is, the less of the previous training is retained and the more weight is given to the return from selecting action Y^k(l) in the current k-th P value adjustment; that is, the transfer gain from selecting Y^k(l) and the Q value of the resulting device access state S^k+1 make up a larger share of the updated Q value. Conversely, the smaller α is, the more weight is given to the experience learned in previous P value adjustments, that is, the pre-update Q value makes up a larger share of the updated Q value. R_s(S^k, Y^k(l)) is the transfer gain corresponding to using action Y^k(l) in the device access state S^k. γ is the discount factor, with 0<γ<1; the larger γ is, the more weight is given to experience. S^k+1 is the new device access state the base station reaches after the k-th P value adjustment, Y(l) is the action with the largest Q value among the optional actions of the device access state S^k+1 in the reinforcement learning Q-value matrix, and max Q(S^k+1, Y(l)) is the maximum Q value corresponding to the device access state S^k+1, i.e., the Q value of action Y(l) in the device access state S^k+1.
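The symbol definitions above are consistent with the standard Q-learning update. As an assumption based on those definitions, and not a quotation of the original formula, the update can be written as:

```latex
Q\bigl(S^{k}, Y^{k}(l)\bigr) = (1-\alpha)\, Q\bigl(S^{k}, Y^{k}(l)\bigr)
  + \alpha \Bigl[ R_s\bigl(S^{k}, Y^{k}(l)\bigr)
  + \gamma \max_{Y(l)} Q\bigl(S^{k+1}, Y(l)\bigr) \Bigr],
\qquad 0 < \alpha < 1,\; 0 < \gamma < 1
```

In this form, a larger α shifts weight from the pre-update Q value (factor 1-α) to the transfer gain and the discounted maximum Q value of S^{k+1} (factor α), which matches the description of α given above.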

In the embodiments of the present invention, those skilled in the art may set the values of the learning factor and the discount factor in the Q value formula reasonably according to actual needs, and the present invention does not specifically limit this.

As can be seen from Fig. 8 in combination with formula two above, since the device access state S^k is a congested state, if the selected action Y^k(l) increases the P value, the base station allows more devices to apply for access. The device access state of the base station then deviates further from the non-congested state, that is, the congestion level of the new device access state S^k+1 after the adjustment rises, and the resulting transfer gain is negative, which amounts to a penalty. Correspondingly, if the selected action Y^k(l) decreases the P value, the base station reduces the number of devices allowed to apply for access. The deviation from the non-congested state is then reduced, that is, the congestion level of the new device access state S^k+1 falls, and the resulting transfer gain is positive, which amounts to a reward.

In step S704, if the new access state S^k+1 that the base station reaches after the P value is adjusted with the selected action Y^k(l) in the k-th P value adjustment is still a congested state, the (k+1)-th P value adjustment is performed. Otherwise, the device access state of the base station has been adjusted from the state S¹ at the start of the P value adjustment process to the non-congested state; the cycle is exited and the P value adjustment process ends.
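Steps S701 to S704 can be put together in one loop, sketched below under several assumptions: `observe_state` and `broadcast` are hypothetical hooks standing in for the acquisition and transceiver functions, the exploration schedule k / (k + 20) is an assumed form, the P value is clamped to [0, 1], the Q update is the standard Q-learning form, and the convergence check is omitted for brevity.

```python
import random


def run_p_adjustment(p, observe_state, broadcast, deltas, gain,
                     alpha=0.5, gamma=0.9, rng=None, max_rounds=1000):
    """Run one P value adjustment process and return (p, q, rounds).

    observe_state(p) -> congestion level reached with this P (hypothetical hook)
    broadcast(p)     -> sends P to the devices (hypothetical hook)
    deltas[a]        -> P value adjustment amount of action index a
    gain[i][j]       -> transfer gain for moving from level i to level j
    """
    rng = rng or random.Random()
    q = [[0.0] * len(deltas) for _ in range(len(gain))]  # Q matrix starts at 0
    state = observe_state(p)
    k = 1
    while state != 0 and k <= max_rounds:
        # S701: random on the 1st round, then greedy with growing probability.
        if k > 1 and rng.random() < k / (k + 20.0):
            action = max(range(len(deltas)), key=q[state].__getitem__)
        else:
            action = rng.randrange(len(deltas))
        # S702: adjust P (clamped to [0, 1], an assumption) and send it out.
        p = min(1.0, max(0.0, p + deltas[action]))
        broadcast(p)
        # S703: observe S^(k+1) and update Q(S^k, Y^k(l)).
        next_state = observe_state(p)
        target = gain[state][next_state] + gamma * max(q[next_state])
        q[state][action] = (1 - alpha) * q[state][action] + alpha * target
        # S704: the loop condition ends the process once level 0 is reached.
        state = next_state
        k += 1
    return p, q, k - 1
```

The `max_rounds` cap is only a safety limit for the sketch and is not part of the described method.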

It should be noted that, after the reinforcement learning Q-value matrix is updated in step S703 and before step S704 starts the next cycle, the method further includes a step of judging whether the updated reinforcement learning Q-value matrix has converged with respect to the device access state S¹.

As described above, if the reinforcement learning Q-value matrix has converged with respect to the device access state S¹, the optimal policy for adjusting the device access state S¹ to the non-congested state has been found and the Q values in the matrix have become nearly stable. Therefore, in any P value adjustment after the k-th, the action with the largest Q value among the optional actions of the base station's device access state before that adjustment is selected as the action for adjusting the P value.

Specifically, the device access apparatus judges whether the updated reinforcement learning Q-value matrix has converged with respect to the device access state S¹ as follows:

First, among the first k P value adjustments, determine the three P value adjustments that are closest in time to the k-th P value adjustment and for which the device access state of the base station before the adjustment is the same as the device access state S¹;

Then, judge whether the actions used in these three P value adjustments satisfy the preset convergence condition below. If they do, the selected action changes very little each time a new device access state is reached (the P value adjustment amounts of the actions selected in these cycles are close to one another); the Q values in the reinforcement learning Q-value matrix can then be regarded as essentially unchanged, and the updated matrix is considered to have converged with respect to the device access state S¹.

The preset convergence condition satisfies the following formula:

It can be seen that judging whether the reinforcement learning Q-value matrix has converged with respect to the device access state S¹ requires at least 3 P value adjustment cycles whose device access state before the adjustment is S¹. Thus, during the first 3 cycles of the P value adjustment process, the reinforcement learning Q-value matrix cannot have converged with respect to the device access state S¹.
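The convergence formula itself is not reproduced in this text. The sketch below assumes one plausible reading of the description: the adjustment amounts of the three most recent actions taken from state S¹ must differ from one another by less than a threshold ε. Both this condition and the name `has_converged` are assumptions, not the original formula.

```python
def has_converged(amounts_from_s1: list[float], eps: float) -> bool:
    """Check the assumed convergence condition.

    amounts_from_s1: P value adjustment amounts of the actions used in the
    adjustments whose starting state was S^1, oldest first.
    eps: preset comparison threshold, eps > 0.
    """
    if len(amounts_from_s1) < 3:
        return False  # fewer than three adjustments from S^1: cannot judge yet
    y_n2, y_n1, y_n = amounts_from_s1[-3:]
    return abs(y_n - y_n1) < eps and abs(y_n1 - y_n2) < eps
```

The early return reflects the point made above: with fewer than three adjustments starting from S¹, the matrix cannot yet be judged to have converged.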

Moreover, in each P value adjustment before the reinforcement learning Q-value matrix is determined to have converged with respect to the device access state S¹, whether the convergence condition is met is judged after the action is selected or the matrix is updated. Therefore, if the matrix is first determined to meet the convergence condition in the k-th P value adjustment, the device access state before the k-th P value adjustment must be the device access state S¹. That is, the three P value adjustments that, among the first k P value adjustments, start from the device access state S¹ and are closest in time to the k-th P value adjustment may include the k-th P value adjustment itself.

Of course, the reinforcement learning Q-value matrix may still not have converged after many P value adjustments. The device access apparatus is therefore also provided with a maximum convergence count K: if the convergence condition is still not met after the k-th P value adjustment but k is greater than or equal to K, the reinforcement learning Q-value matrix is deemed to have converged with respect to the device access state S¹ and the matrix training process ends. In each subsequent P value adjustment, the action with the largest Q value in the reinforcement learning Q-value matrix is selected directly to adjust the P value, and actions are no longer selected at random with a certain probability.

Based on the same inventive concept, the embodiments of the present invention also provide a device access apparatus. Fig. 9 is a structural schematic diagram of a device access apparatus provided in the embodiments of the present invention. As shown in Fig. 9, the device access apparatus 900 includes:

An acquisition module 901, configured to obtain the device access state of the base station;

A processing module 902, configured to, if the device access state is determined to be a congested state, execute the access restriction parameter P value adjustment process and send the P value obtained after each P value adjustment to each device under the base station through a transceiver module 903, until the device access state of the base station is the non-congested state; wherein the action used in each P value adjustment is determined according to the Q values in the reinforcement learning Q-value matrix.

Optionally, the acquisition module 901 is specifically configured to:

Optionally, the processing module 902 is specifically configured to:

Adjust the P value using the action Y^k(l), and send the adjusted P value to each device through the transceiver module 903;

Obtain, through the acquisition module 901, the device access state S^k+1 of the base station after the k-th P value adjustment and, according to the device access states S^k and S^k+1, update the Q value in the reinforcement learning Q-value matrix corresponding to the action Y^k(l) used in the device access state S^k.

Optionally, the processing module 902 is further specifically configured to:

Optionally, the processing module 902 is further specifically configured to obtain, by the following formula, the updated Q value corresponding to using the action Y^k(l) in the device access state S^k:

Optionally, the processing module 902 is further configured to:

Optionally, the convergence condition specifically includes:

Based on the same inventive concept, the embodiments of the present invention also provide an access control device, which may specifically be a desktop computer, portable computer, smartphone, tablet computer, personal digital assistant (PDA), or the like. As shown in Fig. 10, the access control device may include a central processing unit (CPU) 1001, a memory 1002, an input/output device 1003, a bus system 1004, and so on. The input device may include a keyboard, mouse, touch screen, etc., and the output device may include a display device such as a liquid crystal display (LCD) or a cathode ray tube (CRT).

The memory may include read-only memory (ROM) and random access memory (RAM), and provides the processor with the program instructions and data stored in it. In the embodiments of the present invention, the memory may be used to store the program of the above device access method.

By calling the program instructions stored in the memory, the processor executes the above device access method according to the obtained program instructions.

Based on the same inventive concept, the embodiments of the present invention provide a computer storage medium for storing the computer program instructions used by the above access control device, which include a program for executing the above device access method.

The computer storage medium may be any available medium or data storage device accessible to a computer, including but not limited to magnetic storage (e.g., floppy disk, hard disk, magnetic tape, magneto-optical disk (MO)), optical storage (e.g., CD, DVD, BD, HVD), and semiconductor memory (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), solid-state drive (SSD)).

By the above it can be seen that：

Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. The present invention may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Although optional embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. The appended claims are therefore intended to be construed as including the optional embodiments and all changes and modifications that fall within the scope of the present invention.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims

1. A device access method, characterized in that the method comprises:

Obtaining the device access state of a base station;

If the device access state is determined to be a congested state, executing an access restriction parameter P value adjustment process, and sending the P value obtained after each P value adjustment to each device under the base station, until the device access state of the base station is the non-congested state;

2. The method according to claim 1, characterized in that obtaining the device access state of the base station comprises:

Determining the device access state of the base station according to the number of devices currently applying to access the base station and the optimal number of accessing devices.

3. The method according to claim 1, characterized in that executing the P value adjustment process and sending the P value obtained after each P value adjustment to each device under the base station comprises:

Determining the action Y^k(l) used in the k-th P value adjustment according to the device access state S^k of the base station before the k-th P value adjustment and the Q values in the reinforcement learning Q-value matrix, where k is a positive integer;

Obtaining the device access state S^k+1 of the base station after the k-th P value adjustment and, according to the device access state S^k and the device access state S^k+1, updating the Q value in the reinforcement learning Q-value matrix corresponding to the action Y^k(l) used in the device access state S^k.

4. The method according to claim 3, characterized in that determining the action Y^k(l) used in the k-th P value adjustment according to the device access state S^k of the base station before the k-th P value adjustment and the Q values in the reinforcement learning Q-value matrix comprises:

Determining the probability of selecting, from the preset optional actions, the action with the largest Q value for the device access state S^k, wherein the probability is positively correlated with k;

5. The method according to claim 3, characterized in that updating, according to the device access state S^k and the device access state S^k+1, the Q value in the reinforcement learning Q-value matrix corresponding to the action Y^k(l) used in the device access state S^k comprises:

Obtaining the updated Q value corresponding to using the action Y^k(l) in the device access state S^k according to the transfer gain, the maximum Q value corresponding to the device access state S^k+1 in the reinforcement learning Q-value matrix, and the Q value corresponding to using the action Y^k(l) in the device access state S^k.

6. The method according to claim 5, characterized in that obtaining the updated Q value corresponding to using the action Y^k(l) in the device access state S^k according to the transfer gain, the maximum Q value corresponding to the device access state S^k+1 in the reinforcement learning Q-value matrix, and the Q value corresponding to using the action Y^k(l) in the device access state S^k satisfies the following formula:

Where Q(S^k, Y^k(l)) is the Q value corresponding to using the action Y^k(l) in the device access state S^k; α is the learning factor, with 0<α<1; R_s(S^k, Y^k(l)) is the transfer gain corresponding to using the action Y^k(l) in the device access state S^k; γ is the discount factor, with 0<γ<1; Y(l) is the action with the largest Q value for the device access state S^k+1 among the preset optional actions; and max Q(S^k+1, Y(l)) is the maximum Q value corresponding to the device access state S^k+1.

7. The method according to claim 3, wherein after updating the Q value in the reinforcement learning Q-value matrix corresponding to using the action Y^k(l) in the device access state S^k, the method further comprises:

if it is determined that the reinforcement learning Q-value matrix has converged with respect to a device access state S^1, then in any P-value adjustment after the k-th P-value adjustment, selecting, from the preset optional actions, the action with the maximum Q value corresponding to the device access state before that P-value adjustment, wherein the device access state S^1 is the device access state of the base station before the 1st P-value adjustment.

8. The method according to claim 7, wherein determining that the reinforcement learning Q-value matrix has converged with respect to the device access state S^1 comprises:

determining, among the first k P-value adjustments, the three P-value adjustments that are closest to the k-th P-value adjustment and for which the device access state of the base station before the adjustment is the same as the device access state S^1; and

if it is determined that the actions used in the three P-value adjustments satisfy a preset convergence condition, determining that the reinforcement learning Q-value matrix has converged with respect to the device access state S^1.

9. The method according to claim 8, wherein the convergence condition specifically comprises:

where Y^n(l) is the action used in the P-value adjustment, among the three P-value adjustments, that is closest to the k-th P-value adjustment; Y^{n-1}(l) is the action used in the P-value adjustment that is second closest to the k-th P-value adjustment; Y^{n-2}(l) is the action used in the P-value adjustment that is third closest to the k-th P-value adjustment; and ε is a preset comparison threshold, with ε > 0.
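The convergence test of claims 8 and 9 can be sketched in Python as follows. The convergence formula is not present in this text, so the condition used here, that consecutive actions among the three differ by less than ε, is purely an assumption. The history layout and the treatment of actions as numeric P-value changes are also hypothetical.

```python
def has_converged(history, s1, eps):
    """Check convergence of the Q-value matrix with respect to state s1.

    history: list of (state_before, action) pairs, one per P-value
             adjustment, oldest first; actions are assumed to be numeric.
    s1:      device access state before the 1st P-value adjustment.
    eps:     preset comparison threshold, eps > 0.
    """
    # Keep the three most recent adjustments that started from state s1.
    actions = [a for s, a in history if s == s1][-3:]
    if len(actions) < 3:
        return False
    y_n2, y_n1, y_n = actions
    # ASSUMED condition: the last three actions are nearly identical.
    return abs(y_n - y_n1) < eps and abs(y_n1 - y_n2) < eps
```

Once this returns true, claim 7 applies: each later adjustment simply takes the maximum-Q action for the current state.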

10. A device access apparatus, wherein the apparatus comprises:

an acquisition module, configured to obtain a device access state of a base station; and

a processing module, configured to, if it is determined that the device access state is a congested state, execute an adjustment process for an access restriction parameter P, and send, through a transceiver module, the P value obtained after each P-value adjustment to each device under the base station, until the device access state of the base station is a non-congested state; wherein the action used in each P-value adjustment is determined according to Q values in a reinforcement learning Q-value matrix.
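The interaction of the acquisition, processing, and transceiver modules in claim 10 can be pictured as a simple control loop. In the Python sketch below, the three callables, the `"congested"` state label, the clamping of P to [0, 1], and the `max_rounds` safety limit are all hypothetical and stand in for details the claim does not specify.

```python
def relieve_congestion(get_state, send_p, choose_action, p=1.0, max_rounds=100):
    """Adjust P until the base station is no longer congested.

    get_state:     acquisition module stand-in; returns the access state
    send_p:        transceiver module stand-in; broadcasts P to the devices
    choose_action: returns a change in P for (state, k), e.g. from Q values
    Returns the final P value and the number of adjustments performed.
    """
    k = 0
    state = get_state()
    while state == "congested" and k < max_rounds:
        k += 1
        # Apply the chosen action; clamping to [0, 1] is an assumption.
        p = min(1.0, max(0.0, p + choose_action(state, k)))
        send_p(p)
        state = get_state()
    return p, k
```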

11. An access control device, comprising:

a memory, configured to store program instructions; and

a processor, configured to call the program instructions stored in the memory and to execute, according to the obtained program, the method according to any one of claims 1 to 9.

12. A computer storage medium, wherein the computer storage medium stores computer-executable instructions, and the computer-executable instructions are used to cause a computer to execute the method according to any one of claims 1 to 9.