CN108347744A - Device access method, apparatus, and access control device
- Publication number
- CN108347744A (application number CN201810053320.1A)
- Authority
- CN
- China
- Prior art keywords
- values
- access state
- action
- adjustment
- base station
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/02—Traffic management, e.g. flow control or congestion control
- H04W28/0289—Congestion control
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W48/00—Access restriction; Network selection; Access point selection
- H04W48/02—Access restriction performed under specific conditions
- H04W48/06—Access restriction performed under specific conditions based on traffic conditions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W74/00—Wireless channel access
- H04W74/08—Non-scheduled access, e.g. ALOHA
- H04W74/0833—Random access procedures, e.g. with 4-step access
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/70—Services for machine-to-machine communication [M2M] or machine type communication [MTC]
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
An embodiment of the present invention discloses a device access method, apparatus, and access control device. The method includes: obtaining the device access state of a base station; if the device access state is determined to be a congested state, executing an adjustment process for the access restriction parameter P, and sending the P value obtained after each adjustment to every device under the base station, until the device access state of the base station becomes uncongested. In this way, when the base station is in a congested device access state, dynamically adjusting the P value through the adjustment process can effectively reduce the base station's degree of congestion. Moreover, because the reinforcement learning Q-value matrix contains the experience of previous P-value adjustments, choosing the action for each adjustment according to the Q values in that matrix can effectively speed up the convergence of the P value, so that the base station quickly reaches its best access state.
Description
Technical field
The present invention relates to the field of wireless communication technology, and in particular to a device access method, apparatus, and access control device.
Background
As modern communication technology evolves toward the Internet of Things, the number of MTC (machine type communication) devices has surged and now exceeds the number of H2H (human-to-human) devices. Communication among these MTC devices underpins the development of smart cities and smart grids, but it also places a heavy load on existing wireless communication networks. The access of MTC devices increases the delay experienced by existing H2H devices and by M2M (machine-to-machine) devices with stricter delay requirements, especially when the number of access attempts is very large. In severe cases the base station control unit cannot process the mass of data in a timely and effective manner, and the base station cannot work normally for a long period.
To address random access congestion caused by MTC devices, the 3GPP (3rd Generation Partnership Project) organization proposed the ACB (Access Class Barring) algorithm in TS 22.011 V11.0.0 (section 4.3.4). ACB defines 16 access classes, some of which are reserved for high-priority applications. When the network load is heavy, the base station broadcasts the ACB parameters to the devices in the cell as part of the system information; these parameters include the access probability and the backoff time for the different access classes. The principle of ACB is that the base station sets an access restriction parameter P (0≤P≤1) according to the current network load. Before a random access attempt, each device generates a random number between 0 and 1. If the random number is less than P, the device performs random access; if it is greater than P, the device generates a new random number at the next opportunity and tries again. When a large number of devices attempt access at once, this relieves the network load and improves the access success rate and throughput. However, in the existing access class barring algorithm, data processing during classified feedback is complex, and when a large number of devices arrive, the access network cannot be guaranteed to be in an optimal state. In addition, because the network has little prior knowledge when adjusting P, the adjustment tends not to reach the optimum and converges slowly, so that the overall access state of the base station is poor and the delay requirements of the devices cannot be met.
In summary, a device access method is urgently needed to solve the technical problem that existing random access technology cannot dynamically adjust the P value, which leaves the base station in a poor access state and prone to congestion.
Summary of the invention
The present invention provides a device access method, apparatus, and access control device, to solve the technical problem that existing random access technology cannot dynamically adjust the P value, which leaves the base station in a poor access state and prone to congestion.
A device access method provided by an embodiment of the present invention includes:
obtaining the device access state of a base station;
if the device access state is determined to be a congested state, executing an adjustment process for the access restriction parameter P, and sending the P value obtained after each P-value adjustment to every device under the base station, until the device access state of the base station is no longer a congested state;
wherein the action used in each P-value adjustment is determined according to the Q values in a reinforcement learning Q-value matrix.
Optionally, obtaining the device access state of the base station includes:
determining the device access state of the base station according to the number of devices currently requesting access at the base station and the optimal number of access devices.
Optionally, executing the P-value adjustment process and sending the P value obtained after each P-value adjustment to every device under the base station includes:
determining the action Y_k(l) used in the k-th P-value adjustment according to the device access state S_k of the base station before the k-th P-value adjustment and the Q values in the reinforcement learning Q-value matrix, where k is a positive integer;
adjusting the P value using the action Y_k(l), and sending the adjusted P value to every device;
obtaining the device access state S_{k+1} of the base station after the k-th P-value adjustment, and, according to the device access states S_k and S_{k+1}, updating the Q value in the reinforcement learning Q-value matrix that corresponds to using the action Y_k(l) in the device access state S_k.
Optionally, determining the action Y_k(l) used in the k-th P-value adjustment according to the device access state S_k of the base station before the k-th P-value adjustment and the Q values in the reinforcement learning Q-value matrix includes:
determining the probability of selecting, from the preset optional actions, the action with the largest Q value for the device access state S_k, where the probability is positively correlated with k;
determining the action Y_k(l) according to the probability and the preset optional actions.
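The selection rule above can be illustrated with a short sketch. The text only says that the probability of taking the highest-Q action grows with k; the schedule `1 - 1/(k + 1)` below, the uniform fallback over all actions, and the names `greedy_probability` and `select_action` are assumptions made for illustration.

```python
import random
from typing import Sequence


def greedy_probability(k: int) -> float:
    """Hypothetical schedule: increases with k and approaches 1."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return 1.0 - 1.0 / (k + 1)


def select_action(q_row: Sequence[float], k: int, rng: random.Random) -> int:
    """Pick an action index for the k-th adjustment.

    q_row holds the Q values of every preset optional action for the
    current device access state S_k.
    """
    best = max(range(len(q_row)), key=lambda i: q_row[i])
    if rng.random() < greedy_probability(k):
        return best  # exploit: action with the largest Q value
    return rng.randrange(len(q_row))  # explore: any preset action
```

Early adjustments therefore explore more, while later ones rely increasingly on the experience stored in the Q-value matrix.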
Optionally, updating, according to the device access states S_k and S_{k+1}, the Q value in the reinforcement learning Q-value matrix that corresponds to using the action Y_k(l) in the device access state S_k includes:
determining the transition gain that corresponds to using the action Y_k(l) in the device access state S_k;
obtaining the updated Q value for using the action Y_k(l) in the device access state S_k according to the transition gain, the largest Q value for the device access state S_{k+1} in the reinforcement learning Q-value matrix, and the current Q value for using the action Y_k(l) in the device access state S_k.
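The update step combines three inputs: the transition gain, the largest Q value of the next state, and the current Q value. A minimal sketch follows, assuming the standard Q-learning combination of these quantities; the function name `update_q`, the nested-list matrix layout, and the default values of `alpha` and `gamma` are illustrative assumptions.

```python
from typing import List


def update_q(
    q: List[List[float]],
    s_k: int,
    action: int,
    gain: float,
    s_next: int,
    alpha: float = 0.5,
    gamma: float = 0.9,
) -> float:
    """Update Q[s_k][action] in place and return the new value.

    Assumes the standard Q-learning rule: blend the old estimate with
    the transition gain plus the discounted best Q value of the next state.
    """
    if not (0.0 < alpha < 1.0 and 0.0 < gamma < 1.0):
        raise ValueError("alpha and gamma must lie strictly between 0 and 1")
    best_next = max(q[s_next])  # largest Q value for state S_{k+1}
    q[s_k][action] = (1 - alpha) * q[s_k][action] + alpha * (gain + gamma * best_next)
    return q[s_k][action]
```

Only the entry for the state and action actually used is changed; every other entry of the matrix keeps its value.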
Optionally, obtaining the updated Q value for using the action Y_k(l) in the device access state S_k according to the transition gain, the largest Q value for the device access state S_{k+1} in the reinforcement learning Q-value matrix, and the current Q value for using the action Y_k(l) in the device access state S_k satisfies the following formula:
where Q(S_k, Y_k(l)) is the Q value for using the action Y_k(l) in the device access state S_k; α is the learning factor, with 0<α<1; R_s(S_k, Y_k(l)) is the transition gain for using the action Y_k(l) in the device access state S_k; γ is the discount factor, with 0<γ<1; Y(l) is the action, among the preset optional actions, with the largest Q value for the device access state S_{k+1}; and the remaining term of the formula is the largest Q value corresponding to the device access state S_{k+1}.
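The formula itself is not reproduced in this text. One form that is consistent with the symbols defined above, assuming the standard Q-learning update, would be:

```latex
Q\bigl(S_k, Y_k(l)\bigr) \leftarrow (1-\alpha)\,Q\bigl(S_k, Y_k(l)\bigr)
  + \alpha\Bigl[R_s\bigl(S_k, Y_k(l)\bigr) + \gamma \max_{Y(l)} Q\bigl(S_{k+1}, Y(l)\bigr)\Bigr]
```

This is a reconstruction under that assumption; the patent's own formula may arrange the terms differently.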
Optionally, after updating the Q value in the reinforcement learning Q-value matrix that corresponds to using the action Y_k(l) in the device access state S_k, the method further includes:
if it is determined that the reinforcement learning Q-value matrix has converged with respect to the device access state S_1, then, in any P-value adjustment after the k-th P-value adjustment, selecting from the preset optional actions the action with the largest Q value for the device access state before that adjustment, where the device access state S_1 is the device access state of the base station before the 1st P-value adjustment.
Optionally, the method includes: determining, among the first k P-value adjustments, the three P-value adjustments that are closest to the k-th P-value adjustment and for which the device access state of the base station before the adjustment is the same as the device access state S_1;
if it is determined that the actions used in these three P-value adjustments satisfy a preset convergence condition, determining that the reinforcement learning Q-value matrix has converged with respect to the device access state S_1.
Optionally, the convergence condition specifically includes:
where Y_n(l) is the action used in the P-value adjustment, among the three P-value adjustments, that is closest in time to the k-th P-value adjustment; Y_{n-1}(l) is the action used in the P-value adjustment second closest in time to the k-th P-value adjustment; Y_{n-2}(l) is the action used in the P-value adjustment third closest in time to the k-th P-value adjustment; and ε is a preset comparison threshold, with ε>0.
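The convergence condition is not reproduced in this text. Purely as an illustration, the sketch below assumes that actions are numeric and that convergence means consecutive actions among the three most recent ones differ by less than ε; both the condition and the name `has_converged` are assumptions, not the patent's definition.

```python
from typing import Sequence


def has_converged(recent_actions: Sequence[float], eps: float) -> bool:
    """Check an assumed convergence test on the last three actions.

    recent_actions lists, oldest first, the actions used in adjustments
    that started from device access state S_1.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    if len(recent_actions) < 3:
        return False  # not enough history to compare three adjustments
    y_n2, y_n1, y_n = recent_actions[-3:]
    # Assumed condition: |Y_n - Y_{n-1}| < eps and |Y_{n-1} - Y_{n-2}| < eps.
    return abs(y_n - y_n1) < eps and abs(y_n1 - y_n2) < eps
```

Once such a test passes, the method described above stops exploring and always takes the action with the largest Q value.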
Based on the same inventive concept, the present invention also provides a device access apparatus, including:
an acquisition module, configured to obtain the device access state of a base station;
a processing module, configured to, if the device access state is determined to be a congested state, execute an adjustment process for the access restriction parameter P and send, through a transceiver module, the P value obtained after each P-value adjustment to every device under the base station, until the device access state of the base station is no longer a congested state; wherein the action used in each P-value adjustment is determined according to the Q values in a reinforcement learning Q-value matrix.
Optionally, the acquisition module is specifically configured to:
determine the device access state of the base station according to the number of devices currently requesting access at the base station and the optimal number of access devices.
Optionally, the processing module is specifically configured to:
determine the action Y_k(l) used in the k-th P-value adjustment according to the device access state S_k of the base station before the k-th P-value adjustment and the Q values in the reinforcement learning Q-value matrix, where k is a positive integer;
adjust the P value using the action Y_k(l), and send the adjusted P value to every device through the transceiver module;
obtain, through the acquisition module, the device access state S_{k+1} of the base station after the k-th P-value adjustment, and, according to the device access states S_k and S_{k+1}, update the Q value in the reinforcement learning Q-value matrix that corresponds to using the action Y_k(l) in the device access state S_k.
Optionally, the processing module is further specifically configured to:
determine the probability of selecting, from the preset optional actions, the action with the largest Q value for the device access state S_k, where the probability is positively correlated with k;
determine the action Y_k(l) according to the probability and the preset optional actions.
Optionally, the processing module is further specifically configured to:
determine the transition gain that corresponds to using the action Y_k(l) in the device access state S_k;
obtain the updated Q value for using the action Y_k(l) in the device access state S_k according to the transition gain, the largest Q value for the device access state S_{k+1} in the reinforcement learning Q-value matrix, and the current Q value for using the action Y_k(l) in the device access state S_k.
Optionally, the processing module is further specifically configured to obtain the updated Q value for using the action Y_k(l) in the device access state S_k by the following formula:
where Q(S_k, Y_k(l)) is the Q value for using the action Y_k(l) in the device access state S_k; α is the learning factor, with 0<α<1; R_s(S_k, Y_k(l)) is the transition gain for using the action Y_k(l) in the device access state S_k; γ is the discount factor, with 0<γ<1; Y(l) is the action, among the preset optional actions, with the largest Q value for the device access state S_{k+1}; and the remaining term of the formula is the largest Q value corresponding to the device access state S_{k+1}.
Optionally, the processing module is further configured to:
if it is determined that the reinforcement learning Q-value matrix has converged with respect to the device access state S_1, then, in any P-value adjustment after the k-th P-value adjustment, select from the preset optional actions the action with the largest Q value for the device access state before that adjustment, where the device access state S_1 is the device access state of the base station before the 1st P-value adjustment.
Optionally, the processing module is further specifically configured to:
determine, among the first k P-value adjustments, the three P-value adjustments that are closest to the k-th P-value adjustment and for which the device access state of the base station before the adjustment is the same as the device access state S_1;
if it is determined that the actions used in these three P-value adjustments satisfy a preset convergence condition, determine that the reinforcement learning Q-value matrix has converged with respect to the device access state S_1.
Optionally, the convergence condition specifically includes:
where Y_n(l) is the action used in the P-value adjustment, among the three P-value adjustments, that is closest in time to the k-th P-value adjustment; Y_{n-1}(l) is the action used in the P-value adjustment second closest in time to the k-th P-value adjustment; Y_{n-2}(l) is the action used in the P-value adjustment third closest in time to the k-th P-value adjustment; and ε is a preset comparison threshold, with ε>0.
Another embodiment of the present invention provides an access control device that includes a memory and a processor, where the memory is configured to store program instructions and the processor is configured to call the program instructions stored in the memory and execute any of the above methods according to the obtained program.
Another embodiment of the present invention provides a computer storage medium that stores computer-executable instructions, the computer-executable instructions being used to cause a computer to execute any of the above methods.
The device access method provided by the embodiments of the present invention includes obtaining the device access state of a base station; if the device access state is determined to be a congested state, executing an adjustment process for the access restriction parameter P and sending the P value obtained after each P-value adjustment to every device under the base station, until the device access state of the base station becomes uncongested. In this way, when the base station is in a congested access state, dynamically adjusting the P value through the adjustment process can effectively reduce the base station's degree of congestion. Moreover, because the reinforcement learning Q-value matrix contains the experience of previous P-value adjustments, choosing the action for each adjustment according to the Q values in that matrix can effectively speed up the convergence of the P value, so that the base station quickly reaches its best device access state.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a system architecture diagram to which the device access method provided by an embodiment of the present invention applies;
Fig. 2 is a flow diagram of a device access method provided by an embodiment of the present invention;
Fig. 3 is a flow diagram of the method for determining the access state of a base station provided in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the resource parameters of a base station provided by an embodiment of the present invention;
Fig. 5 is a graph of the relationship between the number of contending devices and the access success probability in an embodiment of the present invention;
Fig. 6 is a graph of the relationship between the number of contending devices and the number of successfully accessed devices in an embodiment of the present invention;
Fig. 7 is a schematic diagram of the state transition gain matrix provided in an embodiment of the present invention;
Fig. 8 is a schematic diagram of one state transition gain matrix provided in an embodiment of the present invention;
Fig. 9 is a structural diagram of a device access apparatus provided by an embodiment of the present invention;
Fig. 10 is a structural diagram of an access control device provided by an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The embodiments of the present invention are described in further detail below with reference to the drawings.
The device access method provided in the embodiments of the present invention applies to random access of machine type communication devices. Fig. 1 is a system architecture diagram to which the device access method provided by an embodiment of the present invention applies. As shown in Fig. 1, the system includes a device access apparatus 101, a base station 102, and multiple types of machine type communication devices (103 to 110 in the figure).
The base station may be a base station in any of various communication systems or Internet of Things systems, such as 2G, 3G, 4G, LTE-M (Long Term Evolution Machine to Machine, an Internet of Things technology based on Long Term Evolution), or NB-IoT (Narrow Band Internet of Things); the present invention does not specifically limit this.
The device access apparatus may be a smart device set up independently of the base station. It establishes a communication connection with a wireless communication unit of the base station, such as an RRU (Remote Radio Unit), and obtains the base station's resources and access situation through that connection. Alternatively, the device access apparatus may be embedded in the RRU inside the base station and obtain the base station's resources and access situation directly; the present invention does not specifically limit this.
The machine type communication devices may be communication devices such as smartphones and tablet computers, or various types of Internet of Things devices such as smart water meters, electricity meters, and parking management modules; the present invention does not specifically limit this. Each of these devices has a wireless communication module and communicates with the base station over the wireless network.
Fig. 2 shows a flow diagram of a device access method provided in an embodiment of the present invention. The method is executed by the device access apparatus in the system architecture above. As shown in Fig. 2, the method includes:
Step S201: obtain the device access state of the base station;
Step S202: if the device access state is determined to be a congested state, execute an adjustment process for the access restriction parameter P, and send the P value obtained after each P-value adjustment to every device under the base station, until the access state of the base station becomes uncongested; the action used in each P-value adjustment is determined according to the Q values in the reinforcement learning Q-value matrix.
In this way, when the base station is in a congested device access state, dynamically adjusting the P value through the adjustment process can effectively reduce the base station's degree of congestion. Moreover, because the reinforcement learning Q-value matrix contains the experience of previous P-value adjustments, choosing the action for each adjustment according to the Q values in that matrix can effectively speed up the convergence of the P value, so that the base station quickly reaches its best access state with higher overall efficiency.
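Steps S201 and S202 can be pictured as a loop. The sketch below is illustrative only: the callables `get_state`, `is_congested`, `choose_action`, `broadcast`, and `learn` are hypothetical hooks, `max_rounds` is a safety cap added here, and treating an action as an additive change to P clamped to [0, 1] is an assumption that the text does not state.

```python
from typing import Callable


def adjust_until_uncongested(
    p: float,
    get_state: Callable[[float], int],
    is_congested: Callable[[int], bool],
    choose_action: Callable[[int, int], float],
    broadcast: Callable[[float], None],
    learn: Callable[[int, float, int], None],
    max_rounds: int = 100,
) -> float:
    """Repeat P-value adjustments while the base station is congested."""
    state = get_state(p)  # S201: obtain the device access state
    k = 1
    while is_congested(state) and k <= max_rounds:
        delta = choose_action(state, k)      # action chosen from the Q values
        p = min(1.0, max(0.0, p + delta))    # apply the action, keep 0 <= P <= 1
        broadcast(p)                         # send the new P to every device
        next_state = get_state(p)            # observe S_{k+1}
        learn(state, delta, next_state)      # update the Q-value matrix
        state = next_state
        k += 1
    return p
```

For example, with a toy base station that is congested whenever `100 * p > 40` and a policy that always returns `-0.1`, the loop lowers P step by step and stops at the first uncongested value.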
In step S201, the device access apparatus can obtain the base station's current access situation and access capability through its communication connection with the wireless communication unit of the base station, and thereby determine the device access state.
Specifically, the device access state of the base station is determined according to the current number of devices requesting access and the optimal number of access devices, where the number of devices requesting access is the actual number of contending devices in the current system and the optimal number of access devices is the optimal number of contending devices in the current system. As shown in Fig. 3, determining the device access state may include the following steps S301 to S303:
Step S301: obtain the current number of devices requesting access and the optimal number of access devices;
The optimal number of access devices, that is, the optimal number of contending devices, is the number of contending devices at which the base station actually admits the maximum number of devices; it characterizes the current access capability of the base station. The device access apparatus determines the optimal number of access devices according to the current resource situation of the base station. Fig. 4 is a schematic diagram of the resource parameters of a base station provided by an embodiment of the present invention. As shown in Fig. 4, the resource situation of the base station includes any one or more of the following resource parameters:
the number of available preambles, the maximum number of preamble retransmissions, the access request arrival rate of each device, the random access slot allocation period, the backoff parameter, and the random access response window length.
In the embodiments of the present invention, the device access apparatus can determine the current actual number of contending devices and the current resource situation of the base station through its communication connection with the base station. It then obtains the current optimal number of contending devices by running a simulation of the base station's current resource situation.
Fig. 5 illustrates the relationship between the actual number of contending devices and the access success probability in an embodiment of the present invention. As shown in Fig. 5, while the number of devices requesting access is below the maximum number of contending devices, the access success probability stays at 1. As the actual number of contending devices increases and reaches or exceeds the maximum number of contending devices, the access success probability drops rapidly. The rate of change of the access success probability is greatest at the turning point where the actual number of contending devices equals the maximum number of contending devices; as the actual number of contending devices increases further, the rate of change gradually decreases and finally approaches 0.
Fig. 6 illustrates the relationship between the actual number of contending devices and the number of successfully accessed devices in an embodiment of the present invention. As shown in Fig. 6, while the actual number of contending devices is below the maximum number of contending devices, the number of successfully accessed devices increases linearly with the actual number of contending devices. Once the actual number of contending devices reaches or exceeds the maximum number of contending devices, the number of successfully accessed devices decreases as the actual number of contending devices increases, and the decrease is steepest near the turning point where the actual number of contending devices equals the maximum number of contending devices.
Combining Fig. 5 and Fig. 6, the optimal number of contending devices in the embodiments of the present invention is greater than the maximum number of contending devices: it is the actual number of contending devices at which the base station achieves a set access success rate, where the set access success rate is slightly less than 1. In other words, the optimal number of contending devices is numerically very close to the maximum number of contending devices; it is a specific value near the turning point, on the falling part of the curves in Fig. 5 and Fig. 6 where the number of contending devices exceeds the maximum number of contending devices.
In the embodiment of the present invention, the set successful access rate may be chosen by those skilled in the art, and the present invention places no specific limitation on it. Optionally, the set successful access rate may be 98.9% or another value close to it.
Step S302: Calculate the congestion degree value according to the actual number of contention access devices and the optimal number of contention access devices.
In the embodiment of the present invention, the congestion degree value can be calculated by the following formula one:
Wherein, P is the congestion degree value, N is the actual number of contention access devices, and N_0 is the optimal number of contention access devices.
According to formula one, the congestion degree value can be regarded as a weighted measure of how far the actual number of contention access devices deviates from the optimal number of contention access devices. When the actual number of contention access devices is less than the optimal number of contention access devices, the base station is not congested, the successful access probability is 1, and every device applying for access can successfully access the base station; the congestion degree value calculated by formula one is therefore negative. When the actual number of contention access devices is greater than the optimal number of contention access devices, the base station is congested to some degree and the successful access probability is less than or equal to 1. Because the number of devices applying for access exceeds the access capability of the base station, a certain number of devices cannot access successfully, and the congestion degree value calculated by formula one is positive.
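The sign behaviour described above can be illustrated with a minimal Python sketch. Formula one itself is not reproduced in this text, so the normalised deviation (N - N_0) / N_0 used here is only an assumption that has the stated properties (negative below the optimum, positive above it); the function name is hypothetical.

```python
def congestion_degree(actual: int, optimal: int) -> float:
    """Relative deviation of the actual device count N from the optimal count N_0.

    ASSUMPTION: the form (N - N_0) / N_0 is illustrative only; the source
    text does not reproduce formula one.
    """
    if optimal <= 0:
        raise ValueError("optimal device count must be positive")
    return (actual - optimal) / optimal


print(congestion_degree(80, 100))   # fewer devices than the optimum: negative
print(congestion_degree(130, 100))  # more devices than the optimum: positive
```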
Step S303: Determine the congestion degree value interval to which the congestion degree value belongs, and determine the apparatus access state of the base station according to that interval and the correspondence between each congestion degree value interval and an apparatus access state.
In the embodiment of the present invention, all possible apparatus access states are preset in the base station, and each apparatus access state corresponds to one congestion degree value interval. Because the congestion degree values within the same interval differ only slightly from one another, they can be considered to belong to the same congestion level, and each apparatus access state of the base station corresponds to one congestion level. The current congestion level of the base station can therefore be judged from the congestion degree value calculated by formula one, which in turn determines the apparatus access state of the base station.
Suppose, for example, that the possible congestion levels of the base station are l ∈ [0, L], where L denotes the highest congestion level of the base station and l is any one of the congestion levels; L is a positive integer greater than or equal to 1, and l is an integer from 0 to L. Each congestion level corresponds to one apparatus access state of the base station. A congestion level of 0 indicates that the base station is not congested, i.e. the corresponding apparatus access state is the non-congested state; a congestion level greater than 0 indicates that the base station is congested, and the corresponding apparatus access state is a congested state.
Formula one also shows that the calculated congestion degree value is a continuous value, whereas each congestion level is an integer. The calculated congestion degree value can therefore be quantized through the preset correspondence in the base station between congestion degree value intervals and apparatus access states (i.e. congestion levels), thereby determining the current apparatus access state of the base station.
In this way, the base station has L+1 congestion levels in total, i.e. L+1 possible apparatus access states. In the embodiment of the present invention, those skilled in the art may set the number of apparatus access states (i.e. the number of congestion levels) according to actual needs, and the present invention places no specific limitation on this.
In addition, in the embodiment of the present invention, the congestion degree value interval corresponding to congestion level 0 is (-∞, 0], while the ranges of the intervals corresponding to the other congestion levels may be set by those skilled in the art, and the present invention places no specific limitation on this. For example, the intervals corresponding to the non-zero congestion levels may be set to equal size, or they may be set to unequal sizes that shrink as the congestion level increases, i.e. the higher the congestion level, the smaller the range of the corresponding congestion degree value interval.
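The quantization of a continuous congestion degree value into a congestion level can be sketched as an interval lookup. The interval bounds below are hypothetical values chosen only to show shrinking intervals; the source leaves them to the implementer, and the top level is left unbounded here for simplicity.

```python
from bisect import bisect_left

# ASSUMPTION: upper bounds of the intervals for levels 0..L-1 (here L = 4).
# Level L takes every value above the last bound. The numbers are hypothetical.
UPPER_BOUNDS = [0.0, 0.4, 0.7, 0.9]


def congestion_level(degree: float, upper_bounds=UPPER_BOUNDS) -> int:
    """Return the congestion level whose interval contains `degree`.

    Level 0 covers (-inf, 0]; level i covers
    (upper_bounds[i-1], upper_bounds[i]] for 0 < i < len(upper_bounds).
    """
    return bisect_left(upper_bounds, degree)


for value in (-0.2, 0.0, 0.1, 0.4, 0.95):
    print(value, congestion_level(value))
```

A sorted list of bounds keeps the lookup logarithmic and makes it easy to change the number of levels without touching the function.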
In the embodiment of the present invention, the equipment access device adjusts the P value only when the apparatus access state of the base station is a congested state. Therefore, in step S102, if the apparatus access state is determined to be a congested state, the adjustment process of the access restriction parameter P value is triggered; if the apparatus access state is determined not to be a congested state, the current P value is not adjusted. Here, a congested state is an apparatus access state whose congestion level belongs to [1, L], and the non-congested state is the apparatus access state whose congestion level is 0.
Because the number of devices applying for access to the base station may differ from moment to moment, the access state of the base station may change in real time. The equipment access device in the embodiment of the present invention may therefore periodically obtain the access state of the base station and trigger the P value adjustment process when it determines that the access state of the base station is congested.
In the embodiment of the present invention, the equipment access device may obtain the current P value of the base station only after determining that the current apparatus access state is a congested state, or it may obtain the P value through its communication connection with the base station at the same time as it determines the current apparatus access state of the base station; the present invention places no specific limitation on this.
Specifically, the P value adjustment process includes multiple P value adjustments, each of which can be regarded as one cycle of the overall P value adjustment process. In each P value adjustment, the P value obtained after the adjustment is sent to each device accessing the base station, until the apparatus access state of the base station becomes the non-congested state.
Fig. 7 exemplarily shows a flow diagram of the P value adjustment process in the embodiment of the present invention. As shown in Fig. 7, the P value adjustment process specifically includes the following steps S701 to S704:
Step S701: According to the apparatus access state S_k of the base station before the k-th P value adjustment and the Q values in the reinforcement learning Q-value matrix, determine the action Y_k(l) used in the k-th P value adjustment, where k is a positive integer;
Step S702: Adjust the P value using the action Y_k(l), and send the adjusted P value to each device;
Step S703: Obtain the apparatus access state S_{k+1} of the base station after the k-th P value adjustment, and, according to the apparatus access state S_k and the apparatus access state S_{k+1}, update the Q value in the reinforcement learning Q-value matrix corresponding to the action Y_k(l) used in the apparatus access state S_k;
Step S704: If the apparatus access state S_{k+1} is determined to be the non-congested state, end the P value adjustment process; otherwise perform the (k+1)-th P value adjustment.
The embodiment of the present invention adjusts the P value using the idea of Q-learning, and the P value adjustment process is an adjustment process comprising multiple cycles. Each cycle is one P value adjustment, in which the action used to change the P value is selected according to the experience learned in the preceding P value adjustments.
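The cycle formed by steps S701 to S704 can be sketched as a plain loop. This is a minimal outline under stated assumptions: the four callables, the clamping of P to [0, 1], and the `max_rounds` safety cap are hypothetical and stand in for parts that the source describes only in prose.

```python
from typing import Callable

NOT_CONGESTED = 0  # congestion level 0


def adjust_p(
    p: float,
    state: int,
    choose_action: Callable[[int, int], float],
    broadcast: Callable[[float], None],
    observe_state: Callable[[], int],
    update_q: Callable[[int, float, int], None],
    max_rounds: int = 1000,  # ASSUMPTION: safety cap, not part of the source
) -> float:
    """Repeat P value adjustments until the base station is no longer congested."""
    k = 1
    while state != NOT_CONGESTED and k <= max_rounds:
        delta = choose_action(state, k)        # S701: adjustment amount of Y_k(l)
        p = min(1.0, max(0.0, p + delta))      # S702: adjust P (clamping is assumed)
        broadcast(p)                           #       and send it to the devices
        next_state = observe_state()           # S703: new access state S_{k+1}
        update_q(state, delta, next_state)     #       update the matching Q value
        state, k = next_state, k + 1           # S704: stop or start round k+1
    return p
```

Passing the four operations in as callables keeps the loop independent of how the state is measured or how the Q-value matrix is stored.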
A reinforcement learning Q-value matrix and a set of actions for adjusting the P value are preset in the equipment access device. The reinforcement learning Q-value matrix records the experience learned from previous P value adjustment cycles. Each row of the matrix identifies one apparatus access state of the base station, and each column identifies one action in the set of actions with which the base station adjusts the P value; the Q value in row i, column j represents the Q value of selecting the j-th action Y_j in the action set to adjust the P value in state S_i. A higher Q value indicates that selecting action Y_j in state S_i has a higher success rate of reaching the final goal (i.e. the non-congested state), i.e. a higher benefit to the whole system after action Y_j is selected. The reinforcement learning Q-value matrix is initialized to 0 when the P value adjustment process starts (i.e. before the first P value adjustment), and the Q values in it are adjusted accordingly after every P value adjustment.
The set of actions for adjusting the P value in the base station may take the form {Y(-H), ..., Y(-1), Y(0), Y(1), ..., Y(H)}, where each action corresponds to a P value adjustment amount that may be positive, negative, or 0. Selecting an action that increases the P value allows more devices to access after the adjustment; conversely, selecting an action that decreases the P value reduces the number of devices that access after the adjustment.
In the embodiment of the present invention, those skilled in the art may set the P value adjustment amount corresponding to each action in the action set, and the present invention places no specific limitation on this. For example, the unit of P value adjustment may be defined as 0.01, so that action Y(1) increases the P value by 0.01, action Y(-1) decreases the P value by 0.01, and so on.
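The action set and the zero-initialized Q-value matrix can be laid out as follows. The values of `H` and `L`, and the clamping of P to [0, 1], are assumptions made for illustration; only the 0.01 adjustment unit comes from the example in the text.

```python
H = 3        # ASSUMPTION: actions Y(-3) .. Y(3)
STEP = 0.01  # adjustment unit taken from the example in the text
L = 4        # ASSUMPTION: highest congestion level

# Adjustment amount of every action; list index 0..2H corresponds to l = -H..H.
ACTIONS = [round(l * STEP, 10) for l in range(-H, H + 1)]

# One row per access state (congestion level 0..L), one column per action,
# all zero before the first P value adjustment.
q_matrix = [[0.0] * len(ACTIONS) for _ in range(L + 1)]


def apply_action(p: float, action_index: int) -> float:
    """Add the action's adjustment amount to P, keeping the result in [0, 1]."""
    return min(1.0, max(0.0, p + ACTIONS[action_index]))


print(ACTIONS)
print(apply_action(0.50, H + 1))  # Y(1): increase P by one unit
```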
For clarity and ease of description, the entire P value adjustment process is introduced below using only one arbitrary P value adjustment (i.e. the k-th P value adjustment).
In each P value adjustment, the equipment access device adjusts the P value once and delivers the adjusted P value to each device under the base station. After receiving the adjusted P value, each device determines for itself whether to apply for access; the apparatus access state of the base station therefore changes accordingly after each cycle.
In step S701, in the k-th P value adjustment, the equipment access device determines the action Y_k(l) used in the k-th P value adjustment according to the apparatus access state S_k of the base station before the k-th P value adjustment and the Q values in the reinforcement learning Q-value matrix. When k = 1, the apparatus access state S_1 is the apparatus access state obtained in step S201, i.e. the apparatus access state that triggered the P value adjustment process, which is also the apparatus access state of the base station before the first P value adjustment.
Specifically: if the reinforcement learning Q-value matrix has not converged with respect to the apparatus access state S_1, the probability of selecting, among the preset optional actions, the action with the largest Q value for the apparatus access state S_k is determined according to k. The action Y_k(l) is then determined according to that probability and the preset optional actions.
Here, k is the index of the current P value adjustment within the entire P value adjustment process.
In the embodiment of the present invention, the optional actions are the same in all possible apparatus access states of the base station. The optional actions of the apparatus access state S_k are therefore all the actions in the set of actions with which the base station adjusts the P value. Of course, those skilled in the art may also set the optional actions for each possible apparatus access state of the base station individually, so that the optional actions of each access state differ according to its congestion level; the present invention places no specific limitation on this.
In the embodiment of the present invention, because the reinforcement learning Q-value matrix is initialized to 0 when the P value adjustment process starts, a freshly initialized matrix cannot guide the choice of action in the 1st P value adjustment. In the 1st P value adjustment, the equipment access device therefore randomly selects one action from the action set as a starting point, which starts the P value adjustment process.
Then, to gain more experience in adjusting the P value, the equipment access device tries out as many of the actions in the action set as possible. In subsequent P value adjustments, the equipment access device therefore still selects an action at random with a certain probability, and otherwise selects the action with the largest Q value in the apparatus access state S_k of that P value adjustment. As k gradually increases, the reinforcement learning Q-value matrix contains more and more P value adjustment experience, so the probability of selecting the action with the largest Q value is gradually increased and the probability of selecting an action at random is correspondingly reduced.
It can be seen that, when the action Y_k(l) is selected in the embodiment of the present invention, the probability of taking the action with the largest Q value among the optional actions of the apparatus access state S_k as the action Y_k(l) is positively correlated with the value of k of the k-th P value adjustment. After determining the probability of selecting the action with the largest Q value, the equipment access device finally determines the action Y_k(l) in combination with the optional actions of the apparatus access state S_k.
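The selection rule above, where the chance of taking the largest-Q action grows with k, can be sketched as follows. The source only states that the probability is positively correlated with k; the linear ramp and its `floor`, `ceiling`, and `rate` parameters are assumptions, as are the function names.

```python
import random
from typing import Sequence


def greedy_probability(k: int, floor: float = 0.1, ceiling: float = 0.95,
                       rate: float = 0.05) -> float:
    """Probability of taking the max-Q action in the k-th adjustment.

    ASSUMPTION: a capped linear ramp; any non-decreasing function of k would do.
    """
    if k <= 1:
        return 0.0  # first adjustment: the all-zero matrix gives no guidance
    return min(ceiling, floor + rate * (k - 1))


def select_action(q_row: Sequence[float], k: int, rng: random.Random) -> int:
    """Return the column index of the action to use in the k-th adjustment."""
    if rng.random() < greedy_probability(k):
        return max(range(len(q_row)), key=q_row.__getitem__)  # largest Q value
    return rng.randrange(len(q_row))                            # random trial


rng = random.Random(0)
print([select_action([0.0, 2.0, 1.0], k, rng) for k in range(1, 11)])
```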
If the reinforcement learning Q-value matrix has converged with respect to the apparatus access state S_1, the optimal policy for adjusting the apparatus access state S_1 to the non-congested state has been learned in the previous P value adjustments. In any subsequent P value adjustment, the action with the largest Q value for the apparatus access state before that P value adjustment is then selected from the preset optional actions as the action used in that P value adjustment.
In step S702, the equipment access device adjusts the P value based on the action Y_k(l) selected in the k-th P value adjustment. The adjusted P value equals the current P value of the base station plus the P value adjustment amount corresponding to the selected action Y_k(l); the adjusted P value is then sent to each device under the base station.
After receiving the adjusted P value from the base station, a device may use a method in the prior art to judge whether to apply for access to the base station. That is, the device randomly generates a P value of its own and compares it with the P value issued by the base station. If the value it generated is less than or equal to the P value issued by the base station, the device sends an access request to the base station; otherwise it does not send an access request and, after waiting for a period of time, judges again whether to apply for access to the base station.
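The device-side comparison can be sketched in a few lines. The uniform draw from [0, 1) is an assumption about how the device generates its own value; the function name is hypothetical.

```python
import random


def should_request_access(p_from_base_station: float, rng: random.Random) -> bool:
    """Device-side check: draw a value and compare it with the broadcast P value.

    ASSUMPTION: the device's own value is uniform on [0, 1).
    """
    return rng.random() <= p_from_base_station


rng = random.Random(42)
requests = sum(should_request_access(0.3, rng) for _ in range(10_000))
print(requests)  # roughly 30% of the devices apply when P = 0.3
```

Under this assumption, lowering P by one step lowers the expected share of devices that apply by the same amount, which is why decreasing P relieves congestion.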
In step S703, the equipment access device may obtain the apparatus access state S_{k+1} of the base station using the method for obtaining the apparatus access state of the base station described in step S201. The apparatus access state S_{k+1} is the new apparatus access state that the base station reaches after the P value has been adjusted using the action Y_k(l) in the k-th P value adjustment. If the apparatus access state S_{k+1} is still a congested state, the P value adjustment process continues, in which case the apparatus access state S_{k+1} is also the apparatus access state of the base station before the (k+1)-th P value adjustment.
After obtaining the apparatus access state S_{k+1} of the base station, the equipment access device also updates, according to the apparatus access state S_k and the apparatus access state S_{k+1}, the Q value in the reinforcement learning Q-value matrix corresponding to using the action Y_k(l) in the apparatus access state S_k, which specifically includes:
First, the transfer gain corresponding to selecting the action Y_k(l) in the apparatus access state S_k is determined. In the embodiment of the present invention, the equipment access device may determine the transfer gain according to a preset transfer gain function, or according to the transfer gain matrix shown in Fig. 8; the present invention places no specific limitation on this.
Taking the transfer gain matrix in Fig. 8 as an example, each row of the transfer gain matrix identifies the apparatus access state S_k of the base station before the k-th P value adjustment, each column identifies the apparatus access state S_{k+1} that the base station reaches after the P value is adjusted using the action Y_k(l), and the value in row i, column j represents the transfer gain for the apparatus access state of the base station transferring from state S_i to state S_j.
As can be seen from Fig. 8, when the apparatus access state of the base station transfers from a lower congestion level to a higher congestion level, the corresponding transfer gain is negative, and the higher the congestion level of the apparatus access state transferred to, the smaller the transfer gain. When the apparatus access state of the base station transfers from a higher congestion level to a lower congestion level, the corresponding transfer gain is positive, and the lower the congestion level of the apparatus access state transferred to, the larger the transfer gain. If the apparatus access state of the base station is unchanged after the P value adjustment, the corresponding transfer gain is zero.
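A transfer gain matrix with the sign pattern just described can be built as follows. Fig. 8 is not reproduced in this text, so the linear form i - j and the value of `L` are assumptions chosen only because they satisfy the three stated properties.

```python
L = 4  # ASSUMPTION: highest congestion level

# gain[i][j]: transfer gain for moving from congestion level i to level j.
# ASSUMPTION: the linear form i - j is illustrative; the actual entries of
# the matrix in Fig. 8 are not given in the text.
gain = [[float(i - j) for j in range(L + 1)] for i in range(L + 1)]


def transfer_gain(state: int, next_state: int) -> float:
    """Look up the gain for the transition state -> next_state."""
    return gain[state][next_state]


print(transfer_gain(1, 3))  # worse congestion: negative gain
print(transfer_gain(3, 0))  # congestion cleared: positive gain
print(transfer_gain(2, 2))  # unchanged state: zero gain
```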
Then, according to the transfer gain caused by selecting the action Y_k(l), the maximum Q value corresponding to the apparatus access state S_{k+1} in the reinforcement learning Q-value matrix, and the Q value corresponding to using the action Y_k(l) in the apparatus access state S_k, the updated Q value of the action Y_k(l) in the apparatus access state S_k is calculated by the following formula two:
$$Q(S_k, Y_k(l)) = (1-\alpha)\,Q(S_k, Y_k(l)) + \alpha\left[R_s(S_k, Y_k(l)) + \gamma \max_{Y(l)} Q(S_{k+1}, Y(l))\right]$$
Wherein, S_k is the apparatus access state of the base station before the k-th P value adjustment, and Y_k(l) is the action selected in the k-th P value adjustment. Q(S_k, Y_k(l)) is the Q value corresponding to using the action Y_k(l) in the apparatus access state S_k: the Q(S_k, Y_k(l)) on the right of the equal sign is the Q value before the matrix update, and the Q(S_k, Y_k(l)) on the left of the equal sign is the Q value after the matrix update. α is the learning factor, a real number in the range (0, 1). The larger α is, the less of the effect of previous training is retained and the more weight is given to the return from selecting the action Y_k(l) in the current k-th P value adjustment, i.e. the transfer gain caused by selecting the action Y_k(l) and the Q value of the apparatus access state S_{k+1} reached account for a larger share of the updated Q value. Conversely, the smaller α is, the more weight is given to the experience learned in previous P value adjustments, i.e. the Q value before the update accounts for a larger share of the updated Q value. R_s(S_k, Y_k(l)) is the transfer gain corresponding to using the action Y_k(l) in the apparatus access state S_k. γ is the discount factor, with 0 < γ < 1; the larger γ is, the more weight is given to experience. S_{k+1} is the new apparatus access state that the base station reaches after the k-th P value adjustment, Y(l) is the action with the largest Q value among the optional actions of the apparatus access state S_{k+1} in the reinforcement learning Q-value matrix, and max_{Y(l)} Q(S_{k+1}, Y(l)) is the maximum Q value corresponding to the apparatus access state S_{k+1}, i.e. the Q value of the action Y(l) in the apparatus access state S_{k+1}.
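The update described by these definitions can be sketched as a weighted blend of the old Q value and the newly observed return. The blend below follows the roles the text assigns to α and γ; the default values of `alpha` and `gamma`, the nested-list matrix layout, and the function name are assumptions.

```python
def update_q(q, state, action, next_state, gain, alpha=0.5, gamma=0.8):
    """Update q[state][action] in place and return the new value.

    The old Q value keeps a share of (1 - alpha); the transfer gain plus the
    discounted largest Q value of the next state gets a share of alpha.
    ASSUMPTION: alpha = 0.5 and gamma = 0.8 are placeholder settings.
    """
    best_next = max(q[next_state])
    q[state][action] = (1 - alpha) * q[state][action] + alpha * (gain + gamma * best_next)
    return q[state][action]


# Rows are access states, columns are actions (hypothetical 2 x 2 example).
q = [[1.0, 0.5],
     [2.0, 0.0]]
print(update_q(q, state=1, action=0, next_state=0, gain=1.0))
```

With a larger `alpha` the printed value moves closer to the new return and further from the old entry, matching the description of the learning factor.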
In the embodiment of the present invention, those skilled in the art may set the specific values of the learning factor and the discount factor in the formula for calculating the Q value according to actual needs, and the present invention places no specific limitation on this.
Combining Fig. 8 with formula two, it can be seen that, because the apparatus access state S_k is a congested state, if the selected action Y_k(l) is an action that increases the P value, the base station allows more devices to apply for access. In this case the apparatus access state of the base station deviates further from the non-congested state, i.e. the congestion level of the new apparatus access state S_{k+1} after the P value adjustment becomes higher, and the resulting transfer gain is negative, which is equivalent to a punishment. Correspondingly, if the selected action Y_k(l) is an action that decreases the P value, the base station reduces the number of devices allowed to apply for access. In this case the deviation of the apparatus access state of the base station from the non-congested state is reduced, i.e. the congestion level of the new apparatus access state S_{k+1} after the P value adjustment becomes lower, and the resulting transfer gain is positive, which is equivalent to a reward.
In step S704, if the new access state S_{k+1} that the base station reaches after the P value is adjusted using the selected action Y_k(l) in the k-th P value adjustment is still a congested state, the (k+1)-th P value adjustment is performed. Otherwise, the apparatus access state of the base station has been adjusted from the apparatus access state S_1 at the start of the P value adjustment process to the non-congested state; the cycle is exited and the P value adjustment process ends.
It should be noted that, after the reinforcement learning Q-value matrix is updated in step S703 and before step S704 starts the next cycle, the method further includes a step of judging whether the updated reinforcement learning Q-value matrix has converged with respect to the apparatus access state S_1.
As described above, if the reinforcement learning Q-value matrix has converged with respect to the apparatus access state S_1, the optimal policy for adjusting the apparatus access state S_1 to the non-congested state has been found and the Q values in the reinforcement learning Q-value matrix have become stable. Therefore, in any P value adjustment after the k-th P value adjustment, the action with the largest Q value among the optional actions of the apparatus access state of the base station before that P value adjustment is selected as the action for adjusting the P value.
Specifically, the equipment access device judges whether the updated reinforcement learning Q-value matrix has converged with respect to the apparatus access state S_1 in the following way:
First, among the first k P value adjustments, determine the three P value adjustments that are nearest in time to the k-th P value adjustment and before which the apparatus access state of the base station was identical to the apparatus access state S_1;
Then, judge whether the actions used in these three P value adjustments satisfy the following preset convergence condition. If the convergence condition is satisfied, the selected action changes very little each time a new apparatus access state is transferred to (the P value adjustment amounts of the actions selected in each cycle are close to one another). The Q values in the reinforcement learning Q-value matrix can then be considered essentially unchanged, and the updated reinforcement learning Q-value matrix has converged with respect to the apparatus access state S_1.
The preset convergence condition satisfies the following formula:
Wherein, Y_n(l) is the action used in the P value adjustment nearest in time to the k-th P value adjustment among the three P value adjustments, Y_{n-1}(l) is the action used in the P value adjustment second nearest in time to the k-th P value adjustment, Y_{n-2}(l) is the action used in the P value adjustment third nearest in time to the k-th P value adjustment, and ε is a preset comparison threshold with ε > 0.
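A convergence check over the three most recent actions taken from state S_1 might look like the sketch below. The convergence formula is not reproduced in this text, so the test used here, that consecutive adjustment amounts differ by less than ε, is only an assumption consistent with the statement that the selected actions are close to one another.

```python
def has_converged(recent_amounts, epsilon):
    """Check the adjustment amounts of the last three actions taken from S_1.

    ASSUMPTION: convergence means |Y_n - Y_{n-1}| < epsilon and
    |Y_{n-1} - Y_{n-2}| < epsilon; the source formula is not available.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if len(recent_amounts) < 3:
        return False  # fewer than three adjustments from S_1: cannot judge yet
    y_n2, y_n1, y_n = recent_amounts[-3:]
    return abs(y_n - y_n1) < epsilon and abs(y_n1 - y_n2) < epsilon


print(has_converged([-0.02, -0.02, -0.02], epsilon=0.005))
print(has_converged([-0.02, 0.01, -0.02], epsilon=0.005))
```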
It can be seen that judging whether the reinforcement learning Q-value matrix has converged with respect to the apparatus access state S_1 requires at least 3 P value adjustments before which the apparatus access state was the apparatus access state S_1. Therefore, within the first 3 cycles of the P value adjustment process, the reinforcement learning Q-value matrix has necessarily not converged with respect to the apparatus access state S_1.
Moreover, in every P value adjustment before the reinforcement learning Q-value matrix is determined to have converged with respect to the apparatus access state S_1, whether the convergence condition is satisfied is judged after the action is selected or after the reinforcement learning Q-value matrix is updated. Therefore, if the reinforcement learning Q-value matrix is determined to satisfy the convergence condition for the first time in the k-th P value adjustment, the apparatus access state before the k-th P value adjustment must be the apparatus access state S_1. That is, among the first k P value adjustments, the three P value adjustments nearest to the k-th P value adjustment and before which the apparatus access state of the base station was identical to the apparatus access state S_1 may include the k-th P value adjustment itself.
Of course, the reinforcement learning Q-value matrix may still not have converged after many P value adjustments. The equipment access device is therefore also provided with a maximum number of P value adjustments K: if the convergence condition is still not satisfied after the k-th P value adjustment but k is greater than or equal to K, the reinforcement learning Q-value matrix is considered to have converged with respect to the apparatus access state S_1 and the matrix training process ends. In every subsequent P value adjustment, the action with the largest Q value in the reinforcement learning Q-value matrix is selected directly to adjust the P value, and actions are no longer selected at random with a certain probability.
Based on the same inventive concept, the embodiment of the present invention also provides an equipment access device. Fig. 9 is a structural schematic diagram of an equipment access device provided in the embodiment of the present invention. As shown in Fig. 9, the equipment access device 900 includes:
Acquisition module 901, configured to obtain the apparatus access state of a base station;
Processing module 902, configured to, if the apparatus access state is determined to be a congested state, execute the adjustment process of the access restriction parameter P value, and to send the P value obtained after each P value adjustment to each device under the base station through transceiver module 903, until the apparatus access state of the base station is the non-congested state; wherein the action used in each P value adjustment is determined according to the Q values in the reinforcement learning Q-value matrix.
Optionally, the acquisition module 901 is specifically configured to:
determine the apparatus access state of the base station according to the number of devices currently applying for access to the base station and the optimal number of access devices.
Optionally, the processing module 902 is specifically configured to:
determine, according to the apparatus access state S_k of the base station before the k-th P value adjustment and the Q values in the reinforcement learning Q-value matrix, the action Y_k(l) used in the k-th P value adjustment, where k is a positive integer;
adjust the P value using the action Y_k(l), and send the adjusted P value to each device through the transceiver module 903;
obtain, through the acquisition module 901, the apparatus access state S_{k+1} of the base station after the k-th P value adjustment, and, according to the apparatus access state S_k and the apparatus access state S_{k+1}, update the Q value in the reinforcement learning Q-value matrix corresponding to the action Y_k(l) used in the apparatus access state S_k.
Optionally, the processing module 902 is specifically further configured to:
determine the probability of selecting, from the preset optional actions, the action with the largest Q value for the apparatus access state S_k, wherein the probability is positively correlated with k;
determine the action Y_k(l) according to the probability and the preset optional actions.
Optionally, the processing module 902 is specifically further configured to:
determine the transfer gain corresponding to using the action Y_k(l) in the apparatus access state S_k;
obtain the updated Q value corresponding to using the action Y_k(l) in the apparatus access state S_k according to the transfer gain, the maximum Q value corresponding to the apparatus access state S_{k+1} in the reinforcement learning Q-value matrix, and the Q value corresponding to using the action Y_k(l) in the apparatus access state S_k.
Optionally, the processing module 902 is specifically further configured to obtain, by the following formula, the updated Q value corresponding to using the action Y_k(l) in the apparatus access state S_k:
$$Q(S_k, Y_k(l)) = (1-\alpha)\,Q(S_k, Y_k(l)) + \alpha\left[R_s(S_k, Y_k(l)) + \gamma \max_{Y(l)} Q(S_{k+1}, Y(l))\right]$$
Wherein, Q(S_k, Y_k(l)) is the Q value corresponding to using the action Y_k(l) in the apparatus access state S_k, α is the learning factor with 0 < α < 1, R_s(S_k, Y_k(l)) is the transfer gain corresponding to using the action Y_k(l) in the apparatus access state S_k, γ is the discount factor with 0 < γ < 1, Y(l) is the action with the largest Q value for the apparatus access state S_{k+1} among the preset optional actions, and max_{Y(l)} Q(S_{k+1}, Y(l)) is the maximum Q value corresponding to the apparatus access state S_{k+1}.
Optionally, the processing module 902 is further configured to:
if the reinforcement learning Q-value matrix is determined to have converged with respect to the apparatus access state S_1, select, in any P value adjustment after the k-th P value adjustment, the action among the preset optional actions with the largest Q value for the apparatus access state before that P value adjustment, wherein the apparatus access state S_1 is the apparatus access state of the base station before the 1st P value adjustment.
Optionally, the processing module 902 is specifically further configured to:
determine, among the first k P value adjustments, the three P value adjustments that are nearest to the k-th P value adjustment and before which the apparatus access state of the base station was identical to the apparatus access state S_1;
if the actions used in the three P value adjustments are determined to satisfy the preset convergence condition, determine that the reinforcement learning Q-value matrix has converged with respect to the apparatus access state S_1.
Optionally, the convergence condition specifically includes:
[formula not reproduced in this text]
where Y_n(l) is the action used in the P-value adjustment, among the three P-value adjustments, that is closest in time to the k-th P-value adjustment; Y_{n-1}(l) is the action used in the P-value adjustment second closest in time to the k-th P-value adjustment; Y_{n-2}(l) is the action used in the P-value adjustment third closest in time to the k-th P-value adjustment; and ε is a preset comparison threshold, with ε > 0.
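The convergence test can be illustrated as below. The exact condition is not present in this text, so the sketch assumes one plausible form: actions are numeric P-value steps, and the matrix is treated as converged for S_1 when the three most recent actions taken from S_1 differ from one another by less than ε. The function name, the history layout, and this form of the condition are all assumptions.

```python
def has_converged(history, s1, epsilon=1e-3):
    """Check an assumed convergence condition for state s1.

    history : list of (state_before_adjustment, action_value) pairs, in the
              order the P-value adjustments were made (adjustments 1..k)
    s1      : the device access state before the 1st P-value adjustment
    epsilon : preset comparison threshold, epsilon > 0
    Assumed condition (not from the source):
        |Y_n - Y_{n-1}| < epsilon and |Y_{n-1} - Y_{n-2}| < epsilon
    """
    recent = [action for state, action in history if state == s1][-3:]
    if len(recent) < 3:
        return False                      # fewer than three adjustments from s1
    y_n2, y_n1, y_n = recent
    return abs(y_n - y_n1) < epsilon and abs(y_n1 - y_n2) < epsilon
```

Once such a check succeeds, later adjustments can skip exploration and always use the max-Q action, as described above.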
Based on the same inventive concept, an embodiment of the present invention further provides another access control device. The access control device may specifically be a desktop computer, a portable computer, a smartphone, a tablet computer, a personal digital assistant (PDA), or the like. As shown in Figure 10, the access control device may include a central processing unit (CPU) 1001, a memory 1002, an input/output device 1003, a bus system 1004, and the like. The input device may include a keyboard, a mouse, a touch screen, and the like; the output device may include a display device such as a liquid crystal display (LCD) or a cathode ray tube (CRT).
The memory may include read-only memory (ROM) and random access memory (RAM), and provides the processor with the program instructions and data stored in the memory. In the embodiments of the present invention, the memory may be used to store the program of the above device access method.
By calling the program instructions stored in the memory, the processor executes the above device access method according to the obtained program instructions.
Based on the same inventive concept, an embodiment of the present invention provides a computer storage medium for storing the computer program instructions used by the above access control device, including a program for executing the above device access method.
The computer storage medium may be any available medium or data storage device that a computer can access, including but not limited to magnetic storage (for example floppy disk, hard disk, magnetic tape, magneto-optical disk (MO)), optical storage (for example CD, DVD, BD, HVD), and semiconductor memory (for example ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), solid-state drive (SSD)).
As can be seen from the above:
The device access method provided by the embodiments of the present invention includes obtaining the device access state of a base station and, if the device access state is determined to be a congestion state, executing an adjustment process for the access restriction parameter P value, in which the P value obtained after each P-value adjustment is sent to each device under the base station, until the device access state of the base station becomes a non-congestion state. In this way, when the base station is in a congested access state, dynamically adjusting the P value through the P-value adjustment process can effectively reduce the congestion of the base station. Moreover, because the reinforcement-learning Q-value matrix contains the experience of previous P-value adjustments, determining the action used in each P-value adjustment according to the Q values in the reinforcement-learning Q-value matrix can effectively speed up the convergence of the P value, so that the base station quickly reaches the optimal device access state.
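The overall adjustment process summarized above can be sketched as a small simulation. Everything concrete here is an assumption made for illustration: the action set (decreasing P steps only), the number of access states and how they are binned from the request count, the transfer gain (the drop in congestion level), the device model (each device requests access with probability P), and all names and constants.

```python
import random

ACTIONS = [-0.2, -0.1, -0.05]   # hypothetical P-value steps (decrease only)
LEVELS = 4                      # hypothetical state count; state 0 = not congested
ALPHA, GAMMA = 0.5, 0.9         # learning factor and discount factor, both in (0, 1)
P_MIN = 0.01                    # hypothetical lower bound on the P value

def access_state(requests, optimal):
    """Bin the request count against the optimal count (assumed binning)."""
    level = (requests + optimal - 1) // optimal - 1
    return max(0, min(LEVELS - 1, level))

def count_requests(n_devices, p, rng):
    """Toy device model: each device requests access with probability p."""
    return sum(rng.random() < p for _ in range(n_devices))

def adjust_until_uncongested(n_devices=200, optimal=20, p=1.0,
                             max_steps=200, seed=0):
    """Run P-value adjustments until the base station is no longer congested."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(LEVELS)]
    state = access_state(count_requests(n_devices, p, rng), optimal)
    k = 0
    while state != 0 and k < max_steps:
        k += 1
        row = q[state]
        # Greedy choice with a probability that grows with k (assumed form).
        if rng.random() < k / (k + 10):
            a = row.index(max(row))
        else:
            a = rng.randrange(len(ACTIONS))
        # Adjust P and "send" it to the devices, which then request access.
        p = max(P_MIN, p + ACTIONS[a])
        nxt = access_state(count_requests(n_devices, p, rng), optimal)
        gain = state - nxt                # assumed transfer gain
        q[state][a] = (1 - ALPHA) * q[state][a] + ALPHA * (gain + GAMMA * max(q[nxt]))
        state = nxt
    return p, k, state
```

Calling `adjust_until_uncongested()` returns the final P value, the number of adjustments made, and the final state. A real deployment would replace `count_requests` with measurements from the base station.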
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is executed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although optional embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the optional embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include them.
Claims (12)
1. A device access method, characterized in that the method comprises:
obtaining the device access state of a base station;
if it is determined that the device access state is a congestion state, executing an adjustment process for the access restriction parameter P value, and sending the P value obtained after each P-value adjustment to each device under the base station, until the device access state of the base station is a non-congestion state;
wherein the action used in each P-value adjustment is determined according to the Q values in a reinforcement-learning Q-value matrix.
2. The method according to claim 1, characterized in that obtaining the device access state of the base station comprises:
determining the device access state of the base station according to the number of devices currently requesting access to the base station and the optimal number of access devices.
3. The method according to claim 1, characterized in that executing the P-value adjustment process and sending the P value obtained after each P-value adjustment to each device under the base station comprises:
determining the action Y_k(l) used in the k-th P-value adjustment according to the device access state S_k of the base station before the k-th P-value adjustment and the Q values in the reinforcement-learning Q-value matrix, where k is a positive integer;
adjusting the P value using the action Y_k(l), and sending the adjusted P value to each device;
obtaining the device access state S_{k+1} of the base station after the k-th P-value adjustment, and updating, according to the device access state S_k and the device access state S_{k+1}, the Q value in the reinforcement-learning Q-value matrix corresponding to taking the action Y_k(l) in the device access state S_k.
4. The method according to claim 3, characterized in that determining the action Y_k(l) used in the k-th P-value adjustment according to the device access state S_k of the base station before the k-th P-value adjustment and the Q values in the reinforcement-learning Q-value matrix comprises:
determining the probability of selecting, from the preset optional actions, the action with the largest Q value for the device access state S_k, wherein the probability is positively correlated with k;
determining the action Y_k(l) according to the probability and the preset optional actions.
5. The method according to claim 3, characterized in that updating, according to the device access state S_k and the device access state S_{k+1}, the Q value in the reinforcement-learning Q-value matrix corresponding to taking the action Y_k(l) in the device access state S_k comprises:
determining the transfer gain corresponding to taking the action Y_k(l) in the device access state S_k;
obtaining the updated Q value corresponding to taking the action Y_k(l) in the device access state S_k, according to the transfer gain, the maximum Q value corresponding to the device access state S_{k+1} in the reinforcement-learning Q-value matrix, and the current Q value corresponding to taking the action Y_k(l) in the device access state S_k.
6. The method according to claim 5, characterized in that obtaining the updated Q value corresponding to taking the action Y_k(l) in the device access state S_k, according to the transfer gain, the maximum Q value corresponding to the device access state S_{k+1} in the reinforcement-learning Q-value matrix, and the current Q value corresponding to taking the action Y_k(l) in the device access state S_k, satisfies the following formula:
[formula not reproduced in this text]
where Q(S_k, Y_k(l)) is the Q value corresponding to taking the action Y_k(l) in the device access state S_k; α is the learning factor, with 0 < α < 1; R_s(S_k, Y_k(l)) is the transfer gain corresponding to taking the action Y_k(l) in the device access state S_k; γ is the discount factor, with 0 < γ < 1; Y(l) is the action, among the preset optional actions, with the largest Q value for the device access state S_{k+1}; and the maximization term is the maximum Q value corresponding to the device access state S_{k+1}.
7. The method according to claim 3, characterized in that, after updating the Q value in the reinforcement-learning Q-value matrix corresponding to taking the action Y_k(l) in the device access state S_k, the method further comprises:
if it is determined that the reinforcement-learning Q-value matrix has converged with respect to the device access state S_1, then, in any P-value adjustment after the k-th P-value adjustment, selecting from the preset optional actions the action with the largest Q value for the device access state before that adjustment, where the device access state S_1 is the device access state of the base station before the 1st P-value adjustment.
8. The method according to claim 7, characterized in that determining that the reinforcement-learning Q-value matrix has converged with respect to the device access state S_1 comprises:
determining, among the first k P-value adjustments, the three P-value adjustments that are closest to the k-th P-value adjustment and for which the device access state of the base station before the adjustment is the same as the device access state S_1;
if it is determined that the actions used in the three P-value adjustments satisfy the preset convergence condition, determining that the reinforcement-learning Q-value matrix has converged with respect to the device access state S_1.
9. The method according to claim 8, characterized in that the convergence condition specifically includes:
[formula not reproduced in this text]
where Y_n(l) is the action used in the P-value adjustment, among the three P-value adjustments, that is closest in time to the k-th P-value adjustment; Y_{n-1}(l) is the action used in the P-value adjustment second closest in time to the k-th P-value adjustment; Y_{n-2}(l) is the action used in the P-value adjustment third closest in time to the k-th P-value adjustment; and ε is a preset comparison threshold, with ε > 0.
10. A device access apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to obtain the device access state of a base station;
a processing module, configured to, if it is determined that the device access state is a congestion state, execute an adjustment process for the access restriction parameter P value, and send the P value obtained after each P-value adjustment to each device under the base station through a transceiver module, until the device access state of the base station is a non-congestion state; wherein the action used in each P-value adjustment is determined according to the Q values in a reinforcement-learning Q-value matrix.
11. An access control device, characterized by comprising:
a memory, configured to store program instructions;
a processor, configured to call the program instructions stored in the memory and execute, according to the obtained program, the method according to any one of claims 1 to 9.
12. A computer storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions, the computer-executable instructions being used to cause the computer to execute the method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810053320.1A CN108347744B (en) | 2018-01-19 | 2018-01-19 | Equipment access method, device and access control equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810053320.1A CN108347744B (en) | 2018-01-19 | 2018-01-19 | Equipment access method, device and access control equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108347744A (en) | 2018-07-31 |
CN108347744B (en) | 2020-08-28 |
Family ID=62961086
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810053320.1A Active CN108347744B (en) | 2018-01-19 | 2018-01-19 | Equipment access method, device and access control equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108347744B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108966330A (en) * | 2018-09-21 | 2018-12-07 | 西北大学 | A kind of mobile terminal music player dynamic regulation energy consumption optimization method based on Q-learning |
CN113810883A (en) * | 2021-07-01 | 2021-12-17 | 中铁二院工程集团有限责任公司 | Internet of things large-scale random access control method |
CN114503622A (en) * | 2019-09-27 | 2022-05-13 | 瑞典爱立信有限公司 | Method and apparatus for access or RAT restriction |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101466111A (en) * | 2009-01-13 | 2009-06-24 | 中国人民解放军理工大学通信工程学院 | Dynamic spectrum access method based on policy planning constrain Q study |
CN102256262A (en) * | 2011-07-14 | 2011-11-23 | 南京邮电大学 | Multi-user dynamic spectrum accessing method based on distributed independent learning |
CN103220751A (en) * | 2013-05-08 | 2013-07-24 | 哈尔滨工业大学 | Heterogeneous network access control method based on Q learning resource allocation strategy |
CN107105455A (en) * | 2017-04-26 | 2017-08-29 | 重庆邮电大学 | It is a kind of that load-balancing method is accessed based on the user perceived from backhaul |
Non-Patent Citations (2)
Title |
---|
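Liu Huiru et al.: "Access control algorithm for CDMA/WLAN heterogeneous networks based on Q-learning", Communications Technology (《通信技术》) * |
Zhao Biao et al.: "Application of the Q-learning algorithm to channel selection in opportunistic spectrum access", Signal Processing (《信号处理》) * |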
Also Published As
Publication number | Publication date |
---|---|
CN108347744B (en) | 2020-08-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111556461B (en) | Vehicle-mounted edge network task distribution and unloading method based on deep Q network | |
CN108347744A (en) | A kind of equipment cut-in method, device and access control equipment | |
CN101313494B (en) | Radio network design apparatus and method | |
CN107766135A (en) | Method for allocating tasks based on population and simulated annealing optimization in mobile cloudlet | |
CN107911478A (en) | Multi-user based on chemical reaction optimization algorithm calculates discharging method and device | |
CN104619029B (en) | It is a kind of centralization cellular network architecture under baseband pool resource allocation methods and device | |
CN106227599B (en) | The method and system of scheduling of resource in a kind of cloud computing system | |
JP2013026980A (en) | Parameter setting device, computer program, and parameter setting method | |
CN112533237B (en) | Network capacity optimization method for supporting large-scale equipment communication in industrial internet | |
CN115718956B (en) | Antenna layout method, device, medium and system | |
CN103703830B (en) | A kind of physical resource adjustment, device and controller | |
KR20230007941A (en) | Edge computational task offloading scheme using reinforcement learning for IIoT scenario | |
CN113343437B (en) | Electric automobile quick charge guiding method, system, terminal and medium | |
CN113950134A (en) | Dormancy prediction method, device, equipment and computer readable storage medium for base station | |
Rodoshi et al. | Deep reinforcement learning based dynamic resource allocation in cloud radio access networks | |
WO2023226183A1 (en) | Multi-base-station queuing type preamble allocation method based on multi-agent collaboration | |
CN103501509A (en) | Method and device for balancing loads of radio network controller | |
CN109982246B (en) | Method, device and medium for adjusting power of cellular cell | |
Shao et al. | A Load Balancing Vertical Handoff Algorithm Considering QoS of Users for Heterogeneous Networks in Power Communication | |
CN111290853B (en) | Cloud data center scheduling method based on self-adaptive improved genetic algorithm | |
CN104967638B (en) | The distribution method of a kind of back end and system | |
CN103906197A (en) | Decision-making method for multi-radio access selection of cognitive radio network | |
CN112584386A (en) | 5G C-RAN resource prediction and allocation method and system | |
CN109038609B (en) | Reactive power optimization method and system for power system | |
CN113296893A (en) | Cloud platform low-resource-loss virtual machine placement method based on hybrid sine and cosine particle swarm optimization algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20210901. Address after: No. 422 South Siming Road, Siming District, Xiamen, Fujian Province, 361000. Patentee after: XIAMEN University; Jingxin Network System Co.,Ltd. Address before: No. 422 South Siming Road, Siming District, Xiamen, Fujian Province, 361000. Patentee before: XIAMEN University; COMBA TELECOM SYSTEMS (GUANGZHOU) Ltd. |