CN108347744A - Device access method, apparatus, and access control device - Google Patents

Publication number
CN108347744A
Authority
CN
China
Prior art keywords
values
access state
action
adjustment
base station
Legal status
Granted
Application number
CN201810053320.1A
Other languages
Chinese (zh)
Other versions
CN108347744B (en)
Inventor
赵毅峰
刘凯
杨华裕
黄联芬
廖礼宇
李馨
张远见
胡应添
Current Assignee
Xiamen University
Comba Network Systems Co Ltd
Original Assignee
Xiamen University
Comba Telecom Systems Guangzhou Co Ltd
Application filed by Xiamen University and Comba Telecom Systems Guangzhou Co Ltd
Priority to CN201810053320.1A
Publication of CN108347744A
Application granted
Publication of CN108347744B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/02 Traffic management, e.g. flow control or congestion control
    • H04W28/0289 Congestion control
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W48/00 Access restriction; Network selection; Access point selection
    • H04W48/02 Access restriction performed under specific conditions
    • H04W48/06 Access restriction performed under specific conditions based on traffic conditions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W74/00 Wireless channel access
    • H04W74/08 Non-scheduled access, e.g. ALOHA
    • H04W74/0833 Random access procedures, e.g. with 4-step access
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/70 Services for machine-to-machine communication [M2M] or machine type communication [MTC]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

An embodiment of the invention discloses a device access method, an apparatus, and an access control device. The method includes: obtaining the device access state of a base station; if the device access state is determined to be a congested state, executing an adjustment process for the access restriction parameter P value, and sending the P value obtained after each adjustment to every device under the base station, until the device access state of the base station becomes a non-congested state. Thus, when the base station is in a congested device access state, dynamically adjusting the P value through this process can effectively relieve congestion at the base station. Because the reinforcement learning Q-value matrix contains the experience of previous P value adjustments, determining the action used in each adjustment from the Q values in that matrix can effectively speed up the convergence of the P value, so that the base station quickly reaches its best access state.

Description

Device access method, apparatus, and access control device
Technical field
The present invention relates to the field of wireless communication technology, and in particular to a device access method, an apparatus, and an access control device.
Background
As modern communication technology evolves toward the Internet of Things, the number of MTC (machine type communication) devices has grown sharply and now exceeds the number of H2H (Human-to-Human communication) devices. Communication among these MTC devices underpins the development of smart cities and smart grids, but it also places a heavy load on existing wireless communication networks. Access by MTC devices increases the delay of existing H2H devices and of M2M (Machine-to-Machine communication) devices with stricter delay requirements. Especially when the access volume is very large, the base station control unit may, in severe cases, be unable to process the mass of data in a timely and effective manner, so that the base station cannot work normally for a long time.
To address random access congestion of MTC devices, the 3GPP (3rd Generation Partnership Project) organization proposed the ACB (Access Class Barring) algorithm in TS 22.011 V11.0.0 (section 4.3.4). ACB defines 16 access classes, some of which are reserved for high-priority applications. When the network load is heavy, the base station can broadcast ACB parameters to the devices in the cell as part of the system information; these parameters include the access probability and the backoff time of the different access classes. The principle of ACB is that the base station sets an access class barring parameter P (0≤P≤1) according to the current network load. Before a random access attempt, each device generates a random number between 0 and 1. If the random number is less than P, the device performs random access; if it is greater than P, the device generates a new random number at the next opportunity and tries again. In this way, when a large number of devices attempt access at once, the network load is relieved and the access success rate and throughput are improved. However, the existing access class barring algorithm processes data in a relatively complex way during classification feedback and cannot guarantee that the access network is in an optimal state when a large number of devices pour in. Moreover, because the network has little prior knowledge when adjusting the access class barring parameter P, the adjustment of P does not tend toward the optimum and converges slowly, so the overall access state of the base station is poor and the delay requirements of the devices cannot be met.
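The device-side ACB check described above can be sketched in Python as follows. This is a minimal illustration, not code from the patent or from 3GPP: the function names `acb_allows_access` and `count_admitted` are hypothetical, and the backoff timer that ACB also broadcasts is left out.

```python
import random


def acb_allows_access(p, rng=random):
    """Return True if a device may start random access in this slot.

    p is the broadcast access class barring parameter, 0 <= p <= 1.
    The device draws a number in [0, 1) and proceeds only if it is below p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("P must lie in [0, 1]")
    return rng.random() < p


def count_admitted(num_devices, p, seed=0):
    """Count how many of num_devices pass the ACB check in one slot."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return sum(acb_allows_access(p, rng) for _ in range(num_devices))
```

Lowering P thins out the contending devices in proportion: with P = 0.3, roughly 30% of the devices attempt access in a given slot and the rest wait.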
In summary, a device access method is urgently needed to solve the technical problem that existing random access technology cannot dynamically adjust the P value, which leaves the base station in a poor access state and prone to congestion.
Summary of the invention
The present invention provides a device access method, an apparatus, and an access control device, to solve the technical problem that existing random access technology cannot dynamically adjust the P value, which leaves the base station in a poor access state and prone to congestion.
A device access method provided by an embodiment of the present invention includes:
obtaining the device access state of a base station;
if the device access state is determined to be a congested state, executing an adjustment process for the access restriction parameter P value, and sending the P value obtained after each P value adjustment to every device under the base station, until the device access state of the base station is a non-congested state;
wherein the action used in each P value adjustment is determined according to the Q values in a reinforcement learning Q-value matrix.
Optionally, obtaining the device access state of the base station includes:
determining the device access state of the base station according to the number of devices currently requesting access at the base station and the optimal number of access devices.
Optionally, executing the P value adjustment process and sending the P value obtained after each P value adjustment to every device under the base station includes:
determining, according to the device access state S_k of the base station before the k-th P value adjustment and the Q values in the reinforcement learning Q-value matrix, the action Y_k(l) used in the k-th P value adjustment, where k is a positive integer;
adjusting the P value using the action Y_k(l), and sending the adjusted P value to every device;
obtaining the device access state S_{k+1} of the base station after the k-th P value adjustment, and updating, according to the device access state S_k and the device access state S_{k+1}, the Q value in the reinforcement learning Q-value matrix that corresponds to using the action Y_k(l) in the device access state S_k.
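The adjust-then-update steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the function names, the list-of-rows layout of the Q-value matrix, the reading of an action as an additive change to P, and the standard Q-learning blend in `q_update` are all assumptions.

```python
def apply_action(p, delta):
    """Apply an action, read here as an additive change, and keep P in [0, 1]."""
    return min(1.0, max(0.0, p + delta))


def q_update(q, state, action, gain, next_state, alpha=0.5, gamma=0.9):
    """Update q[state][action] after observing next_state.

    q is a list of rows, one per device access state; each row holds one
    Q value per optional action. gain is the transition gain for the step.
    alpha and gamma are illustrative values in the open interval (0, 1).
    """
    best_next = max(q[next_state])  # largest Q value reachable from next_state
    q[state][action] = (1 - alpha) * q[state][action] + alpha * (gain + gamma * best_next)
    return q[state][action]
```

Keeping the update separate from the P adjustment mirrors the text: the new state is only known after the adjusted P value has been sent out and the base station has been observed again.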
Optionally, determining the action Y_k(l) used in the k-th P value adjustment according to the device access state S_k of the base station before the k-th P value adjustment and the Q values in the reinforcement learning Q-value matrix includes:
determining the probability of selecting, from the preset optional actions, the action with the largest Q value for the device access state S_k, where the probability is positively correlated with k;
determining the action Y_k(l) according to the probability and the preset optional actions.
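The selection rule above resembles an epsilon-greedy policy whose exploration shrinks as k grows. A minimal Python sketch follows. The schedule k / (k + 1) and the uniform random choice in the exploration branch are assumptions: the text only requires that the probability of picking the largest-Q action increase with k.

```python
import random


def greedy_probability(k):
    """Hypothetical schedule: increases with k and approaches 1."""
    return k / (k + 1.0)


def select_action(q_row, k, rng=random):
    """Pick an action index for the k-th adjustment.

    q_row holds the Q values of the current device access state, one per
    optional action. With probability greedy_probability(k) the action
    with the largest Q value is chosen; otherwise an action is drawn
    uniformly at random (exploration).
    """
    best = max(range(len(q_row)), key=lambda a: q_row[a])
    if rng.random() < greedy_probability(k):
        return best
    return rng.randrange(len(q_row))
```

Early adjustments therefore explore more, and later ones lean on the experience already stored in the Q-value matrix.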
Optionally, updating, according to the device access state S_k and the device access state S_{k+1}, the Q value in the reinforcement learning Q-value matrix that corresponds to using the action Y_k(l) in the device access state S_k includes:
determining the transition gain corresponding to using the action Y_k(l) in the device access state S_k;
obtaining the updated Q value for using the action Y_k(l) in the device access state S_k according to the transition gain, the maximum Q value corresponding to the device access state S_{k+1} in the reinforcement learning Q-value matrix, and the current Q value for using the action Y_k(l) in the device access state S_k.
Optionally, the updated Q value for using the action Y_k(l) in the device access state S_k, obtained according to the transition gain, the maximum Q value corresponding to the device access state S_{k+1} in the reinforcement learning Q-value matrix, and the current Q value for using the action Y_k(l) in the device access state S_k, satisfies the following formula:
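The formula itself did not survive in this text. Judging from the symbol definitions that follow, it is presumably the standard Q-learning update. The arrangement of terms below is an assumption, not taken from the source:

```latex
Q(S_k, Y_k(l)) \leftarrow (1-\alpha)\, Q(S_k, Y_k(l))
  + \alpha \left[ R_s(S_k, Y_k(l)) + \gamma \max_{Y(l)} Q(S_{k+1}, Y(l)) \right]
```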
where Q(S_k, Y_k(l)) is the Q value for using the action Y_k(l) in the device access state S_k; α is the learning factor, with 0<α<1; R_s(S_k, Y_k(l)) is the transition gain for using the action Y_k(l) in the device access state S_k; γ is the discount factor, with 0<γ<1; Y(l) is the action, among the preset optional actions, with the largest Q value for the device access state S_{k+1}; and max Q(S_{k+1}, Y(l)) is the maximum Q value corresponding to the device access state S_{k+1}.
Optionally, after updating the Q value in the reinforcement learning Q-value matrix that corresponds to using the action Y_k(l) in the device access state S_k, the method further includes:
if the reinforcement learning Q-value matrix is determined to have converged with respect to the device access state S_1, then in any P value adjustment after the k-th P value adjustment, selecting, from the preset optional actions, the action with the largest Q value for the device access state before that adjustment, where the device access state S_1 is the device access state of the base station before the 1st P value adjustment.
Optionally, among the first k P value adjustments, the three P value adjustments are determined that are closest to the k-th P value adjustment and before which the device access state of the base station was the same as the device access state S_1;
if the actions used in these three P value adjustments satisfy a preset convergence condition, the reinforcement learning Q-value matrix is determined to have converged with respect to the device access state S_1.
Optionally, the convergence condition specifically includes:
where Y_n(l) is the action used in the P value adjustment that, among the three P value adjustments, is closest in time to the k-th P value adjustment; Y_{n-1}(l) is the action used in the second-closest P value adjustment; Y_{n-2}(l) is the action used in the third-closest P value adjustment; and ε is a preset comparison threshold, with ε>0.
Based on the same inventive concept, the present invention also provides a device access apparatus, including:
an acquisition module, configured to obtain the device access state of a base station;
a processing module, configured to, if the device access state is determined to be a congested state, execute an adjustment process for the access restriction parameter P value and send, through a transceiver module, the P value obtained after each P value adjustment to every device under the base station, until the device access state of the base station is a non-congested state; wherein the action used in each P value adjustment is determined according to the Q values in a reinforcement learning Q-value matrix.
Optionally, the acquisition module is specifically configured to:
determine the device access state of the base station according to the number of devices currently requesting access at the base station and the optimal number of access devices.
Optionally, the processing module is specifically configured to:
determine, according to the device access state S_k of the base station before the k-th P value adjustment and the Q values in the reinforcement learning Q-value matrix, the action Y_k(l) used in the k-th P value adjustment, where k is a positive integer;
adjust the P value using the action Y_k(l), and send the adjusted P value to every device through the transceiver module;
obtain, through the acquisition module, the device access state S_{k+1} of the base station after the k-th P value adjustment, and update, according to the device access state S_k and the device access state S_{k+1}, the Q value in the reinforcement learning Q-value matrix that corresponds to using the action Y_k(l) in the device access state S_k.
Optionally, the processing module is further specifically configured to:
determine the probability of selecting, from the preset optional actions, the action with the largest Q value for the device access state S_k, where the probability is positively correlated with k;
determine the action Y_k(l) according to the probability and the preset optional actions.
Optionally, the processing module is further specifically configured to:
determine the transition gain corresponding to using the action Y_k(l) in the device access state S_k;
obtain the updated Q value for using the action Y_k(l) in the device access state S_k according to the transition gain, the maximum Q value corresponding to the device access state S_{k+1} in the reinforcement learning Q-value matrix, and the current Q value for using the action Y_k(l) in the device access state S_k.
Optionally, the processing module is further specifically configured to obtain the updated Q value for using the action Y_k(l) in the device access state S_k by the following formula:
where Q(S_k, Y_k(l)) is the Q value for using the action Y_k(l) in the device access state S_k; α is the learning factor, with 0<α<1; R_s(S_k, Y_k(l)) is the transition gain for using the action Y_k(l) in the device access state S_k; γ is the discount factor, with 0<γ<1; Y(l) is the action, among the preset optional actions, with the largest Q value for the device access state S_{k+1}; and max Q(S_{k+1}, Y(l)) is the maximum Q value corresponding to the device access state S_{k+1}.
Optionally, the processing module is further configured to:
if the reinforcement learning Q-value matrix is determined to have converged with respect to the device access state S_1, then in any P value adjustment after the k-th P value adjustment, select, from the preset optional actions, the action with the largest Q value for the device access state before that adjustment, where the device access state S_1 is the device access state of the base station before the 1st P value adjustment.
Optionally, the processing module is further specifically configured to:
determine, among the first k P value adjustments, the three P value adjustments that are closest to the k-th P value adjustment and before which the device access state of the base station was the same as the device access state S_1;
if the actions used in these three P value adjustments satisfy a preset convergence condition, determine that the reinforcement learning Q-value matrix has converged with respect to the device access state S_1.
Optionally, the convergence condition specifically includes:
where Y_n(l) is the action used in the P value adjustment that, among the three P value adjustments, is closest in time to the k-th P value adjustment; Y_{n-1}(l) is the action used in the second-closest P value adjustment; Y_{n-2}(l) is the action used in the third-closest P value adjustment; and ε is a preset comparison threshold, with ε>0.
Another embodiment of the present invention provides an access control device including a memory and a processor, where the memory is configured to store program instructions, and the processor is configured to call the program instructions stored in the memory and execute any of the above methods according to the obtained program.
Another embodiment of the present invention provides a computer storage medium that stores computer-executable instructions, the computer-executable instructions being used to cause a computer to execute any of the above methods.
The device access method provided by the embodiments of the present invention includes obtaining the device access state of a base station and, if the device access state is determined to be a congested state, executing an adjustment process for the access restriction parameter P value and sending the P value obtained after each P value adjustment to every device under the base station, until the device access state of the base station becomes a non-congested state. In this way, when the base station is in a congested access state, dynamically adjusting the P value through this process can effectively relieve congestion at the base station. Because the reinforcement learning Q-value matrix contains the experience of previous P value adjustments, determining the action used in each adjustment from the Q values in that matrix can effectively speed up the convergence of the P value, so that the base station quickly reaches its best device access state.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a system architecture diagram to which the device access method provided by an embodiment of the present invention applies;
Fig. 2 is a flow diagram of a device access method provided by an embodiment of the present invention;
Fig. 3 is a flow diagram of the method for determining the access state of the base station provided in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the resource parameters of the base station provided by an embodiment of the present invention;
Fig. 5 is a graph of the relationship between the number of contending access devices and the access success probability in an embodiment of the present invention;
Fig. 6 is a graph of the relationship between the number of contending access devices and the number of successfully accessed devices in an embodiment of the present invention;
Fig. 7 is a schematic diagram of the state transition gain matrix provided in an embodiment of the present invention;
Fig. 8 is a schematic diagram of one state transition gain matrix provided in an embodiment of the present invention;
Fig. 9 is a structural diagram of a device access apparatus provided by an embodiment of the present invention;
Fig. 10 is a structural diagram of an access control device provided by an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The embodiments of the present invention are described in further detail below with reference to the drawings.
The device access method provided in the embodiments of the present invention is applied to random access of machine type communication devices. Fig. 1 is the system architecture diagram to which the method applies. As shown in Fig. 1, the system includes a device access apparatus 101, a base station 102, and multiple types of machine type communication devices (103 to 110 in the figure).
The base station may be a base station in any of several types of communication system or Internet of Things system, such as 2G, 3G, 4G, LTE-M (Long Term Evolution Machine to Machine, an Internet of Things technology based on Long Term Evolution), or NB-IoT (Narrow Band Internet of Things); the present invention does not specifically limit this.
The device access apparatus may be a smart device set up independently of the base station. It establishes a communication connection with a wireless communication unit of the base station, such as an RRU (Radio Remote Unit), and obtains the resource and access situation of the base station through that connection. Alternatively, the device access apparatus may be embedded in the RRU inside the base station and obtain the resource and access situation of the base station directly; the present invention does not specifically limit this.
The machine type communication devices may be communication devices such as smartphones and tablet computers, or various types of Internet of Things devices such as smart water meters, electricity meters, and parking management modules; the present invention does not specifically limit this. Each of these machine type communication devices has a wireless communication module and communicates with the base station over the wireless network.
Fig. 2 shows the flow diagram of a device access method provided in an embodiment of the present invention. The method is executed by the device access apparatus in the above system architecture. As shown in Fig. 2, the method includes:
Step S201: Obtain the device access state of the base station;
Step S202: If the device access state is determined to be a congested state, execute an adjustment process for the access restriction parameter P value, and send the P value obtained after each P value adjustment to every device under the base station, until the access state of the base station becomes a non-congested state; wherein the action used in each P value adjustment is determined according to the Q values in a reinforcement learning Q-value matrix.
In this way, when the base station is in a congested device access state, dynamically adjusting the P value through this process can effectively relieve congestion at the base station. Because the reinforcement learning Q-value matrix contains the experience of previous P value adjustments, determining the action used in each adjustment from the Q values in that matrix can effectively speed up the convergence of the P value, so that the base station quickly reaches its best access state and overall efficiency is higher.
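Steps S201 and S202 can be sketched as a loop in Python. Everything concrete here is an assumption made for illustration: the action set (decrements only, so the toy loop always terminates), the four congestion levels, the toy state model in `toy_level`, the exploration schedule k / (k + 1), the transition gain (drop in congestion level), and the standard Q-learning update. A real apparatus would read the state from the base station and broadcast each new P value.

```python
import random

ACTIONS = [-0.10, -0.05, -0.02]  # hypothetical P decrements; the source lists no actions
NUM_LEVELS = 4                   # hypothetical: level 0 (not congested) to level 3


def toy_level(n_requesting, p, n_optimal):
    """Toy stand-in for the reported device access state (0 = not congested)."""
    excess = (n_requesting * p - n_optimal) / n_optimal
    if excess <= 0:
        return 0
    return min(NUM_LEVELS - 1, 1 + int(excess))


def adjust_until_clear(n_requesting, n_optimal, p=1.0, alpha=0.5, gamma=0.9, seed=0):
    """Repeat P value adjustments until the toy base station reports level 0."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(NUM_LEVELS)]
    state = toy_level(n_requesting, p, n_optimal)  # step S201
    k = 0
    while state != 0:                              # step S202
        k += 1
        # Choose the largest-Q action with probability k / (k + 1), else explore.
        greedy = max(range(len(ACTIONS)), key=lambda a: q[state][a])
        action = greedy if rng.random() < k / (k + 1) else rng.randrange(len(ACTIONS))
        # Adjust P; a real system would send it to every device at this point.
        p = max(0.0, p + ACTIONS[action])
        next_state = toy_level(n_requesting, p, n_optimal)
        gain = state - next_state                  # toy transition gain
        q[state][action] = (1 - alpha) * q[state][action] + alpha * (
            gain + gamma * max(q[next_state])
        )
        state = next_state
    return p, k
```

For example, `adjust_until_clear(300, 100)` starts from P = 1.0 with three times the optimal number of contending devices and returns the final P value together with the number of adjustments it took to reach level 0.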
In step S201, the device access apparatus can obtain the base station's current access situation and access capability through its communication connection with the wireless communication unit of the base station, and thereby determine the device access state.
Specifically, the device access state of the base station is determined according to the current number of devices requesting access and the optimal number of access devices, where the number of devices requesting access is the actual number of contending access devices in the current system and the optimal number of access devices is the optimal number of contending access devices in the current system. As shown in Fig. 3, determining the device access state may include the following steps S301 to S303:
Step S301: Obtain the current number of devices requesting access and the optimal number of access devices;
The optimal number of access devices, i.e. the optimal number of contending access devices, is the number of contending devices at which the base station actually admits the maximum number of devices; it characterizes the base station's current access capability. The device access apparatus determines it according to the base station's current resource situation. Fig. 4 is a schematic diagram of the resource parameters of the base station provided by an embodiment of the present invention. As shown in Fig. 4, the resource situation of the base station includes any one or more of the following resource parameters:
the number of available preambles, the maximum number of preamble retransmissions, the access request arrival rate of each device, the random access slot allocation period, the backoff parameter, and the random access response window length.
In the embodiments of the present invention, the device access apparatus can determine the current actual number of contending access devices and the base station's current resource situation through the communication connection with the base station. It then obtains the current optimal number of contending access devices by simulating the base station's current resource situation.
Fig. 5 shows, by way of example, the relationship between the actual number of contending access devices and the access success probability in an embodiment of the present invention. As shown in Fig. 5, while the number of devices requesting access is below the maximum number of contending access devices, the access success probability stays at 1. As the actual number of contending access devices increases and reaches or exceeds the maximum number of contending access devices, the access success probability drops rapidly. The rate of change of the access success probability is greatest at the turning point where the actual number of contending access devices equals the maximum number of contending access devices; as the actual number increases further, the rate of change gradually decreases and finally approaches 0.
Fig. 6 shows, by way of example, the relationship between the actual number of contending access devices and the number of successfully accessed devices in an embodiment of the present invention. As shown in Fig. 6, while the actual number of contending access devices is below the maximum number of contending access devices, the number of successfully accessed devices grows linearly with it. Once the actual number reaches or exceeds the maximum number of contending access devices, the number of successfully accessed devices decreases as the actual number increases, and the decrease is steepest near the turning point where the actual number equals the maximum number of contending access devices.
Taken together, Fig. 5 and Fig. 6 show that the optimal number of contending access devices in the embodiments of the present invention is greater than the maximum number of contending access devices: it is the actual number of contending access devices at which the base station achieves a set access success rate, and that set rate is slightly less than 1. In other words, the optimal and maximum numbers of contending access devices are numerically very close; the optimal number is a specific value on the descending part of the curves in Fig. 5 and Fig. 6, just past the turning point at which the number of contending access devices exceeds the maximum.
In the embodiments of the present invention, the set access success rate can be chosen by those skilled in the art; the present invention does not specifically limit this. Optionally, the set success rate can be 98.9% or a similar value.
Step S302: Calculate the congestion degree value according to the actual number of contending access devices and the optimal number of contending access devices.
In the embodiments of the present invention, the congestion degree value can be calculated by the following Formula 1:
where P is the congestion degree value, N is the actual number of contending access devices, and N_0 is the optimal number of contending access devices.
According to Formula 1, the congestion degree value can be regarded as a weighted measure of how far the actual number of contending access devices deviates from the optimal number. When the actual number is less than the optimal number, the base station is not congested, the access success probability is 1, and all devices requesting access can access the base station successfully; the congestion degree value computed by Formula 1 is therefore negative. When the actual number is greater than the optimal number, the base station is congested to some degree and the access success probability is less than or equal to 1. Because the number of devices requesting access exceeds the base station's access capability, a certain number of devices cannot access successfully, and the congestion degree value computed by Formula 1 is positive.
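Formula 1 itself is not preserved in this text. One reading that matches the sign behaviour described above is the relative deviation (N - N_0) / N_0. The Python sketch below uses that form purely as an assumption; the function name is hypothetical and the patent's actual weighting may differ.

```python
def congestion_degree(n_actual, n_optimal):
    """Assumed form of the congestion degree value: relative deviation from the optimum.

    Negative when fewer devices contend than the optimal number,
    zero at the optimum, positive when more devices contend.
    """
    if n_optimal <= 0:
        raise ValueError("optimal number of contending devices must be positive")
    return (n_actual - n_optimal) / n_optimal
```

Under this assumed form, 80 contending devices against an optimum of 100 give a value of about -0.2, and 150 devices give 0.5.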
Step S303: Determine the congestion degree value interval to which the congestion degree value belongs, and determine the device access state of the base station according to that interval and the correspondence between each congestion degree value interval and the device access states.
In the embodiment of the present invention, all possible device access states are preset in the base station, and each device access state corresponds to one congestion degree value interval. Because the congestion degree values within one interval differ only slightly from each other, they can be considered to belong to the same congestion level, and each device access state of the base station corresponds to one congestion level. The current congestion level of the base station can therefore be judged from the congestion degree value calculated by Formula 1, and the device access state of the base station determined from it.
For example, let the set of all possible congestion levels of the base station be [0, L], with l ∈ [0, L], where L is the highest congestion level of the base station and is a positive integer greater than or equal to 1, and l is one of the congestion levels. Each congestion level corresponds to one device access state of the base station. A congestion level of 0 indicates that the base station is not congested, i.e. the corresponding device access state is the uncongested state; a congestion level greater than 0 indicates that the base station is congested, i.e. the corresponding device access state is a congested state.
It can also be seen from Formula 1 that the calculated congestion degree value is a continuous value, whereas each congestion level is an integer. The calculated congestion degree value can therefore be quantized through the correspondence, preset in the base station, between congestion degree value intervals and device access states (i.e. congestion levels), so as to determine the current device access state of the base station.
In this way, the base station has L+1 congestion levels in total, i.e. L+1 possible device access states. In the embodiment of the present invention, those skilled in the art can set the number of device access states (i.e. the number of congestion levels) according to actual needs, and the present invention does not specifically limit this.
In addition, in the embodiment of the present invention, the congestion degree value interval corresponding to congestion level 0 is (−∞, 0], while the ranges of the intervals corresponding to the other congestion levels can be specifically set by those skilled in the art, and the present invention does not specifically limit this. For example, the intervals corresponding to the non-zero congestion levels can all be of equal size; alternatively, they can be of unequal size, with the interval size gradually decreasing as the congestion level increases, i.e. the higher the congestion level, the narrower the corresponding congestion degree value interval.
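The quantization of a congestion degree value into a congestion level can be sketched as follows. The interval boundaries, the function name `congestion_level`, and the choice to leave the highest level open-ended are illustrative assumptions, not values taken from the embodiment.

```python
from bisect import bisect_left
from typing import Sequence


def congestion_level(p: float, upper_bounds: Sequence[float]) -> int:
    """Map a congestion degree value p to a congestion level in [0, L].

    upper_bounds must be ascending and start with 0.0:
      level 0 covers (-inf, upper_bounds[0]] = (-inf, 0],
      level i covers (upper_bounds[i-1], upper_bounds[i]],
      level L = len(upper_bounds) covers everything above the last bound.
    """
    # bisect_left counts how many bounds are strictly smaller than p.
    return bisect_left(upper_bounds, p)


# Hypothetical boundaries: equal-size intervals, and intervals that shrink
# as the congestion level rises.
EQUAL_BOUNDS = [0.0, 0.2, 0.4]
SHRINKING_BOUNDS = [0.0, 0.4, 0.6, 0.7]
```

With `EQUAL_BOUNDS` there are four levels (L = 3); with `SHRINKING_BOUNDS` the intervals for levels 1 to 3 have widths 0.4, 0.2 and 0.1, which follows the second option described above.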
In the embodiment of the present invention, the device access apparatus adjusts the P value only when the device access state of the base station is a congested state. Therefore, in step S102, if the device access state is determined to be a congested state, the access restriction parameter P value adjustment process is triggered; if the device access state is determined not to be a congested state, the current P value is not adjusted. Here, a congested state is an access state whose congestion level belongs to [1, L], and the uncongested state is the device access state whose congestion level is 0.
Because the number of devices applying to access the base station may differ from moment to moment, the access state of the base station may change in real time. The device access apparatus in the embodiment of the present invention can therefore obtain the access state of the base station periodically, and trigger the P value adjustment process when it determines that the access state of the base station is congested.
In the embodiment of the present invention, the device access apparatus may obtain the current P value of the base station only after determining that the current device access state is a congested state, or it may obtain the P value through its communication connection with the base station at the same time as it determines the current device access state of the base station; the present invention does not specifically limit this.
Specifically, the P value adjustment process comprises multiple P value adjustments, each of which can be regarded as one cycle of the overall process. In each P value adjustment, the P value obtained after the adjustment is sent to each device accessing the base station, until the device access state of the base station becomes the uncongested state.
Fig. 7 exemplarily shows a flow diagram of the P value adjustment process in the embodiment of the present invention. As shown in Fig. 7, the P value adjustment process specifically comprises the following steps S701 to S704:
Step S701: Determine the action Y_k(l) used in the k-th P value adjustment according to the device access state S_k of the base station before the k-th P value adjustment and the Q values in the reinforcement learning Q-value matrix, where k is a positive integer;
Step S702: Adjust the P value using the action Y_k(l), and send the adjusted P value to each device;
Step S703: Obtain the device access state S_{k+1} of the base station after the k-th P value adjustment, and, according to the device access state S_k and the device access state S_{k+1}, update the Q value in the reinforcement learning Q-value matrix that corresponds to using the action Y_k(l) in the device access state S_k;
Step S704: If the device access state S_{k+1} is determined to be the uncongested state, end the P value adjustment process; otherwise, perform the (k+1)-th P value adjustment.
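Steps S701 to S704 can be sketched as the following loop. This is a minimal illustration rather than the implementation of the embodiment: the function name `run_p_adjustment`, the three callables, and the `max_rounds` safety limit are all hypothetical, and state 0 is taken to mean the uncongested state.

```python
from typing import Callable


def run_p_adjustment(
    initial_state: int,
    choose_action: Callable[[int, int], int],
    apply_and_observe: Callable[[int], int],
    update_q: Callable[[int, int, int], None],
    max_rounds: int = 1000,
) -> int:
    """Repeat P value adjustments until the base station is uncongested.

    Returns the number of adjustments performed. max_rounds is an assumed
    guard against an endless loop and is not part of the described steps.
    """
    state = initial_state
    k = 0
    while state != 0 and k < max_rounds:
        k += 1
        action = choose_action(state, k)        # S701: pick Y_k(l) from S_k and the Q values
        next_state = apply_and_observe(action)  # S702: adjust and send P; S703: obtain S_{k+1}
        update_q(state, action, next_state)     # S703: update Q(S_k, Y_k(l))
        state = next_state                      # S704: stop if uncongested, else continue
    return k
```

Passing the policy, the environment and the update rule as callables keeps the loop independent of how actions are selected and how Q values are stored.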
The embodiment of the present invention adjusts the P value using the idea of Q-learning, and the P value adjustment process is an iterative process comprising multiple cycles. Each cycle is one P value adjustment, in which the action used to change the P value is selected according to the experience learned in the preceding P value adjustments.
A reinforcement learning Q-value matrix and an action set for adjusting the P value are preset in the device access apparatus. The reinforcement learning Q-value matrix records the experience learned from previous P value adjustment cycles. Each row of the matrix identifies one device access state of the base station, and each column identifies one action in the action set used by the base station to adjust the P value; the Q value in row i, column j is the Q value corresponding to selecting the j-th action Y_j of the action set to adjust the P value in state S_i. The higher the Q value, the higher the success rate of reaching the final goal (i.e. the uncongested state) by selecting action Y_j in state S_i, i.e. the higher the benefit to the whole system after selecting action Y_j. The reinforcement learning Q-value matrix is initialized to 0 when the P value adjustment process starts (i.e. before the first P value adjustment), and the Q values in it are adjusted accordingly after each P value adjustment.
The action set used to adjust the P value in the base station can take the form {Y(−H), …, Y(−1), Y(0), Y(1), …, Y(H)}, where each action corresponds to one P value adjustment amount, which can be positive, negative, or 0. Adjusting with an action that increases the P value allows more devices to access; conversely, adjusting with an action that decreases the P value reduces the number of devices that access.
In the embodiment of the present invention, those skilled in the art can specifically set the P value adjustment amount corresponding to each action in the action set, and the present invention does not specifically limit this. For example, the unit of P value adjustment can be defined as 0.01, so that action Y(1) increases the P value by 0.01, action Y(−1) decreases the P value by 0.01, and so on.
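The action set and the 0.01 adjustment unit from the example above can be sketched as follows. The value H = 3, the function names, and the clamping of the P value to [0, 1] are assumptions made for illustration only.

```python
H = 3        # hypothetical half-width of the action set
STEP = 0.01  # adjustment unit taken from the example in the text


def action_set(h: int = H) -> list:
    """Indices l of the actions Y(-h), ..., Y(0), ..., Y(h)."""
    return list(range(-h, h + 1))


def apply_action(p: float, l: int, step: float = STEP) -> float:
    """Return the P value after applying action Y(l).

    ASSUMPTION: the result is clamped to [0, 1], since the P value is later
    compared with a randomly generated value on the device side.
    """
    return min(1.0, max(0.0, p + l * step))
```

Here `apply_action(0.50, 1)` yields approximately 0.51 and `apply_action(0.50, -1)` approximately 0.49, matching Y(1) and Y(−1) in the example.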
For clarity and ease of description, the whole P value adjustment process is introduced below using only an arbitrary one of the P value adjustments (the k-th P value adjustment).
In each P value adjustment, the device access apparatus adjusts the P value once and delivers the adjusted P value to each device under the base station. After receiving the adjusted P value, a device can determine whether it applies for access; therefore, after each cycle, the device access state of the base station changes accordingly.
In step S701, in the k-th P value adjustment, the device access apparatus determines the action Y_k(l) used in the k-th P value adjustment according to the device access state S_k of the base station before the k-th P value adjustment and the Q values in the reinforcement learning Q-value matrix. When k = 1, the device access state S_1 is the device access state obtained in step S201, i.e. the device access state that triggered the P value adjustment process, which is also the device access state of the base station before the first P value adjustment.
Specifically: if the reinforcement learning Q-value matrix has not converged relative to the device access state S_1, the probability of selecting, among the preset optional actions, the action with the maximum Q value for the device access state S_k is determined according to k. The action Y_k(l) is then determined according to the determined probability and the preset optional actions.
Here, k indicates which P value adjustment the current k-th P value adjustment is within the whole P value adjustment process.
In the embodiment of the present invention, the optional actions are the same under all possible device access states of the base station. Therefore, the optional actions of the device access state S_k are the actions in the action set used by the base station to adjust the P value. Of course, those skilled in the art can also specifically set the optional actions corresponding to each possible device access state of the base station, and the optional actions of each access state can differ according to its congestion level; the present invention does not specifically limit this.
In the embodiment of the present invention, the reinforcement learning Q-value matrix is initialized to 0 when the P value adjustment process starts, and a freshly initialized matrix cannot guide the choice of action in the 1st P value adjustment. In the 1st P value adjustment, the device access apparatus therefore selects one action from the action set at random as a starting point, which triggers the P value adjustment process.
Then, in order to gain more experience of adjusting the P value, the device access apparatus tries as many of the actions in the action set as possible by trial and error. In subsequent P value adjustments, the device access apparatus therefore still selects an action at random with a certain probability, or selects with a certain probability the action with the maximum Q value under the device access state S_k of that P value adjustment. As k gradually increases, the P value adjustment experience contained in the reinforcement learning Q-value matrix becomes richer, so the probability of selecting the action with the maximum Q value is gradually increased and the probability of selecting an action at random is correspondingly reduced.
It can be seen that, when the action Y_k(l) is selected in the embodiment of the present invention, the probability of taking the action with the maximum Q value among the optional actions of the device access state S_k as the action Y_k(l) is positively correlated with the value of k of the k-th P value adjustment. After determining the probability of selecting the action with the maximum Q value, the device access apparatus combines it with the optional actions of the device access state S_k to finally determine the action Y_k(l).
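The selection rule described above can be sketched as follows. The text only requires that the probability of picking the maximum-Q action grows with k; the linear ramp in `exploit_probability`, the parameter `k_ramp`, and both function names are assumptions made for illustration.

```python
import random
from typing import Dict


def exploit_probability(k: int, k_ramp: int = 100) -> float:
    """Probability of picking the maximum-Q action in the k-th adjustment.

    ASSUMPTION: a linear ramp that is 0 for k = 1 (purely random first
    choice) and reaches 1 after k_ramp further adjustments.
    """
    return min(1.0, max(0, k - 1) / k_ramp)


def choose_action(q_row: Dict[int, float], k: int, rng: random.Random,
                  k_ramp: int = 100) -> int:
    """Pick an action index l for state S_k, given that state's Q values.

    q_row maps each optional action index l to Q(S_k, Y(l)).
    """
    actions = list(q_row)
    if rng.random() < exploit_probability(k, k_ramp):
        # Exploit: the action with the maximum Q value under S_k.
        return max(actions, key=q_row.__getitem__)
    # Explore: any optional action, chosen uniformly at random.
    return rng.choice(actions)
```

Passing the random number generator explicitly makes the behaviour reproducible in tests; any other monotonically increasing schedule would satisfy the positive correlation with k equally well.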
If the reinforcement learning Q-value matrix has converged relative to the device access state S_1, this indicates that the optimal policy for adjusting the device access state S_1 to the uncongested state has been learned in the preceding P value adjustments. In any subsequent P value adjustment, the action with the maximum Q value for the device access state before that adjustment is then selected from the preset optional actions as the action used in that adjustment.
In step S702, the device access apparatus adjusts the P value based on the action Y_k(l) selected in the k-th P value adjustment. The P value obtained after the adjustment equals the current P value of the base station plus the P value adjustment amount corresponding to the selected action Y_k(l). The adjusted P value is then sent to each device under the base station.
After a device receives the adjusted P value from the base station, it can use a prior-art method to judge whether to apply to access the base station. That is, the device itself randomly generates a P value and compares it with the P value issued by the base station. If the P value generated by the device is less than or equal to the P value issued by the base station, the device sends an access request to the base station; otherwise it does not send an access request and, after waiting for a period of time, judges again whether to apply to access the base station.
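The device-side check described above can be sketched as follows. The function name `should_request_access` and the choice of a uniform draw on [0, 1) are assumptions; the comparison itself (request access when the locally generated value is less than or equal to the issued P value) follows the text.

```python
import random


def should_request_access(p_issued: float, rng: random.Random) -> bool:
    """Decide on the device side whether to send an access request.

    The device draws its own random value (ASSUMPTION: uniform on [0, 1))
    and requests access only if that value does not exceed the P value
    issued by the base station.
    """
    return rng.random() <= p_issued
```

A lower issued P value therefore lets a smaller fraction of devices apply for access in each round, which is how decreasing the P value relieves congestion.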
In step S703, the device access apparatus can obtain the device access state S_{k+1} of the base station using the method for obtaining the device access state of the base station described in step S201. The device access state S_{k+1} is the new device access state reached by the base station after the P value has been adjusted with the action Y_k(l) in the k-th P value adjustment and the device access state of the base station has changed. If the device access state S_{k+1} is still a congested state, the P value adjustment process continues, and in this case the device access state S_{k+1} is also the device access state of the base station before the (k+1)-th P value adjustment.
After obtaining the device access state S_{k+1} of the base station, the device access apparatus also updates, according to the device access state S_k and the device access state S_{k+1}, the Q value in the reinforcement learning Q-value matrix that corresponds to using the action Y_k(l) in the device access state S_k. This specifically includes:
First, the transfer gain corresponding to selecting the action Y_k(l) in the device access state S_k is determined. In the embodiment of the present invention, the device access apparatus can determine the transfer gain according to a preset transfer gain function, or according to the transfer gain matrix shown in Fig. 8; the present invention does not specifically limit this.
Taking the transfer gain matrix in Fig. 8 as an example, each row of the matrix identifies the device access state S_k of the base station before the k-th P value adjustment, each column identifies the device access state S_{k+1} reached by the base station after the P value is adjusted with the action Y_k(l), and the value in row i, column j is the transfer gain corresponding to the device access state of the base station moving from state S_i to state S_j.
As can be seen from Fig. 8, when the device access state of the base station moves from a lower congestion level to a higher congestion level, the corresponding transfer gain is negative, and the higher the congestion level of the state reached, the smaller the transfer gain. When the device access state of the base station moves from a higher congestion level to a lower congestion level, the corresponding transfer gain is positive, and the lower the congestion level of the state reached, the larger the transfer gain. If the device access state of the base station is unchanged after the P value adjustment, the corresponding transfer gain is zero.
Then, according to the transfer gain produced by selecting the action Y_k(l), the maximum Q value corresponding to the device access state S_{k+1} in the reinforcement learning Q-value matrix, and the Q value corresponding to using the action Y_k(l) in the device access state S_k, the updated Q value of the action Y_k(l) in the device access state S_k is calculated by the following Formula 2:
$$Q(S_k, Y_k(l)) = (1-\alpha)\,Q(S_k, Y_k(l)) + \alpha\left[R_s(S_k, Y_k(l)) + \gamma \max_{Y(l)} Q(S_{k+1}, Y(l))\right]$$
where S_k is the device access state of the base station before the k-th P value adjustment, and Y_k(l) is the l-th action, selected in the k-th P value adjustment. Q(S_k, Y_k(l)) is the Q value corresponding to using the action Y_k(l) in the device access state S_k: on the right of the equals sign it denotes the Q value before the matrix update, and on the left of the equals sign it denotes the Q value after the matrix update. α is the learning factor, a real number in the range (0, 1). The larger α is, the less of the effect of earlier training is retained and the more weight is given to the return produced by selecting the action Y_k(l) in the current k-th P value adjustment, i.e. the larger the share of the updated Q value that comes from the transfer gain produced by selecting the action Y_k(l) and from the Q value of the device access state S_{k+1} reached after the adjustment. Conversely, the smaller α is, the more weight is given to the experience learned in earlier P value adjustments, i.e. the larger the share of the updated Q value that comes from the Q value before the update. R_s(S_k, Y_k(l)) is the transfer gain corresponding to using the action Y_k(l) in the device access state S_k. γ is the discount factor, with 0 < γ < 1; the larger γ is, the more weight is given to experience. S_{k+1} is the new device access state reached by the base station after the k-th P value adjustment, Y(l) is the action with the maximum Q value among the optional actions of the device access state S_{k+1} in the reinforcement learning Q-value matrix, and $\max_{Y(l)} Q(S_{k+1}, Y(l))$ is the maximum Q value corresponding to the device access state S_{k+1}, i.e. the Q value of the action Y(l) in the device access state S_{k+1}.
In the embodiment of the present invention, those skilled in the art can reasonably set the specific values of the learning factor and the discount factor in the formula for calculating the Q value according to actual needs, and the present invention does not specifically limit this.
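The Q value update can be sketched as follows, assuming the standard Q-learning update in weighted-average form, which matches the roles of the learning factor α and the discount factor γ described above. The nested-dictionary layout of the Q-value matrix, the function name `update_q`, and the default values of `alpha` and `gamma` are assumptions made for illustration.

```python
from typing import Dict


def update_q(q: Dict[int, Dict[int, float]], s: int, a: int, s_next: int,
             reward: float, alpha: float = 0.5, gamma: float = 0.8) -> float:
    """Update Q(S_k, Y_k(l)) in place and return the new value.

    q[state][action] holds the Q values; reward is the transfer gain
    R_s(S_k, Y_k(l)); alpha and gamma must both lie in (0, 1).
    """
    # Maximum Q value over the optional actions of the new state S_{k+1}.
    best_next = max(q[s_next].values())
    # Blend the old value with the newly observed return.
    q[s][a] = (1 - alpha) * q[s][a] + alpha * (reward + gamma * best_next)
    return q[s][a]
```

With a larger `alpha` the new return dominates the result, and with a smaller `alpha` the previous Q value dominates, which is the trade-off described for the learning factor.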
As can be seen from Fig. 8 in combination with Formula 2, because the device access state S_k is a congested state, if the selected action Y_k(l) is an action that increases the P value, the base station allows more devices to apply to access it. In this case the device access state of the base station deviates further from the uncongested state, i.e. the congestion level of the new device access state S_{k+1} after the P value adjustment becomes higher, and the resulting transfer gain is negative, which is equivalent to a penalty. Correspondingly, if the selected action Y_k(l) is an action that decreases the P value, the base station reduces the number of devices allowed to apply to access it. In this case the deviation of the device access state of the base station from the uncongested state is reduced, i.e. the congestion level of the new device access state S_{k+1} after the P value adjustment becomes lower, and the resulting transfer gain is positive, which is equivalent to a reward.
In step S704, if the new access state S_{k+1} reached by the base station after the P value is adjusted with the selected action Y_k(l) in the k-th P value adjustment is still a congested state, the (k+1)-th P value adjustment is performed. Otherwise, the device access state of the base station has been adjusted from the device access state S_1 at the start of the P value adjustment process to the uncongested state; the cycle is then exited and the P value adjustment process ends.
It should be noted that, after the reinforcement learning Q-value matrix is updated in step S703 and before step S704 is executed to start the next cycle, the method further includes a step of judging whether the updated reinforcement learning Q-value matrix has converged relative to the device access state S_1.
As described above, if the reinforcement learning Q-value matrix has converged relative to the device access state S_1, the optimal policy for adjusting the device access state S_1 to the uncongested state has been found and the Q values in the matrix have become stable. Therefore, in any P value adjustment after the k-th P value adjustment, the action with the maximum Q value among the optional actions of the device access state of the base station before that adjustment is selected as the action for adjusting the P value.
Specifically, the device access apparatus judges whether the updated reinforcement learning Q-value matrix has converged relative to the device access state S_1 in the following way:
First, among the first k P value adjustments, determine the three P value adjustments for which the device access state of the base station before the adjustment is identical to the device access state S_1 and which are closest in time to the k-th P value adjustment;
Then, judge whether the actions used in these three P value adjustments meet the following preset convergence condition. If the convergence condition is met, the change in the selected action is very small each time the base station moves to a new device access state (the P value adjustment amounts of the actions selected in the cycles are all close to each other). At this point, the Q values in the reinforcement learning Q-value matrix can be considered essentially unchanged, and the updated reinforcement learning Q-value matrix has converged relative to the device access state S_1.
The preset convergence condition satisfies the following formula:
where Y_n(l) is the action used in the P value adjustment that, among the three P value adjustments, is closest in time to the k-th P value adjustment, Y_{n−1}(l) is the action used in the P value adjustment that is second closest in time to the k-th P value adjustment, Y_{n−2}(l) is the action used in the P value adjustment that is third closest in time to the k-th P value adjustment, and ε is a preset comparison threshold, with ε > 0.
It can be seen that judging whether the reinforcement learning Q-value matrix has converged relative to the device access state S_1 requires at least three cycles in which the device access state before the P value adjustment is S_1. Consequently, within the first three cycles of the P value adjustment process, the reinforcement learning Q-value matrix has necessarily not converged relative to the device access state S_1.
Moreover, in each P value adjustment before the reinforcement learning Q-value matrix is determined to have converged relative to the device access state S_1, whether the convergence condition is met is judged after the action is selected or after the matrix is updated. Therefore, if the matrix is determined to meet the convergence condition for the first time in the k-th P value adjustment, the device access state before the k-th P value adjustment must be the device access state S_1. That is, among the first k P value adjustments, the three P value adjustments for which the device access state of the base station before the adjustment is identical to S_1 and which are closest to the k-th P value adjustment may include the k-th P value adjustment itself.
Of course, it is also possible that the reinforcement learning Q-value matrix still has not converged after many P value adjustments. The device access apparatus is therefore also provided with a maximum convergence count K: if the convergence condition is still not met after the k-th P value adjustment but k is greater than or equal to K, the matrix is deemed to have converged relative to the device access state S_1 and the matrix training process ends. In every subsequent P value adjustment, the action with the maximum Q value in the reinforcement learning Q-value matrix is selected directly to adjust the P value, and actions are no longer selected at random with a certain probability.
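The convergence check and the fallback on the count K can be sketched as follows. The convergence formula itself is not reproduced in this text, so the sketch assumes the condition |Y_n − Y_{n−1}| < ε and |Y_{n−1} − Y_{n−2}| < ε on the P value adjustment amounts, which is one reading of "the adjustment amounts are all close to each other". The function name, the `history` layout, and this assumed condition are all hypothetical.

```python
from typing import List, Tuple


def has_converged(history: List[Tuple[int, float]], s1: int, k: int,
                  epsilon: float, k_max: int) -> bool:
    """Judge convergence relative to state s1 after the k-th adjustment.

    history lists, in time order, (state_before_adjustment, adjustment_amount)
    for the first k adjustments. k_max plays the role of the count K.
    """
    if k >= k_max:
        # Fallback: treat the matrix as converged once K adjustments are reached.
        return True
    # The three most recent adjustments whose starting state was s1.
    recent = [amount for state, amount in history if state == s1][-3:]
    if len(recent) < 3:
        return False
    y_n2, y_n1, y_n = recent
    # ASSUMED condition: consecutive adjustment amounts differ by less than epsilon.
    return abs(y_n - y_n1) < epsilon and abs(y_n1 - y_n2) < epsilon
```

Because at least three adjustments starting from S_1 are required, the function returns False during the earliest cycles unless the count K has already been reached.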
Based on the same inventive concept, the embodiment of the present invention also provides a device access apparatus. Fig. 9 is a structural schematic diagram of a device access apparatus provided in the embodiment of the present invention. As shown in Fig. 9, the device access apparatus 900 includes:
Acquisition module 901, configured to obtain the device access state of the base station;
Processing module 902, configured to: if it is determined that the device access state is a congested state, execute the access restriction parameter P value adjustment process, and send the P value obtained after each P value adjustment to each device under the base station through transceiver module 903, until the device access state of the base station is the uncongested state; wherein the action used in each P value adjustment is determined according to the Q values in the reinforcement learning Q-value matrix.
Optionally, the acquisition module 901 is specifically configured to:
Determine the device access state of the base station according to the number of devices currently applying for access at the base station and the optimal number of access devices.
Optionally, the processing module 902 is specifically configured to:
Determine the action Y_k(l) used in the k-th P value adjustment according to the device access state S_k of the base station before the k-th P value adjustment and the Q values in the reinforcement learning Q-value matrix, where k is a positive integer;
Adjust the P value using the action Y_k(l), and send the adjusted P value to each device through transceiver module 903;
Obtain, through the acquisition module 901, the device access state S_{k+1} of the base station after the k-th P value adjustment, and, according to the device access state S_k and the device access state S_{k+1}, update the Q value in the reinforcement learning Q-value matrix that corresponds to using the action Y_k(l) in the device access state S_k.
Optionally, the processing module 902 is further specifically configured to:
Determine the probability of selecting, from the preset optional actions, the action with the maximum Q value for the device access state S_k, wherein the probability is positively correlated with k;
Determine the action Y_k(l) according to the probability and the preset optional actions.
Optionally, the processing module 902 is further specifically configured to:
Determine the transfer gain corresponding to using the action Y_k(l) in the device access state S_k;
Obtain the updated Q value corresponding to using the action Y_k(l) in the device access state S_k according to the transfer gain, the maximum Q value corresponding to the device access state S_{k+1} in the reinforcement learning Q-value matrix, and the Q value corresponding to using the action Y_k(l) in the device access state S_k.
Optionally, the processing module 902 is further specifically configured to obtain the updated Q value corresponding to using the action Y_k(l) in the device access state S_k by the following formula:
$$Q(S_k, Y_k(l)) = (1-\alpha)\,Q(S_k, Y_k(l)) + \alpha\left[R_s(S_k, Y_k(l)) + \gamma \max_{Y(l)} Q(S_{k+1}, Y(l))\right]$$
where Q(S_k, Y_k(l)) is the Q value corresponding to using the action Y_k(l) in the device access state S_k, α is the learning factor, with 0 < α < 1, R_s(S_k, Y_k(l)) is the transfer gain corresponding to using the action Y_k(l) in the device access state S_k, γ is the discount factor, with 0 < γ < 1, Y(l) is the action with the maximum Q value for the device access state S_{k+1} among the preset optional actions, and $\max_{Y(l)} Q(S_{k+1}, Y(l))$ is the maximum Q value corresponding to the device access state S_{k+1}.
Optionally, the processing module 902 is further configured to:
If it is determined that the reinforcement learning Q-value matrix has converged relative to the device access state S_1, then in any P value adjustment after the k-th P value adjustment, select from the preset optional actions the action with the maximum Q value for the device access state before that P value adjustment, wherein the device access state S_1 is the device access state of the base station before the 1st P value adjustment.
Optionally, the processing module 902 is further specifically configured to:
Among the first k P value adjustments, determine the three P value adjustments for which the device access state of the base station before the adjustment is identical to the device access state S_1 and which are closest to the k-th P value adjustment;
If it is determined that the actions used in the three P value adjustments meet the preset convergence condition, determine that the reinforcement learning Q-value matrix has converged relative to the device access state S_1.
Optionally, the convergence condition specifically includes:
where Y_n(l) is the action used in the P value adjustment that, among the three P value adjustments, is closest in time to the k-th P value adjustment, Y_{n−1}(l) is the action used in the P value adjustment that is second closest in time to the k-th P value adjustment, Y_{n−2}(l) is the action used in the P value adjustment that is third closest in time to the k-th P value adjustment, and ε is a preset comparison threshold, with ε > 0.
Based on the same inventive concept, the embodiment of the present invention also provides an access control device, which can specifically be a desktop computer, portable computer, smartphone, tablet computer, personal digital assistant (Personal Digital Assistant, PDA), or the like. As shown in Fig. 10, the access control device may include a central processing unit (Central Processing Unit, CPU) 1001, a memory 1002, an input/output device 1003, a bus system 1004, and so on. The input device may include a keyboard, mouse, touch screen, etc., and the output device may include a display device such as a liquid crystal display (Liquid Crystal Display, LCD) or a cathode-ray tube (Cathode Ray Tube, CRT).
The memory may include read-only memory (ROM) and random access memory (RAM), and provides the processor with the program instructions and data stored in the memory. In the embodiment of the present invention, the memory can be used to store the program of the above device access method.
The processor calls the program instructions stored in the memory and executes the above device access method according to the obtained program instructions.
Based on same inventive concept, an embodiment of the present invention provides a kind of computer storage medias, for being stored as The computer program instructions used in access control equipment are stated, it includes the programs for executing above equipment cut-in method.
The computer storage media can be any usable medium or data storage device that computer can access, packet Include but be not limited to magnetic storage (such as floppy disk, hard disk, tape, magneto-optic disk (MO) etc.), optical memory (such as CD, DVD, BD, HVD etc.) and semiconductor memory (such as it is ROM, EPROM, EEPROM, nonvolatile memory (NAND FLASH), solid State hard disk (SSD)) etc..
As can be seen from the above:
The device access method provided by the embodiments of the present invention includes obtaining the device access state of a base station and, if the device access state is determined to be a congested state, executing the access restriction parameter P-value adjustment process and sending the P value obtained after each P-value adjustment to each device under the base station, until the device access state of the base station becomes a non-congested state. In this way, when the base station is in a congested access state, executing the P-value adjustment process adjusts the P value dynamically and can effectively reduce the degree of congestion of the base station. Moreover, because the reinforcement learning Q-value matrix contains the experience of previous P-value adjustments, determining the action used in each P-value adjustment according to the Q values in that matrix can effectively speed up the convergence of the P value, so that the base station quickly reaches the optimal device access state.
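By way of a minimal, non-limiting sketch, the adjustment loop summarized above might look as follows. The Q-value update shown is the standard Q-learning rule, which is assumed here to match the update used by the method. The state labels, the clamping of P to [0, 1], the purely greedy action choice, and the callbacks `observe` and `reward_fn` are hypothetical simplifications for illustration and are not taken from the source.

```python
from typing import Callable, Dict, Sequence, Tuple

QTable = Dict[Tuple[int, int], float]  # (state, action index) -> Q value
NOT_CONGESTED = 0  # hypothetical label for the non-congested state


def q_update(q: QTable, s: int, a: int, reward: float, s_next: int,
             n_actions: int, alpha: float = 0.5, gamma: float = 0.9) -> float:
    """Standard Q-learning update (assumed form):
    Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (reward + gamma * max_b Q(s',b)).
    """
    best_next = max(q.get((s_next, b), 0.0) for b in range(n_actions))
    old = q.get((s, a), 0.0)
    q[(s, a)] = (1.0 - alpha) * old + alpha * (reward + gamma * best_next)
    return q[(s, a)]


def adjust_p(p: float, actions: Sequence[float], q: QTable,
             observe: Callable[[float], int],
             reward_fn: Callable[[int, int], float],
             max_rounds: int = 100) -> Tuple[float, int]:
    """Adjust P until the observed state is non-congested.

    observe(p) stands in for sending P to the devices and measuring the
    resulting device access state; reward_fn stands in for the transfer gain.
    """
    s = observe(p)
    rounds = 0
    while s != NOT_CONGESTED and rounds < max_rounds:
        # Greedy choice for brevity; ties resolve to the first action.
        a = max(range(len(actions)), key=lambda b: q.get((s, b), 0.0))
        p = min(1.0, max(0.0, p + actions[a]))  # assumes P lies in [0, 1]
        s_next = observe(p)
        q_update(q, s, a, reward_fn(s, s_next), s_next, len(actions))
        s = s_next
        rounds += 1
    return p, rounds
```

Because the Q table persists across calls, a later congestion episode starts from the values learned in earlier ones, which is the mechanism the paragraph above credits for faster convergence.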
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although optional embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the optional embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include them.

Claims (12)

1. A device access method, characterized in that the method comprises:
obtaining a device access state of a base station;
if the device access state is determined to be a congested state, executing an access restriction parameter P-value adjustment process, and sending the P value obtained after each P-value adjustment to each device under the base station, until the device access state of the base station is a non-congested state;
wherein the action used in each P-value adjustment is determined according to the Q values in a reinforcement learning Q-value matrix.
2. The method according to claim 1, characterized in that obtaining the device access state of the base station comprises:
determining the device access state of the base station according to the number of devices currently requesting access to the base station and the optimal number of access devices.
3. The method according to claim 1, characterized in that executing the P-value adjustment process and sending the P value obtained after each P-value adjustment to each device under the base station comprises:
determining the action Y_k(l) used in the k-th P-value adjustment according to the device access state S_k of the base station before the k-th P-value adjustment and the Q values in the reinforcement learning Q-value matrix, wherein k is a positive integer;
adjusting the P value using the action Y_k(l), and sending the adjusted P value to each device;
obtaining the device access state S_{k+1} of the base station after the k-th P-value adjustment, and updating, according to the device access state S_k and the device access state S_{k+1}, the Q value in the reinforcement learning Q-value matrix that corresponds to using the action Y_k(l) in the device access state S_k.
4. The method according to claim 3, characterized in that determining the action Y_k(l) used in the k-th P-value adjustment according to the device access state S_k of the base station before the k-th P-value adjustment and the Q values in the reinforcement learning Q-value matrix comprises:
determining the probability of selecting, from the preset optional actions, the action with the largest Q value for the device access state S_k, wherein the probability is positively correlated with k;
determining the action Y_k(l) according to the probability and the preset optional actions.
5. The method according to claim 3, characterized in that updating, according to the device access state S_k and the device access state S_{k+1}, the Q value in the reinforcement learning Q-value matrix that corresponds to using the action Y_k(l) in the device access state S_k comprises:
determining the transfer gain corresponding to using the action Y_k(l) in the device access state S_k;
obtaining the updated Q value corresponding to using the action Y_k(l) in the device access state S_k according to the transfer gain, the maximum Q value corresponding to the device access state S_{k+1} in the reinforcement learning Q-value matrix, and the Q value corresponding to using the action Y_k(l) in the device access state S_k.
6. The method according to claim 5, characterized in that obtaining the updated Q value corresponding to using the action Y_k(l) in the device access state S_k according to the transfer gain, the maximum Q value corresponding to the device access state S_{k+1} in the reinforcement learning Q-value matrix, and the Q value corresponding to using the action Y_k(l) in the device access state S_k satisfies the following formula:
wherein Q(S_k, Y_k(l)) is the Q value corresponding to using the action Y_k(l) in the device access state S_k; α is the learning factor, and 0 < α < 1; R_s(S_k, Y_k(l)) is the transfer gain corresponding to using the action Y_k(l) in the device access state S_k; γ is the discount factor, and 0 < γ < 1; Y(l) is the action with the largest Q value for the device access state S_{k+1} among the preset optional actions, and Q(S_{k+1}, Y(l)) is the maximum Q value corresponding to the device access state S_{k+1}.
7. The method according to claim 3, characterized in that, after updating the Q value in the reinforcement learning Q-value matrix that corresponds to using the action Y_k(l) in the device access state S_k, the method further comprises:
if it is determined that the reinforcement learning Q-value matrix has converged with respect to the device access state S_1, then in any P-value adjustment after the k-th P-value adjustment, selecting, from the preset optional actions, the action with the largest Q value for the device access state before that P-value adjustment, wherein the device access state S_1 is the device access state of the base station before the first P-value adjustment.
8. The method according to claim 7, characterized in that determining that the reinforcement learning Q-value matrix has converged with respect to the device access state S_1 comprises:
determining, among the first k P-value adjustments, the three P-value adjustments that are closest to the k-th P-value adjustment and before which the device access state of the base station was the same as the device access state S_1;
if it is determined that the actions used in the three P-value adjustments satisfy a preset convergence condition, determining that the reinforcement learning Q-value matrix has converged with respect to the device access state S_1.
9. The method according to claim 8, characterized in that the convergence condition specifically includes:
wherein Y_n(l) is the action used in the P-value adjustment that, among the three P-value adjustments, is closest in time to the k-th P-value adjustment; Y_{n-1}(l) is the action used in the P-value adjustment second closest in time to the k-th P-value adjustment; Y_{n-2}(l) is the action used in the P-value adjustment third closest in time to the k-th P-value adjustment; ε is a preset comparison threshold, and ε > 0.
10. A device access apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to obtain a device access state of a base station;
a processing module, configured to, if the device access state is determined to be a congested state, execute an access restriction parameter P-value adjustment process and send, through a transceiver module, the P value obtained after each P-value adjustment to each device under the base station, until the device access state of the base station is a non-congested state; wherein the action used in each P-value adjustment is determined according to the Q values in a reinforcement learning Q-value matrix.
11. An access control device, characterized by comprising:
a memory, configured to store program instructions;
a processor, configured to call the program instructions stored in the memory and to execute, according to the obtained program, the method according to any one of claims 1 to 9.
12. A computer storage medium, characterized in that the computer storage medium stores computer-executable instructions, the computer-executable instructions being used to cause a computer to execute the method according to any one of claims 1 to 9.
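For illustration only, and without limiting the claims, the action selection recited in claim 4 might be sketched as follows. The claim states only that the probability of choosing the action with the largest Q value is positively correlated with k; the specific schedule k / (k + 1) and the names `greedy_probability` and `choose_action` are assumptions introduced for this sketch.

```python
import random
from typing import Sequence


def greedy_probability(k: int) -> float:
    """Hypothetical schedule that increases with the adjustment index k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return k / (k + 1)


def choose_action(q_row: Sequence[float], k: int, rng=random) -> int:
    """Return the index of the action for the k-th P-value adjustment.

    q_row holds the Q values of the preset optional actions for the current
    device access state. With probability greedy_probability(k) the action
    with the largest Q value is chosen; otherwise an action is drawn
    uniformly from all optional actions.
    """
    best = max(range(len(q_row)), key=q_row.__getitem__)
    if rng.random() < greedy_probability(k):
        return best
    return rng.randrange(len(q_row))
```

With this schedule, early adjustments explore the optional actions more often, while later adjustments increasingly follow the learned Q values.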
CN201810053320.1A 2018-01-19 2018-01-19 Equipment access method, device and access control equipment Active CN108347744B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810053320.1A CN108347744B (en) 2018-01-19 2018-01-19 Equipment access method, device and access control equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810053320.1A CN108347744B (en) 2018-01-19 2018-01-19 Equipment access method, device and access control equipment

Publications (2)

Publication Number Publication Date
CN108347744A true CN108347744A (en) 2018-07-31
CN108347744B CN108347744B (en) 2020-08-28

Family

ID=62961086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810053320.1A Active CN108347744B (en) 2018-01-19 2018-01-19 Equipment access method, device and access control equipment

Country Status (1)

Country Link
CN (1) CN108347744B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108966330A (en) * 2018-09-21 2018-12-07 西北大学 A kind of mobile terminal music player dynamic regulation energy consumption optimization method based on Q-learning
CN113810883A (en) * 2021-07-01 2021-12-17 中铁二院工程集团有限责任公司 Internet of things large-scale random access control method
CN114503622A (en) * 2019-09-27 2022-05-13 瑞典爱立信有限公司 Method and apparatus for access or RAT restriction

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101466111A (en) * 2009-01-13 2009-06-24 中国人民解放军理工大学通信工程学院 Dynamic spectrum access method based on policy planning constrain Q study
CN102256262A (en) * 2011-07-14 2011-11-23 南京邮电大学 Multi-user dynamic spectrum accessing method based on distributed independent learning
CN103220751A (en) * 2013-05-08 2013-07-24 哈尔滨工业大学 Heterogeneous network access control method based on Q learning resource allocation strategy
CN107105455A (en) * 2017-04-26 2017-08-29 重庆邮电大学 It is a kind of that load-balancing method is accessed based on the user perceived from backhaul

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
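Liu Huiru et al.: "Q-learning-based access control algorithm for CDMA/WLAN heterogeneous networks" (基于Q学习的CDMA/WLAN异构网络接入控制算法), Communications Technology (《通信技术》) *
Zhao Biao et al.: "Application of the Q-learning algorithm to channel selection in opportunistic spectrum access" (Q学习算法在机会频谱接入信道选择中的应用), Signal Processing (《信号处理》) *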

Also Published As

Publication number Publication date
CN108347744B (en) 2020-08-28

Similar Documents

Publication Publication Date Title
CN108347744A (en) A kind of equipment cut-in method, device and access control equipment
CN101313494B (en) Radio network design apparatus and method
CN104158855B (en) Information Mobile Service combination based on genetic algorithm calculates discharging method
CN107766135A (en) Method for allocating tasks based on population and simulated annealing optimization in mobile cloudlet
CN101784061B (en) Method and device for realizing autonomous load balancing of wireless access network
CN112533237B (en) Network capacity optimization method for supporting large-scale equipment communication in industrial internet
CN110519849B (en) Communication and computing resource joint allocation method for mobile edge computing
CN106454857A (en) Evaluation method and device for network planning
CN115718956B (en) Antenna layout method, device, medium and system
CN103703830B (en) A kind of physical resource adjustment, device and controller
KR20230007941A (en) Edge computational task offloading scheme using reinforcement learning for IIoT scenario
CN102905317A (en) Mobile load balancing method used for multiple cells
CN106998340A (en) A kind of load-balancing method and device of board resource
CN113343437B (en) Electric automobile quick charge guiding method, system, terminal and medium
CN104811466A (en) Cloud media resource distribution method and device
CN101827446A (en) Radio bearer scheduling method and device
CN102480736B (en) Method and device for configuring dynamic data service channel
WO2023226183A1 (en) Multi-base-station queuing type preamble allocation method based on multi-agent collaboration
CN103501509A (en) Method and device for balancing loads of radio network controller
CN111290853B (en) Cloud data center scheduling method based on self-adaptive improved genetic algorithm
CN104967638B (en) The distribution method of a kind of back end and system
CN108073449A (en) A kind of virtual machine dynamic laying method
CN113950134A (en) Dormancy prediction method, device, equipment and computer readable storage medium for base station
CN103906197A (en) Decision-making method for multi-radio access selection of cognitive radio network
CN112584386A (en) 5G C-RAN resource prediction and allocation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right
Effective date of registration: 20210901
Address after: No. 422 South Siming Road, Siming District, Xiamen, Fujian Province, 361000
Patentee after: XIAMEN University; Jingxin Network System Co.,Ltd.
Address before: No. 422 South Siming Road, Siming District, Xiamen, Fujian Province, 361000
Patentee before: XIAMEN University; COMBA TELECOM SYSTEMS (GUANGZHOU) Ltd.