CN114040321B

CN114040321B - Self-adaptive seamless switching method and system for hybrid network

Info

Publication number: CN114040321B
Application number: CN202111210515.0A
Authority: CN
Inventors: 张民; 吴劲涛; 韩大海; 刘鲲; 张会彬
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2021-10-18
Filing date: 2021-10-18
Publication date: 2023-03-24
Anticipated expiration: 2041-10-18
Also published as: CN114040321A

Abstract

The invention provides a self-adaptive seamless switching method and system for a hybrid network. The method comprises the following steps: acquiring the received signal strength between a user terminal and a wireless access point based on the current position of the user terminal; inputting the received signal strength into a trained network seamless switching model to determine the target wireless access point for the user terminal at its next position, wherein the trained network seamless switching model is obtained by training a reinforcement learning model on sample received signal strengths; and, after the user terminal moves to the next position, switching the user terminal to the target wireless access point for data transmission. The method and system obtain the optimal access point in the hybrid network through the reinforcement learning model and use a switching protocol that does not interrupt data transmission, which avoids extra data overhead during network selection and switching, reduces switching delay, and improves the user's quality of experience.

Description

Self-adaptive seamless switching method and system for hybrid network

Technical Field

The present invention relates to the field of communications technologies, and in particular, to a method and a system for adaptive seamless handover in a hybrid network.

Background

Wired communication layouts in traditional cabins are difficult to install and costly to maintain, because the cabin is exposed for long periods to a high-temperature, high-humidity, highly corrosive environment. Applying wireless communication network technology to the cabin communication network can therefore better satisfy the communication requirements inside the cabin.

Existing wireless communication networks deployed in cabins mostly use radio frequency (RF) communication such as WIFI; however, some places in the cabin have weak or even no RF signal. In addition, the wireless network deployed in a cabin may consist of several different communication systems whose communication links differ from one another, so a mobile user terminal incurs additional data overhead when it switches between networks, which degrades system performance and the user experience.

Therefore, there is a need for an adaptive seamless handover method and system for hybrid networks to solve the above problems.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a self-adaptive seamless switching method and a self-adaptive seamless switching system for a hybrid network.

The invention provides a self-adaptive seamless switching method for a hybrid network, which comprises the following steps:

acquiring the received signal strength between a user terminal and a wireless access point based on the current position of the user terminal;

inputting the received signal strength into a trained network seamless switching model, and determining a target wireless access point when the user terminal moves to the next position, wherein the trained network seamless switching model is obtained by training a reinforcement learning model according to the sample received signal strength;

and after the user terminal moves to the next position, switching the user terminal to the target wireless access point for data transmission.

According to the self-adaptive seamless switching method for the hybrid network provided by the invention, the trained network seamless switching model is obtained through the following steps:

acquiring sample received signal strength and sample access point switching information between a user terminal and each wireless access point according to historical communication data of the user terminal at different positions in a hybrid network, wherein the sample access point switching information indicates whether the user terminal switches between the wireless access points;

and training a reinforcement learning model according to the sample received signal strength and the sample access point switching information, and obtaining a trained network seamless switching model once a preset training condition is met.

According to the self-adaptive seamless switching method for the hybrid network provided by the invention, the reinforcement learning model is constructed based on the SARSA(λ) algorithm.

According to the adaptive seamless switching method for the hybrid network provided by the invention, training the reinforcement learning model according to the sample received signal strength and the sample access point switching information, and obtaining the trained network seamless switching model once the preset training condition is met, comprises the following steps:

taking the position of the user terminal in the hybrid network as the state, the sample access point switching information as the action, and the sample received signal strength obtained by the user terminal after executing an action in a state as the reward, and constructing, with the goal of maximizing the communication rate, the reinforcement learning model to be trained through the SARSA(λ) algorithm;

training the reinforcement learning model to be trained on the sample received signal strength and the sample access point switching information, wherein in each training round the action is selected through an ε-greedy strategy, the Q table and the matrix E are updated after the selected action has been executed, and the trained network seamless switching model is obtained once the number of training rounds reaches the preset number.
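As an illustration of the state, action and reward defined in the steps above, the following Python sketch builds a toy environment. It is a minimal sketch under stated assumptions, not the patented implementation: the one-dimensional corridor layout, the AP positions and peak values, the exponential RSS decay, and the names `AccessPoint` and `ToyHandoverEnv` are all hypothetical. The action is encoded as the index of the AP that should serve the UD at its next position, so keeping the current AP corresponds to "no handover".

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AccessPoint:
    name: str
    kind: str         # "VLC" or "IR"
    position: float   # location along a 1-D corridor (assumed layout)
    peak_rss: float   # RSS directly under the AP, arbitrary units (assumed)


class ToyHandoverEnv:
    """Hypothetical toy MDP for the formulation above (numbers are illustrative).

    state  : index of the UD position along the corridor
    action : index of the AP that should serve the UD at its next position
             (choosing the current AP means "no handover")
    reward : RSS delivered by the chosen AP at the next position
    """

    def __init__(self, aps: List[AccessPoint], n_positions: int, seed: int = 0):
        self.aps = aps
        self.n_positions = n_positions
        self.n_actions = len(aps)
        self.rng = random.Random(seed)
        self.position = 0

    def rss(self, position: int, ap_index: int) -> float:
        ap = self.aps[ap_index]
        # Assumed exponential decay with distance; NOT a real VLC/IR channel model.
        return ap.peak_rss * math.exp(-abs(position - ap.position))

    def reset(self) -> int:
        self.position = 0
        return self.position

    def step(self, action: int) -> Tuple[int, float]:
        # The UD random-walks one position; the action only picks the serving AP.
        move = self.rng.choice((-1, 1))
        self.position = min(max(self.position + move, 0), self.n_positions - 1)
        return self.position, self.rss(self.position, action)
```

Because the UD's movement does not depend on the chosen AP in this toy model, the action only changes the reward, which mirrors the idea that the handover decision affects the received signal strength but not where the user walks.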

According to the self-adaptive seamless switching method for the hybrid network provided by the invention, the wireless access points comprise visible light communication access points and infrared communication access points.

According to the adaptive seamless handover method for the hybrid network provided by the invention, handing over the user terminal to the target wireless access point for data transmission after the user terminal moves to the next position comprises the following steps:

acquiring the data transmission state of the user terminal at the current moment according to the switching confirmation information sent by the target wireless access point;

sending the data transmission state to the target wireless access point, and acquiring an authorization signal sent by the target wireless access point;

and generating a switching instruction according to the authorization signal, and sending the switching instruction to the user terminal so that the user terminal can be switched to the target wireless access point for data transmission.

The invention also provides a self-adaptive seamless switching system for a hybrid network, which comprises the following components:

a received signal strength acquisition module, configured to acquire the received signal strength between the user terminal and the wireless access point based on the current position of the user terminal;

a target access point determining module, configured to input the received signal strength into a trained network seamless handover model, and determine a target wireless access point when the user terminal moves to a next location, where the trained network seamless handover model is obtained by training a reinforcement learning model according to sample received signal strength;

and a switching module, configured to switch the user terminal to the target wireless access point for data transmission after the user terminal moves to the next position.

The present invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for adaptive seamless handover for a hybrid network as described in any of the above when executing the program.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method for adaptive seamless handover for a hybrid network as described in any of the above.

The invention also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the method for adaptive seamless handover for a hybrid network as described in any one of the above.

According to the self-adaptive seamless switching method and system for the hybrid network, the optimal access point in the hybrid network is obtained through the reinforcement learning model, and the switching protocol without interrupting data transmission is adopted, so that the extra data overhead in the network selection switching process is avoided, the switching delay is reduced, and the experience quality of a user is improved.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a schematic flow chart of an adaptive seamless handover method for a hybrid network according to the present invention;

FIG. 2 is a schematic diagram of a cabin hybrid VLC-IR communication network scenario provided by the present invention;

FIG. 3 is a schematic diagram of an adaptive seamless vertical handover provided by the present invention;

FIG. 4 is a schematic diagram of a wireless access point handover mechanism based on the SARSA(λ) algorithm according to the present invention;

FIG. 5 is a graph of the average reward trends of different algorithms under different ε values provided by the present invention;

FIG. 6 is a schematic diagram comparing the average downlink data rates of different algorithms under different exploration rates ε provided by the present invention;

FIG. 7 is a schematic diagram comparing the average downlink data rates of conventional UD handover methods and the SARSA(λ) algorithm at different moving speeds according to the present invention;

FIG. 8 is a schematic structural diagram of an adaptive seamless handover system for a hybrid network according to the present invention;

FIG. 9 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Compared with radio frequency communication such as WIFI, Visible Light Communication (VLC) does not require radio frequency access and effectively avoids problems such as electromagnetic interference, spectrum limitation and communication shielding. VLC technology is therefore applied to the cabin wireless communication network to provide communication in places where radio frequency signals are weak or absent. Given the characteristics of the visible light communication channel, the invention further combines Infrared (IR) communication and VLC communication into a hybrid communication network; because IR and VLC use different wavelengths, IR communication compensates for the shortcomings of VLC communication (such as weak light intensity).

Because the hybrid network is deployed in the cabin and the communication links of the different network systems differ, switching the user terminal between networks interrupts data transmission and causes extra data overhead during the switching process. The complex vertical handover in the heterogeneous environment of the hybrid network therefore needs to be optimized to obtain a handover scheme that improves system performance and reliability. The invention adopts Reinforcement Learning (RL), a branch of artificial intelligence in which an agent learns a policy through interaction with the environment in order to find the scheme that maximizes the reward, so that the user terminal is switched to the optimal wireless access point while it moves. In addition, RL does not depend on a large pre-collected dataset, which makes it well suited to finding a handover strategy.

It should be noted that the present invention is described in terms of the handover process of a hybrid network formed by VLC wireless access points and IR wireless access points in a cabin; hybrid networks formed by other wireless communication systems may also adopt the adaptive seamless handover method provided by the present invention.

Fig. 1 is a schematic flowchart of an adaptive seamless handover method for a hybrid network according to the present invention, and as shown in fig. 1, the present invention provides an adaptive seamless handover method for a hybrid network, including:

Step 101, obtaining the received signal strength between the user terminal and the wireless access point based on the current position of the user terminal.

In the present invention, the wireless access points include visible light communication access points and infrared communication access points. FIG. 2 is a schematic diagram of a cabin hybrid VLC-IR communication network scenario according to the present invention. Referring to FIG. 2, in the VLC-IR communication network, VLC communication avoids the electromagnetic interference, radio frequency limitation and communication shielding problems of radio frequency communication, providing high-speed, secure and low-energy communication. Further, because Infrared (IR) and VLC use different wavelengths, IR links do not interfere with VLC links, so building a hybrid communication network inside the cabin in which infrared and VLC communication complement each other overcomes the shortcomings of visible light communication in the cabin. In the VLC-IR hybrid network architecture, the VLC access points (APs) and IR APs cooperate to cover the cabin space effectively and provide stable, continuous communication for each user device (UD, i.e., the user terminal). The channel of each AP is shared among the UDs in a time division duplex (TDD) manner. As a UD moves, it generates state feedback from the Received Signal Strength (RSS) of the different APs; the server obtains this feedback and decides the target wireless access point to which the UD is to be handed over. Because RSS is closely related to the communication rate, the trained network seamless switching model predicts the target AP of the UD at its next position from the RSS acquired between the UD at its current position and the APs.
In the invention, because the APs deployed in the cabin do not need to move, reliable and low-cost cables can connect the central coordinator (server) to the APs to control signal transmission and data exchange.

Step 102, inputting the received signal strength into a trained network seamless switching model, and determining a target wireless access point when the user terminal moves to the next position, wherein the trained network seamless switching model is obtained by training a reinforcement learning model according to the sample received signal strength.

In the invention, the random movement of each UD in the cabin hybrid VLC-IR communication network can cause excessive access point handovers, which not only cause unavoidable communication interruptions but also seriously degrade network performance through extra data overhead; traditional handover algorithms cannot adequately handle the frequent handovers caused by random user movement. Because reinforcement learning performs reliably on decision problems in complex, dynamic environments and does not need a pre-collected dataset, the invention implements a dynamically planned handover mechanism that learns the regularities of the environment without prior network information, based on the RSS between the UDs and the APs: the agent interacts with the environment and learns an action policy that maximizes the reward, which reduces the delay caused by unnecessary handovers and improves network performance and user experience. Preferably, the invention constructs the model with the SARSA(λ) algorithm and, in the VLC-IR hybrid network, provides a vertical handover protocol that does not interrupt data transmission: data transmission continues during the handover, so the UD is insensitive to handovers between APs and the impact of handover on user experience is reduced.

Step 103, after the user terminal moves to the next position, the user terminal is switched to the target wireless access point for data transmission.

In the present invention, FIG. 3 is a schematic diagram of the adaptive seamless vertical handover provided by the present invention; it shows the handover procedure among a UD, the central coordinator and the target AP. First, the UD uploads the RSS from the VLC/IR APs to the central coordinator. The central coordinator then uses the trained network seamless switching model, i.e., the handover algorithm, to decide the target AP for the UD and sends a handover request (an access control function) to the target AP. Once the target AP of the UD is determined, handover confirmation and state transfer are carried out through the handover protocol; the UD takes part in this negotiation and keeps its data connection with the original AP until the handover succeeds, which guarantees that data transmission is not interrupted and avoids handover-induced transmission delay. Finally, after the UD has switched to the target AP, it transmits data with the target AP and reports its updated state to the central coordinator.

According to the self-adaptive seamless switching method for the hybrid network, the optimal access point in the hybrid network is obtained through the reinforcement learning model, and the switching protocol without interrupting data transmission is adopted, so that extra data overhead in the network selection switching process is avoided, the switching delay is reduced, and the experience quality of a user is improved.

On the basis of the embodiment, the trained network seamless switching model is obtained through the following steps:

acquiring sample received signal strength and sample access point switching information between a user terminal and each wireless access point according to historical communication data of the user terminal at different positions in a hybrid network, wherein the sample access point switching information indicates whether the user terminal switches between the wireless access points;

In the invention, the reinforcement learning model is trained on historical communication data so that the trained model can decide, from the RSS between the UD and the APs, the target AP to which the UD should be handed over. In reinforcement learning, an agent learns by trial and error: its behavior is guided by the rewards it obtains from interacting with the environment, and its goal is to maximize that reward. Because the external environment provides little information, the reinforcement learning model must learn from its own experience and thereby improve its course of action to adapt to the environment.

On the basis of the above embodiment, the reinforcement learning model is constructed based on the SARSA (λ) algorithm.

Compared with common RL algorithms, the SARSA(λ) algorithm has memory and can solve the more complex wireless access point handover problem faster and more effectively. SARSA(λ) is an extension of the SARSA algorithm. Traditional reinforcement learning algorithms such as SARSA and Q-learning update only once per step in the environment, so the earlier steps that led to a reward receive no credit for it; SARSA(λ) updates all steps from the starting point to the point where the reward is obtained, which gives the algorithm stronger memory, lets it extract useful information more effectively, and lets it find the optimal, maximum-return policy faster.

Specifically, the SARSA(λ) algorithm introduces a matrix E of the same size as the Q table to record each step of experience, which makes it possible to update all steps from the start to the point where the reward is obtained. Since RSS is closely related to the communication rate, the UD generates state feedback from the RSS of the different APs as it moves and sends this feedback to the server. In state s the algorithm selects actions with an ε-greedy strategy, where ε is the exploration rate: it selects the currently optimal action with probability 1-ε and a random action with probability ε, which keeps the agent from getting trapped in a local optimum. Because model training and decision-making run on the central coordinator, the UD only needs to follow the coordinator's instructions, which greatly reduces the computational burden on the UD. In addition, the invention only needs to measure and collect locally the RSS of the user from the different APs, and does not need to collect the position information of the UD in the hybrid network.
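The ε-greedy selection described above can be sketched in a few lines of Python. The function name `epsilon_greedy` and the tie-breaking rule (the lowest action index wins among equal Q values) are assumptions made for illustration.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick an action index from one row Q(s, .) of the Q table.

    With probability epsilon: explore (uniformly random action).
    Otherwise: exploit (action with the highest Q value; ties -> lowest index).
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])
```

With ε = 0 the function always exploits, which is the 1-ε branch; during training ε takes small values such as those compared in FIG. 5.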

On the basis of the foregoing embodiment, the training a reinforcement learning model according to the sample received signal strength and the sample access point switching information, and if a preset training condition is satisfied, obtaining a trained network seamless switching model, including:

In the invention, SARSA(λ) has stronger memory: compared with traditional reinforcement learning algorithms, it updates all steps from the starting point to the reward, with the size of each update depending on how far the step is from the reward, so it learns the regularities of the environment faster and more effectively and yields a handover mechanism with higher system performance. FIG. 4 is a schematic diagram of a wireless access point handover mechanism based on the SARSA(λ) algorithm according to the present invention. Referring to FIG. 4, in the handover mechanism implemented with SARSA(λ) in the cabin hybrid VLC-IR communication network, the state is the user's location, the action is whether the user terminal is handed over between APs, and the reward is the feedback obtained after a given action is executed in a given state. Since reinforcement learning searches, through interaction with the environment, for the handover strategy with the maximum reward, the reward is the feedback received after the UD (acting as the agent that executes the action decided by the server, i.e., the handover to the target access point) executes the action in the environment, and it reflects the quality of the strategy to a certain extent. The larger λ is, the more of the earlier steps receive credit for a reward and the stronger their updates; in the present invention, λ in the SARSA(λ) algorithm is therefore set to 1.
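The distance-dependent update strength mentioned above can be illustrated with a worked example, assuming accumulating traces, a path on which each state-action pair (s_t, a_t) is visited once, and the update order used in this document (increment E, update Q, then decay E by γλ). Here α is the learning rate, γ the discount factor, and δ_T the error at the step T at which the reward arrives. The pair visited k steps before the reward then carries a trace (γλ)^k and receives an update scaled by it:

```latex
E_T(s_{T-k}, a_{T-k}) = (\gamma\lambda)^{k},
\qquad
\Delta Q(s_{T-k}, a_{T-k}) = \alpha\,\delta_T\,(\gamma\lambda)^{k},
\qquad k = 0, 1, \dots, T.

% Example with \lambda = 1 (as chosen here) and, for illustration, \gamma = 0.9:
(\gamma\lambda)^{0},\ (\gamma\lambda)^{1},\ (\gamma\lambda)^{2},\ (\gamma\lambda)^{3}
  = 1,\ 0.9,\ 0.81,\ 0.729.
```

Setting λ = 0 makes every term with k ≥ 1 vanish, which recovers the one-step Sarsa update in which only the most recent pair is credited; λ = 1 gives the slowest decay of credit along the path for a given γ.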

Further, referring to FIG. 4, s is the current state, a is the action taken in s, r is the reward after taking action a in s, and s' is the next state reached after taking action a in s; together with the next action a' selected in s', the sequence (s, a, r, s', a') gives the algorithm its name, "Sarsa". α ∈ [0,1] is the learning rate, γ is the discount factor, and the matrix E, which has the same size as the Q table (the table storing the Q values of all state-action pairs), stores each step of the path. At the beginning of each training round, the matrix E is initialized to 0.

In the training process or the actual handover process, in the current state s, i.e., with the UD at its current position, the optimal handover action is selected with probability 1-ε and a random handover action is selected with probability ε, and each time an action is selected the corresponding entry E(s, a) of the matrix E is incremented by 1. When action a has been executed and the reward r (i.e., the RSS at the UD from the different APs) has been obtained, the next action a' is selected in s' and the target value $Q_{target} = r + \gamma Q(s', a')$ is calculated. The error between this target value and the current estimate $Q_{eval} = Q(s, a)$ is $\delta = Q_{target} - Q_{eval}$, and the Q table is updated as $Q(s, a) \leftarrow Q(s, a) + \alpha \delta E(s, a)$ for every state-action pair.
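The per-step update above can be written as a short NumPy function. This is a sketch assuming accumulating traces and the step order described in this document (increment E(s, a), compute δ, update the whole Q table, then decay E); the function name `sarsa_lambda_step` is hypothetical, and terminal-state handling is omitted.

```python
import numpy as np


def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next, alpha, gamma, lam):
    """One SARSA(lambda) step with accumulating eligibility traces, in place.

    Q, E: float arrays of shape (n_states, n_actions); E has the same shape as Q.
    Returns the error delta = Q_target - Q_eval.
    """
    q_target = r + gamma * Q[s_next, a_next]   # target ("ideal") value
    delta = q_target - Q[s, a]                 # target minus current estimate Q_eval
    E[s, a] += 1.0                             # mark the pair just visited
    Q += alpha * delta * E                     # credit every pair on the path
    E *= gamma * lam                           # decay all traces after the step
    return delta
```

For example, starting from all-zero tables with α = 0.5, γ = 0.9, λ = 1 and a reward of 1, the visited entry Q(s, a) becomes 0.5 and its trace decays to 0.9, ready to pass a share of the next error back to this pair.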

Further, after each step, every entry of the matrix E is decayed as $E(s, a) \leftarrow \lambda \gamma E(s, a)$, the current state s is updated to s' (and the current action a to a'), and the above process is repeated until the end of the current round. The trained model is obtained once the number of training rounds reaches the preset number; in actual wireless access point handover, the model outputs a decision, i.e., the target access point, after this cyclic process has run for the preset number of rounds.
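Putting the round structure together, the sketch below trains a Q table on a small, hypothetical RSS table `rss[p, k]` (the reward if the UD at position p is served by AP k) while the UD random-walks. It illustrates the loop under stated assumptions and is not the patented training procedure: the position-as-state encoding, the random-walk mobility, the reward taken at the next position, and the default hyperparameters are assumptions (the default ε = 0.04 is one of the values compared in FIG. 5, and λ = 1 follows the text).

```python
import numpy as np


def train_sarsa_lambda(rss, n_rounds=300, steps_per_round=50, alpha=0.1,
                       gamma=0.9, lam=1.0, epsilon=0.04, seed=0):
    """Toy SARSA(lambda) training loop (all modeling choices are assumptions).

    rss[p, k]: assumed RSS (reward) when the UD at position p is served by AP k.
    State = position index; action = AP index; the UD random-walks each step.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = rss.shape
    Q = np.zeros((n_states, n_actions))

    def choose(s):
        # epsilon-greedy: explore with probability epsilon, otherwise exploit
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    for _ in range(n_rounds):
        E = np.zeros_like(Q)                 # matrix E reset to 0 for each round
        s = int(rng.integers(n_states))
        a = choose(s)
        for _ in range(steps_per_round):
            s_next = int(np.clip(s + rng.choice([-1, 1]), 0, n_states - 1))
            r = rss[s_next, a]               # RSS from the AP chosen in state s
            a_next = choose(s_next)
            delta = r + gamma * Q[s_next, a_next] - Q[s, a]
            E[s, a] += 1.0
            Q += alpha * delta * E
            E *= gamma * lam                 # decay traces after each step
            s, a = s_next, a_next
    return Q
```

After training, `np.argmax(Q, axis=1)` gives the greedy target-AP index for each position, which in this sketch plays the role of the decision issued by the central coordinator.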

On the basis of the foregoing embodiment, the handing over the user terminal to the target wireless access point for data transmission after the user terminal moves to the next location includes:

In the present invention, as shown in fig. 3, a handover process between any UD and a target AP under the coordination control of the central coordinator is specifically as follows:

step 201, the UD periodically measures the RSS from the VLC APs and IR APs and reports the RSS states from the different APs to the central coordinator;

step 202, after obtaining RSS states from different APs uploaded by the UD, the central coordinator runs an AP selection algorithm (i.e., through a trained network seamless handover model) to determine a target AP to which the UD needs to be handed over;

step 203, the central coordinator sends handover request information to the target AP and then receives the handover confirmation from the target AP, at which point the target AP to which the UD will be handed over is determined;

step 204, the central coordinator sends a state request to the UD, further synchronizes the current data transmission state of the UD, and forwards the state to the target AP;

step 205, after receiving the UD data transmission state, the target AP sends an access authorization signal to the central coordinator, so as to allow the UD access;

step 206, the central coordinator sends the UD an instruction to switch to the target AP; after receiving the instruction, the UD accesses the target AP and continues data transmission with it according to the transmission state information;

step 207, finally, the UD reports the updated state information to the central coordinator.
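The make-before-break behavior of steps 201 to 207 can be simulated with a short Python sketch in which the central coordinator's role is played by the function `run_handover`. All class and function names are hypothetical, the target AP always confirms and grants access, and `select_target` stands in for the trained network seamless switching model (the test uses a strongest-RSS rule purely as a placeholder). The sketch shows that data keeps flowing over the original AP until step 206, and that the first packet over the target AP continues from the transmission state synchronized in step 204.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class AccessPoint:
    name: str
    synced_state: Optional[dict] = None

    def on_handover_request(self, ud_name: str) -> bool:
        return True                       # step 203: always confirms in this sketch

    def on_state(self, state: dict) -> bool:
        self.synced_state = dict(state)   # step 205: keep the UD's transmission state
        return True                       # ... and grant access


@dataclass
class UserDevice:
    name: str
    serving_ap: str
    next_seq: int = 0

    def send(self, log: List[str]) -> None:
        # One data packet over whichever AP currently serves the UD.
        log.append(f"data {self.next_seq} via {self.serving_ap}")
        self.next_seq += 1


def run_handover(ud: UserDevice, aps: Dict[str, AccessPoint],
                 select_target: Callable[[Dict[str, float]], str],
                 rss_report: Dict[str, float], log: List[str]) -> str:
    """Coordinator-side sketch of steps 201-207; returns the serving AP name."""
    log.append(f"201 RSS report {rss_report}")           # UD -> coordinator
    target_name = select_target(rss_report)              # 202: selection model
    log.append(f"202 target {target_name}")
    if target_name == ud.serving_ap:
        return ud.serving_ap                             # no handover needed
    target = aps[target_name]
    ud.send(log)                                         # old link still carries data
    if not target.on_handover_request(ud.name):          # 203: request / confirm
        return ud.serving_ap
    log.append("203 confirmed")
    state = {"next_seq": ud.next_seq}                    # 204: sync transmission state
    log.append(f"204 state {state}")
    if not target.on_state(state):                       # 205: access authorization
        return ud.serving_ap
    log.append("205 granted")
    ud.serving_ap = target_name                          # 206: switch instruction
    log.append(f"206 switch to {target_name}")
    ud.send(log)                                         # resumes at the synced seq
    log.append(f"207 update serving={ud.serving_ap}")    # UD -> coordinator
    return ud.serving_ap
```

In this sketch, if the target AP declines at step 203 or 205, the UD simply stays on its original AP, so a failed negotiation never leaves it without a data link.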

FIG. 5 is a schematic diagram of the average reward trends of different algorithms under different ε values provided by the present invention. Referring to FIG. 5, the average rewards of the different algorithms are compared under different exploration rates ε, where ε is 0.08, 0.06, 0.04 and 0.02 in panels (a), (b), (c) and (d) of FIG. 5, respectively. With the ε-greedy method, for every ε the average reward of each algorithm oscillates but shows a clear convergence trend as the number of training rounds increases. For every value of ε, the SARSA(λ) algorithm converges faster than the Q-learning and Sarsa algorithms. Moreover, when ε is too small (e.g., 0.02), Q-learning and Sarsa easily fall into a local pseudo-optimal state, which markedly slows their convergence to the extremum, so they need more iterations to reach the true final peak. Because SARSA(λ) has a strong memory of experienced states, it does not suffer from this problem. Note that, overall, the smaller ε is, the better the convergence of the different algorithms and the larger the extremum of the average reward, because reducing ε reduces the negative impact of randomness.

The handover mechanism provided by the invention not only overcomes the limitations of traditional handover methods under frequent handovers, but also achieves a higher downlink transmission rate than traditional reinforcement learning algorithms. FIG. 6 compares the average downlink data rates of the different algorithms under different exploration rates ε. As shown in FIG. 6, for each ε value the SARSA(λ) algorithm achieves a higher average downlink data rate, 13% and 14% higher than Q-learning and Sarsa, respectively.

Because user movement is random, conventional handover methods can neither adapt intelligently to random changes in the environment nor mitigate the performance degradation caused by system overhead. FIG. 7 compares the average downlink data rates of conventional UD handover methods and the SARSA(λ) algorithm at different moving speeds. As shown in FIG. 7, across the different speeds the average downlink data rate of the SARSA(λ) algorithm is 57% higher than that of the IHO algorithm and 22% higher than that of the DHO algorithm. In addition, the SARSA(λ) algorithm is less affected by the UD moving speed and maintains a higher average downlink data rate.

The following describes an adaptive seamless handover system for a hybrid network according to the present invention, and the adaptive seamless handover system for a hybrid network described below and the adaptive seamless handover method for a hybrid network described above may be referred to in correspondence with each other.

Fig. 8 is a schematic structural diagram of an adaptive seamless handover system for a hybrid network. As shown in Fig. 8, the present invention provides an adaptive seamless handover system for a hybrid network, including a received signal strength acquisition module 801, a target access point determination module 802, and a handover module 803. The received signal strength acquisition module 801 is configured to acquire, based on the current location of a user terminal, the received signal strength between the user terminal and a wireless access point; the target access point determination module 802 is configured to input the received signal strength into a trained network seamless handover model and determine the target wireless access point for when the user terminal moves to the next location, where the trained network seamless handover model is obtained by training a reinforcement learning model on sample received signal strengths; and the handover module 803 is configured to hand the user terminal over to the target wireless access point for data transmission after the user terminal moves to the next location.
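The three-module structure of Fig. 8 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class names, the two-phase `on_position`/`on_arrival` interface, the toy distance-based RSS and the AP labels `VLC-1`/`IR-1` are all hypothetical, and the strongest-RSS rule in the demo is only a placeholder where the trained reinforcement-learning handover model would be plugged in.

```python
from typing import Callable, Dict, List, Optional, Tuple

RssVector = Dict[str, float]  # AP identifier -> received signal strength


class RssAcquisitionModule:
    """Role of module 801: measure RSS from the current position to each AP."""

    def __init__(self, measure: Callable[[float, str], float], aps: List[str]) -> None:
        self.measure = measure  # hypothetical hook onto the terminal's receiver
        self.aps = aps

    def acquire(self, position: float) -> RssVector:
        return {ap: self.measure(position, ap) for ap in self.aps}


class TargetApDeterminationModule:
    """Role of module 802: map an RSS vector to the AP to use at the next position."""

    def __init__(self, model: Callable[[RssVector], str]) -> None:
        self.model = model  # stands in for the trained handover model

    def determine(self, rss: RssVector) -> str:
        return self.model(rss)


class HandoverModule:
    """Role of module 803: switch the serving AP, recording each actual switch."""

    def __init__(self) -> None:
        self.serving_ap: Optional[str] = None
        self.history: List[Tuple[Optional[str], str]] = []

    def handover(self, target_ap: str) -> None:
        if target_ap != self.serving_ap:  # no-op if already on the target AP
            self.history.append((self.serving_ap, target_ap))
            self.serving_ap = target_ap


class HandoverSystem:
    """Wires 801 -> 802 -> 803: decide at the current position, switch on arrival."""

    def __init__(self, acq: RssAcquisitionModule, det: TargetApDeterminationModule,
                 ho: HandoverModule) -> None:
        self.acq, self.det, self.ho = acq, det, ho
        self.pending: Optional[str] = None

    def on_position(self, position: float) -> str:
        self.pending = self.det.determine(self.acq.acquire(position))
        return self.pending

    def on_arrival(self) -> None:
        # Called once the terminal has reached the next position.
        if self.pending is not None:
            self.ho.handover(self.pending)
            self.pending = None


if __name__ == "__main__":
    ap_positions = {"VLC-1": 0.0, "IR-1": 10.0}  # hypothetical 1-D layout

    def toy_rss(position: float, ap: str) -> float:
        return -abs(position - ap_positions[ap])  # toy: RSS falls with distance

    def strongest(rss: RssVector) -> str:
        return max(rss, key=lambda ap: rss[ap])  # placeholder, NOT the trained RL model

    system = HandoverSystem(RssAcquisitionModule(toy_rss, list(ap_positions)),
                            TargetApDeterminationModule(strongest), HandoverModule())
    for pos in [1.0, 4.0, 7.0, 9.0]:
        system.on_position(pos)
        system.on_arrival()
    print(system.ho.history)  # [(None, 'VLC-1'), ('VLC-1', 'IR-1')]
```

Splitting the decision (at the current position) from the switch (on arrival) mirrors the order of operations described for modules 802 and 803.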

With the adaptive seamless handover system for a hybrid network provided by the invention, the optimal access point in the hybrid network is obtained through the reinforcement learning model, and a handover protocol that does not interrupt data transmission is adopted, so that extra data overhead during network selection and handover is avoided, handover delay is reduced, and the user's quality of experience is improved.

The system provided by the present invention is used for executing the above method embodiments, and for the specific processes and details, reference is made to the above embodiments, which are not described herein again.

Fig. 9 is a schematic structural diagram of an electronic device provided in the present invention. As shown in Fig. 9, the electronic device may include a processor 901, a communication interface 902, a memory 903 and a communication bus 904, where the processor 901, the communication interface 902 and the memory 903 communicate with one another via the communication bus 904. The processor 901 may invoke logic instructions in the memory 903 to perform an adaptive seamless handover method for a hybrid network, the method comprising: acquiring the received signal strength between a user terminal and a wireless access point based on the current position of the user terminal; inputting the received signal strength into a trained network seamless switching model, and determining a target wireless access point when the user terminal moves to the next position, wherein the trained network seamless switching model is obtained by training a reinforcement learning model according to the sample received signal strength; and after the user terminal moves to the next position, switching the user terminal to the target wireless access point for data transmission.

In addition, when implemented as software functional units and sold or used as an independent product, the logic instructions in the memory 903 may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions, which when executed by a computer, enable the computer to perform the adaptive seamless handover method for a hybrid network provided by the above methods, the method comprising: acquiring the received signal strength between a user terminal and a wireless access point based on the current position of the user terminal; inputting the received signal strength into a trained network seamless switching model, and determining a target wireless access point when the user terminal moves to the next position, wherein the trained network seamless switching model is obtained by training a reinforcement learning model according to the sample received signal strength; and after the user terminal moves to the next position, switching the user terminal to the target wireless access point for data transmission.

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored; the computer program, when executed by a processor, performs the adaptive seamless handover method for a hybrid network provided in the above embodiments, the method comprising: acquiring the received signal strength between a user terminal and a wireless access point based on the current position of the user terminal; inputting the received signal strength into a trained network seamless switching model, and determining a target wireless access point when the user terminal moves to the next position, wherein the trained network seamless switching model is obtained by training a reinforcement learning model according to the sample received signal strength; and after the user terminal moves to the next position, switching the user terminal to the target wireless access point for data transmission.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. An adaptive seamless handover method for a hybrid network, comprising:

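acquiring the received signal strength between a user terminal and a wireless access point based on the current position of the user terminal;

inputting the received signal strength into a trained network seamless switching model, and determining a target wireless access point when the user terminal moves to the next position;

after the user terminal moves to the next position, switching the user terminal to the target wireless access point for data transmission;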

the trained network seamless switching model is obtained through the following steps:

training a reinforcement learning model according to the sample received signal strength and the sample access point switching information, and obtaining a trained network seamless switching model if a preset training condition is met;

the reinforcement learning model is constructed based on a SARSA(λ) algorithm;

the training of the reinforcement learning model according to the sample received signal strength and the sample access point switching information, and obtaining a trained network seamless switching model if the preset training condition is met, comprises:

2. The method of claim 1, wherein the wireless access points comprise a visible light communication access point and an infrared communication access point.

3. The method of claim 1, wherein the handing over the user terminal to the target wireless access point for data transmission after the user terminal moves to a next location comprises:

4. An adaptive seamless handover system for a hybrid network, comprising:

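a received signal strength acquisition module, configured to acquire the received signal strength between a user terminal and a wireless access point based on the current position of the user terminal;

a target access point determination module, configured to input the received signal strength into a trained network seamless switching model and determine a target wireless access point when the user terminal moves to the next position;

a switching module, configured to switch the user terminal to the target wireless access point for data transmission after the user terminal moves to the next location;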

the training of the reinforcement learning model according to the sample received signal strength and the sample access point switching information, and if the preset training condition is met, obtaining a trained network seamless switching model, includes:

5. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the adaptive seamless handover method for a hybrid network according to any one of claims 1 to 3.

6. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the adaptive seamless handover method for a hybrid network according to any one of claims 1 to 3.