CN110009061B

CN110009061B - AP self-adaptive optimization selection method based on machine learning

Info

Publication number: CN110009061B
Application number: CN201910314113.1A
Authority: CN
Inventors: 赵海涛; 李嘉欣; 于建国; 张唐伟; 张晖; 朱洪波
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2019-04-18
Filing date: 2019-04-18
Publication date: 2022-08-05
Anticipated expiration: 2039-04-18
Also published as: CN110009061A; WO2020211833A1

Abstract

The invention discloses a machine-learning-based adaptive optimization method for selecting an AP (access point), applied to the process of establishing a WiFi connection between a mobile device and an AP and to adaptive network switching in the Internet of Vehicles. The method comprises the following steps: collecting data on connected devices in the current environment, establishing a training data set and a feature set, and determining a threshold; determining, from the data set and the ID3 algorithm, whether the decision tree is a single-node tree; if it is not a single-node tree, partitioning the data set into subsets to construct child nodes and grow the tree; and recursing until a complete decision tree is generated, which classifies the APs into a FAST set and a SLOW set, and then selecting the fastest AP in the FAST set to establish the connection. By selecting the AP according to the machine learning model, the method shortens the connection time and reduces the time cost of WiFi connection setup.

Description

AP self-adaptive optimization selection method based on machine learning

Technical Field

The invention relates to the fields of communication technology and adaptive switching in the Internet of Vehicles, and in particular to an AP self-adaptive optimization selection method based on machine learning.

Background

In recent years, wireless data traffic has grown exponentially owing to the explosive growth of smart devices. Among wireless networks, the 802.11 wireless local area network (WiFi) has become an integral part of today's wireless services. Over the past decade, more than 10 million WiFi APs (access points) have been deployed to provide wireless connectivity. Even users whose smart devices support 3G/4G cellular networks can make use of today's ubiquitous WiFi hotspots.

However, network performance and user experience in WiFi networks are still unsatisfactory. According to a survey of more than 500 million WiFi users in urban areas, as many as 45% of mobile devices cannot establish a WiFi connection with the corresponding AP, and 15% (5%) of successful WiFi connections take more than 5 seconds (10 seconds) to establish. Previous measurement studies on WiFi networks have focused on general user-experience metrics (e.g., the bandwidth and delay experienced in WiFi networks) and have paid little attention to the performance of the connection establishment process. Data collected from Android smartphones in a controlled environment show that large connection setup time costs are mainly caused by the loss of DHCP packets. In fact, the performance of the WiFi connection establishment process outdoors is still unknown, and a larger-scale, thorough investigation is lacking. Many studies focus on WiFi performance measurement, aiming to estimate the available throughput of particular AP-client links and to explore delay at the AP. Based on current research, however, it is imperative to pay attention to the connection setup time cost metric, since the high connection failure rate already affects the user experience.

Meanwhile, the Internet of Vehicles also faces the problem of network connection switching. In the prior art, when a mobile terminal device moves through a network, it connects to a succession of APs, which causes significant fluctuation in service quality and possibly long connection interruptions during switching wherever the signal power is insufficient to support the data rate. Such situations arise in everyday scenarios such as elevators and stairways, and particularly in the Internet of Vehicles. When the user reaches a network blind spot, the connection is interrupted. Data flows on the mobile terminal device are particularly affected by such temporary connection losses, which users regard as a major problem.

Currently, there is little research on connection establishment and adaptive handover procedures. Most current research related to connection establishment concerns WiFi handover mechanisms, i.e. reducing handover delay. Various solutions have emerged in the prior art to alleviate these problems by obtaining advance information about an upcoming connection loss, such as loss prediction on the device itself, or by appropriate intervention on the player. A key element of all of these strategies is long-term prediction of channel conditions on a time scale much larger than that of small-scale fading. For prediction, most existing methods rely on specific channel models or on extensive and detailed channel maps. Neither approach may be sufficient to meet data flow requirements.

Disclosure of Invention

The main purpose of the invention is to solve the problems in the prior art by providing an AP self-adaptive optimization selection method based on machine learning. The specific technical scheme is as follows:

a method for AP adaptive optimization selection based on machine learning, the method comprising the steps of:

step 1, collecting data on connected devices in the current environment, establishing a training data set and a feature set, and determining a threshold;

step 2, determining, according to the data set and the ID3 algorithm, whether the decision tree is a single-node tree;

let there be K classes C_k (k = 1, 2, 3, ..., K); if all instances in the training data set belong to the same class C_k, the decision tree is a single-node tree, the class C_k is taken as the class label of the node, and the decision tree is returned; if the feature set is empty, the decision tree is a single-node tree, the class C_k with the largest number of instances in the training data set is taken as the class label of the node, and the decision tree is returned; otherwise, the information gain of each feature in the feature set with respect to the training data set is calculated according to the ID3 algorithm, and the feature A_g with the largest information gain is selected; if the information gain of A_g is smaller than the threshold ε, the decision tree is set as a single-node tree, the class C_k with the largest number of instances in the training data set is taken as the class label of the node, and the decision tree is returned;

step 3, if the decision tree is not a single-node tree, partitioning the data set into subsets to construct child nodes and grow the tree;

step 4, recursively invoking steps 2 and 3 until a complete decision tree is generated, thereby classifying the APs into a FAST set and a SLOW set, and selecting the fastest AP in the FAST set to establish the connection.
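The final selection in step 4 can be illustrated with a minimal Python sketch. It assumes that each candidate AP has already been labeled FAST or SLOW by the trained decision tree. The text does not state how the fastest AP within the FAST set is ranked, so the sketch assumes a hypothetical per-AP estimate `est_connect_s` (for example, a historical mean connection setup time); the class and function names are likewise illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    ssid: str              # hypothetical AP identifier
    label: str             # "FAST" or "SLOW", as predicted by the decision tree
    est_connect_s: float   # assumed estimate of connection setup time, in seconds


def select_ap(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Return the FAST-set AP with the lowest estimated setup time, or None."""
    fast_set = [c for c in candidates if c.label == "FAST"]
    if not fast_set:
        return None  # no AP was classified as FAST; the caller decides the fallback
    return min(fast_set, key=lambda c: c.est_connect_s)


aps = [
    Candidate("ap-a", "SLOW", 0.8),
    Candidate("ap-b", "FAST", 2.1),
    Candidate("ap-c", "FAST", 1.4),
]
chosen = select_ap(aps)
print(chosen.ssid if chosen else "no FAST AP")
```

Note that `ap-a` is skipped despite its low estimate, because the classifier placed it in the SLOW set; ranking happens only inside the FAST set.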

Further, in step 1, the established feature set includes, but is not limited to, the connection time, signal strength, mobile device model, whether the AP is a public AP, whether it is encrypted, and the number of connected devices; the algorithm used includes, but is not limited to, decision tree algorithms such as ID3 and C4.5.

Further, in step 2, the step of calculating the information gain by the ID3 algorithm is as follows:

step 2-1, calculating the empirical entropy H(D) of the data set D;

step 2-2, calculating the empirical conditional entropy H(D|A) of the feature A with respect to the data set D;

step 2-3, calculating the information gain g(D, A):

g(D, A) = H(D) - H(D|A)
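Only the final formula is reproduced in this text. Assuming the standard ID3 definitions (an assumption, since the formulas for steps 2-1 and 2-2 do not appear here), with |D| the number of samples, C_k the set of samples in class k, D_i the subset of D on which feature A takes its i-th of n values, and D_ik the samples of D_i that belong to class C_k, the three quantities can be written as:

```latex
\begin{aligned}
H(D) &= -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|} \\
H(D \mid A) &= \sum_{i=1}^{n} \frac{|D_i|}{|D|}\, H(D_i)
             = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \sum_{k=1}^{K} \frac{|D_{ik}|}{|D_i|} \log_2 \frac{|D_{ik}|}{|D_i|} \\
g(D, A) &= H(D) - H(D \mid A)
\end{aligned}
```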

Further, in step 3, specifically, after it has been determined that the decision tree is not a single-node tree, the training data set is divided, for each possible value a_i of A_g, into non-empty subsets D_i according to A_g = a_i; the class with the largest number of instances in D_i is used as the label to construct a child node; the node and its child nodes form the decision tree, and the decision tree is returned.

Compared with the prior art, the connection attempt failure rate in application of the invention is less than 3.6%. The 80th-percentile connection time cost is only 3 seconds, compared with more than 30 seconds using the baseline algorithm, i.e. a tenfold reduction. The algorithm of the invention takes into account that the probability of a connection failure can be high even for the AP whose signal strength measured on the mobile device is the highest. The model of the invention can predict these connection failure events with higher accuracy and prevent mobile devices from connecting to APs in the SLOW set.

Drawings

FIG. 1 is a schematic flow diagram of the process of the present invention.

FIG. 2 is a graph of the relative information gain of an example feature set.

FIG. 3 is a decision tree model generated by an embodiment.

Detailed Description

The technical scheme of the invention is further explained in detail by combining the drawings in the specification.

step 1, collecting data on connected devices in the current environment, establishing a training data set and a feature set, and determining a threshold.

In step 1, the established feature set includes, but is not limited to, the connection time, signal strength, mobile device model, whether the AP is a public AP, whether it is encrypted, and the number of connected devices; the algorithm used includes, but is not limited to, decision tree algorithms such as ID3 and C4.5.

Specifically, the input is a training data set D, a feature set A (time, signal strength, mobile device model, whether the AP is a public AP, whether it is encrypted, the number of connected devices, and so on) and a threshold ε; the output is a decision tree T. To understand how each feature affects the connection time cost when building the data set, the invention uses coordinate-axis visualization to display how the connection time cost differs across the values of each feature. The embodiment omits this visualization for the mobile device model and AP model features, since there are thousands of different mobile device models and AP models. The relative information gain of selected features in the embodiment is shown in FIG. 2.
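ID3 operates on discrete feature values, so continuous measurements such as signal strength or time need to be bucketed before training. The following Python sketch shows one possible encoding of a raw connection record into the feature set A together with a class label. The field names, the bucket boundaries and the 3-second rule used to label a record FAST or SLOW are illustrative assumptions, not values given in this text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawRecord:
    hour: int               # hour of day of the connection attempt (0-23)
    rssi_dbm: int           # measured signal strength in dBm
    device_model: str
    is_public_ap: bool
    is_encrypted: bool
    connected_devices: int  # devices already associated with the AP
    connect_time_s: float   # measured connection setup time cost, in seconds


def encode(rec: RawRecord, slow_after_s: float = 3.0) -> tuple[dict[str, str], str]:
    """Map a raw record to (categorical features, class label)."""
    features = {
        # All bucket boundaries below are assumptions chosen for illustration.
        "time": "day" if 8 <= rec.hour < 20 else "night",
        "signal": "strong" if rec.rssi_dbm >= -60 else "weak",
        "device_model": rec.device_model,
        "public_ap": "yes" if rec.is_public_ap else "no",
        "encrypted": "yes" if rec.is_encrypted else "no",
        "load": "high" if rec.connected_devices > 20 else "low",
    }
    # Assumed labeling rule: setup times above `slow_after_s` count as SLOW.
    label = "SLOW" if rec.connect_time_s > slow_after_s else "FAST"
    return features, label


features, label = encode(RawRecord(9, -55, "model-x", True, False, 5, 1.2))
print(features["signal"], features["time"], label)
```

A high-cardinality feature such as the device model is passed through unchanged here; in practice it would likely need grouping, which is consistent with the embodiment omitting its visualization.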

step 2, determining, according to the data set and the ID3 algorithm, whether the decision tree is a single-node tree.

Let there be K classes C_k (k = 1, 2, 3, ..., K). If all instances in the training data set belong to the same class C_k, the decision tree is a single-node tree, the class C_k is taken as the class label of the node, and the decision tree is returned. If the feature set is empty, the decision tree is a single-node tree, the class C_k with the largest number of instances in the training data set is taken as the class label of the node, and the decision tree is returned. Otherwise, the information gain of each feature in the feature set with respect to the training data set is calculated according to the ID3 algorithm, and the feature A_g with the largest information gain is selected. If the information gain of A_g is smaller than the threshold ε, the decision tree is set as a single-node tree, the class C_k with the largest number of instances in the training data set is taken as the class label of the node, and the decision tree is returned.
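The three conditions under which step 2 returns a single-node tree can be sketched in Python as follows. The sketch assumes the information gain of each remaining feature has already been computed and is passed in as a dictionary; the function names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_class(labels: Sequence[str]) -> str:
    """Class with the largest number of instances in the training data."""
    return Counter(labels).most_common(1)[0][0]


def single_node_label(labels: Sequence[str],
                      gains: dict[str, float],
                      epsilon: float) -> Optional[str]:
    """Return the leaf label if the tree is a single-node tree, else None.

    `gains` maps each remaining feature to its information gain g(D, A).
    """
    if len(set(labels)) == 1:          # all instances share one class C_k
        return labels[0]
    if not gains:                      # the feature set is empty
        return majority_class(labels)
    best = max(gains, key=gains.get)   # feature A_g with the largest gain
    if gains[best] < epsilon:          # gain too small to justify a split
        return majority_class(labels)
    return None                        # not a single-node tree: go on to step 3


print(single_node_label(["FAST", "SLOW", "SLOW"], {"signal": 0.005}, 0.01))
print(single_node_label(["FAST", "SLOW"], {"signal": 0.4}, 0.01))
```

The first call returns the majority class because the best gain is below the threshold; the second returns `None`, meaning the node should be split.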

In step 2, the ID3 algorithm calculates the information gain as follows:

step 2-1, calculating the empirical entropy H(D) of the data set D;

step 2-2, calculating the empirical conditional entropy H(D|A) of the feature A with respect to the data set D;

step 2-3, calculating the information gain g(D, A):

g(D, A) = H(D) - H(D|A)
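The information gain calculation can be sketched in Python as below. The sketch assumes the standard ID3 definitions of empirical entropy and empirical conditional entropy with base-2 logarithms; the function names and the toy data are illustrative.

```python
import math
from collections import Counter, defaultdict
from typing import Sequence


def entropy(labels: Sequence[str]) -> float:
    """Empirical entropy H(D) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(values: Sequence[str], labels: Sequence[str]) -> float:
    """Empirical conditional entropy H(D|A) for one feature A."""
    groups: dict[str, list[str]] = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)            # D_i: labels of samples with A = a_i
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(values: Sequence[str], labels: Sequence[str]) -> float:
    """g(D, A) = H(D) - H(D|A)."""
    return entropy(labels) - conditional_entropy(values, labels)


# Toy data: signal strength separates the classes perfectly, encryption does not.
labels = ["FAST", "FAST", "SLOW", "SLOW"]
signal = ["strong", "strong", "weak", "weak"]
encrypted = ["yes", "no", "yes", "no"]
print(information_gain(signal, labels), information_gain(encrypted, labels))
```

On this toy data the gain of `signal` equals the full entropy of the labels (1 bit), while `encrypted` carries no information about the class, so ID3 would select `signal` as A_g.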

step 3, if the decision tree is not a single-node tree, partitioning the data set into subsets to construct child nodes and grow the tree.

In step 3, specifically, after it has been determined that the decision tree is not a single-node tree, the training data set is divided, for each possible value a_i of A_g, into non-empty subsets D_i according to A_g = a_i; the class with the largest number of instances in D_i is used as the label to construct a child node; the node and its child nodes form the decision tree, and the decision tree is returned.

For the i-th child node, with D_i as the training set and A - {A_g} as the feature set, steps 2 and 3 are invoked recursively to obtain the subtree T_i, and T_i is returned. The decision tree model generated in the embodiment is shown in FIG. 3.
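Steps 2 to 4 taken together can be sketched as a compact recursive ID3 implementation in Python. This is a minimal sketch under stated assumptions rather than the embodiment's implementation: samples are dictionaries of categorical feature values, the tree is a nested dictionary, and an unseen feature value at prediction time falls back to the majority class of the node (a design choice not described in this text).

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain(rows, labels, feat):
    """Information gain g(D, A) of feature `feat`."""
    n = len(labels)
    cond = 0.0
    for v in {r[feat] for r in rows}:
        sub = [y for r, y in zip(rows, labels) if r[feat] == v]
        cond += len(sub) / n * entropy(sub)
    return entropy(labels) - cond


def build(rows, labels, feats, eps):
    """Return a class label (leaf) or a dict describing an internal node."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not feats:       # single class or no features left
        return majority
    best = max(feats, key=lambda f: gain(rows, labels, f))   # A_g
    if gain(rows, labels, best) < eps:           # gain below the threshold
        return majority
    children = {}
    for v in {r[best] for r in rows}:            # one non-empty subset D_i per value a_i
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        children[v] = build([rows[i] for i in idx], [labels[i] for i in idx],
                            [f for f in feats if f != best], eps)   # A - {A_g}
    return {"feature": best, "default": majority, "children": children}


def predict(tree, row):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree


rows = [
    {"signal": "strong", "public_ap": "no"},
    {"signal": "strong", "public_ap": "yes"},
    {"signal": "weak", "public_ap": "no"},
    {"signal": "weak", "public_ap": "yes"},
]
labels = ["FAST", "FAST", "FAST", "SLOW"]
tree = build(rows, labels, ["signal", "public_ap"], eps=0.01)
print(predict(tree, {"signal": "weak", "public_ap": "yes"}))
```

In this toy data set only weak-signal public APs are slow, so the tree needs both features to separate the classes, and the final call classifies such an AP into the SLOW set.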

The above description is only a preferred embodiment of the present invention, and the scope of the present invention is not limited to the above embodiment, but equivalent modifications or changes made by those skilled in the art according to the present disclosure should be included in the scope of the present invention as set forth in the appended claims.

Claims

1. An AP self-adaptive optimization selection method based on machine learning is characterized in that: the method comprises the following steps:

let there be K classes C_k; if all instances in the training data set belong to the same class C_k, the decision tree is a single-node tree, the class C_k is taken as the class label of the node, and the decision tree is returned; if the feature set is empty, the decision tree is a single-node tree, the class C_k with the largest number of instances in the training data set is taken as the class label of the node, and the decision tree is returned; otherwise, the information gain of each feature in the feature set with respect to the training data set is calculated according to the ID3 algorithm, and the feature A_g with the largest information gain is selected; if the information gain of A_g is smaller than the threshold ε, the decision tree is set as a single-node tree, the class C_k with the largest number of instances in the training data set is taken as the class label of the node, and the decision tree is returned;

in step 1, the established feature set includes time, signal strength, mobile device model, whether the AP is a public AP, whether it is encrypted, and the number of connected devices;

specifically, the input is a training data set D, a feature set A comprising time, signal strength, mobile device model, whether the AP is a public AP, whether it is encrypted and the number of connected devices, and a threshold ε; the output is a decision tree T; when the data set is established, in order to understand how each feature affects the connection time cost, coordinate-axis visualization is used to display the difference in connection time cost for each feature;

in step 3, specifically, after it has been determined that the decision tree is not a single-node tree, the training data set is divided, for each possible value a_i of A_g, into non-empty subsets D_i according to A_g = a_i; the class with the largest number of instances in D_i is used as the label to construct a child node; the node and its child nodes form the decision tree, and the decision tree is returned;

step 4, recursively invoking steps 2 and 3 until a complete decision tree is generated, thereby classifying the APs into a FAST set and a SLOW set, and selecting the fastest AP in the FAST set to establish the connection; for the i-th child node, with D_i as the training set and A - {A_g} as the feature set, the recursive invocation yields the subtree T_i, and T_i is returned.

2. The method of claim 1, wherein: in step 1, the established feature set includes, but is not limited to, the connection time, signal strength, mobile device model, whether the AP is a public AP, whether it is encrypted, and the number of connected devices; the algorithm used includes, but is not limited to, the ID3 and C4.5 decision tree algorithms.

3. The AP self-adaptive optimization selection method based on machine learning according to claim 1, characterized in that: in step 2, the step of calculating the information gain by the ID3 algorithm is as follows:

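step 2-1, calculating the empirical entropy H(D) of the data set D;

step 2-2, calculating the empirical conditional entropy H(D|A) of the feature A with respect to the data set D;

step 2-3, calculating the information gain g(D, A):

g(D, A) = H(D) - H(D|A).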

4. The method of claim 1, wherein: in step 3, specifically, after it has been determined that the decision tree is not a single-node tree, the training data set is divided, for each possible value a_i of A_g, into non-empty subsets D_i according to A_g = a_i; the class with the largest number of instances in D_i is used as the label to construct a child node; the node and its child nodes form the decision tree, and the decision tree is returned.