CN101634995A - Network connection speed predicting method based on machine learning - Google Patents


Info

Publication number
CN101634995A
Authority
CN
China
Prior art keywords
user
website
neural network
network
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN200910102126A
Other languages
Chinese (zh)
Other versions
CN101634995B (en
Inventor
徐颂华
江浩
金涛
刘智满
潘云鹤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN2009101021269A
Publication of CN101634995A
Application granted
Publication of CN101634995B
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method for predicting network connection speed based on machine learning. The method comprises the following steps: (1) a customized browser records the speed of the user's connections to the websites the user has browsed, and the records serve as a training set and a test set; (2) a neural network is trained on the recorded connection speeds and used to predict the user's connection speed to each website in the training set; (3) depending on how much the prediction error of the neural networks has decreased, either step (4) is executed, or the training set is divided into smaller training sets and step (2) is executed on each of them; (4) a decision tree is used to test the predictive performance of the neural networks; and (5) the decision tree and the neural networks are used to predict the user's connection speed to an unknown website. By using artificial intelligence and machine learning to predict the connection speed between the user and each website, the invention improves the accuracy of network condition assessment and makes fuller use of the user's bandwidth, giving the user a better Internet experience.

Description

A network connection speed prediction method based on machine learning
Technical field
The present invention relates to the fields of computer search and network technology, and in particular to a network connection speed prediction method based on machine learning.
Background art
In recent years a series of research efforts has appeared on personalized, or user-oriented, search engines and algorithms. One example is the article "A large-scale evaluation and analysis of personalized search strategies", published in 2007 at the 16th International Conference on World Wide Web (WWW '07: Proceedings of the 16th International Conference on World Wide Web). In the article "A user-oriented webpage ranking algorithm based on user attention time", presented at the 23rd AAAI Conference on Artificial Intelligence in 2008, the authors likewise propose a personalized, user-oriented web search engine. The present invention is aimed specifically at optimizing the network connection conditions of an individual user. It studies how to select the best network connection for a given user, a question rarely addressed in earlier research and inventions.
Because quality of service is critical for web browsing and many other kinds of network access, any method that improves it has considerable commercial value. A number of solutions have been proposed, and some are in commercial use. The most successful large-scale commercial software among them relies on a simple idea: automatically opening connections to several content providers and downloading or accessing content in parallel. One example is Xunlei (http://www.xunlei.com/), one of the most popular software products in China. With this kind of program, however, the websites of content providers are heavily affected, because pages are accessed by an automated program rather than by the end user, so the online advertising on those pages loses its value. This problem has led to several court cases between companies offering such quality-of-service improvements and website content providers. The present invention proposes a data-mining-based method for predicting a user's network conditions. The predictions can serve as an important factor in building a personalized website recommendation system, and thereby give each individual user the best quality of service.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a network connection speed prediction method based on machine learning.
The network connection speed prediction method based on machine learning comprises the following steps:
1) using a customized browser, record the connection speed of the websites the user has browsed, as a training set and a test set;
2) using the recorded connection speeds, train a neural network and use it to predict the user's connection speed to each website in the training set;
3) depending on how much the prediction error of the neural network has decreased, either execute step 4), or divide the training set into smaller training sets and return to step 2) for each of them;
4) use a decision tree to test the predictive performance of the neural networks;
5) use the decision tree and the neural networks to predict the user's connection speed to any unknown website.
The step of using a customized browser to record the connection speed of the websites the user has browsed, as a training set and a test set, comprises:
(a) for each website the user visits, record on each visit the interval between the user sending an access request to the website and the user receiving a response; this is the user connection time of the website;
(b) for each website the user visits, record the download speed each time the user downloads data from the website; this is the user bandwidth of the website;
(c) if the user has visited a website several times, take the mean of the user connection times from the last week or the last 10 visits as the user connection time of the website, and the mean of the user bandwidths from the last week or the last 10 visits as the user bandwidth of the website;
(d) randomly select 10% of the user history data as the test set, and use the remaining 90% as the training set.
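Steps (c) and (d) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names and the record layout are hypothetical, and the rule for choosing between "the last week" and "the last 10 visits" is an assumption, since the text does not say which takes precedence.

```python
import random
from datetime import datetime, timedelta
from statistics import mean

def site_average(samples, now, max_count=10, window=timedelta(days=7)):
    """Average the measurements recorded for one website.

    samples: list of (timestamp, value) pairs, e.g. connection times in ms.
    Assumption: samples from the last week are used when any exist;
    otherwise the most recent `max_count` samples are used.
    """
    recent = [value for ts, value in samples if now - ts <= window]
    if not recent:
        latest = sorted(samples, key=lambda s: s[0])[-max_count:]
        recent = [value for _, value in latest]
    return mean(recent)

def split_history(records, test_fraction=0.10, seed=0):
    """Randomly split history records into (training set, test set)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

The fixed `seed` only makes the sketch repeatable; a real recorder would not need it.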
The step of training a neural network on the recorded connection speeds and using it to predict the user's connection speed to each website in the training set comprises:
(e) build an artificial neural network whose input is the feature data of a website: the network IP address expressed as a 32-bit integer, and one integer between 0 and 23 giving the hour of the current time; its output is 2 real numbers, the estimated connection time and the estimated bandwidth between the user and the website;
(f) using the user connection time and user bandwidth history data obtained in steps (a)-(d) as the training set, train the neural network built in step (e) with the back-propagation algorithm, and save the trained neural network.
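The input described in step (e) can be illustrated with a short sketch. It assumes IPv4 addresses in dotted notation and uses Python's standard `ipaddress` module; the function name `encode_site` is hypothetical.

```python
import ipaddress

def encode_site(ip, hour):
    """Return the two input features of step (e).

    ip:   IPv4 address in dotted notation.
    hour: hour of the current time, 0..23.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    # int() of an IPv4Address gives the address as a 32-bit integer
    return int(ipaddress.IPv4Address(ip)), hour
```

For example, `encode_site("192.0.2.1", 14)` yields the integer form of the address together with the hour. Any rescaling of these values before training is left open by the text.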
The step of either executing step 4), or dividing the training set into smaller training sets and returning to step 2) for each of them, depending on how much the prediction error has decreased, comprises:
(g) use the neural network trained in step (f) to predict the user connection time and user bandwidth of each website in the training set, and compute the error e between the predicted and actual values for the website:
e = t + Kb*b
where t is the prediction error of the user connection time, in milliseconds; b is the prediction error of the user bandwidth, in kilobits per second; and Kb is a coefficient whose value is typically between 200 and 1000;
(h) if this is not the first execution of step (g), and the total prediction error differs from the previous total prediction error by no more than 3%, jump to step (k);
(i) sort the website data in the training set in ascending order of the prediction error from step (g), and use a k-nearest-neighbor clustering algorithm to divide the websites into m groups, where m is the integer between 1 and 5 that maximizes the difference in mean prediction error between the groups;
(j) for each group of websites from step (i), use the group as a training set and jump to step (e).
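The stopping test of step (h) can be sketched as a small function. The name `has_converged` is hypothetical, and measuring the 3% difference relative to the previous total is an assumption; the text does not state the reference value.

```python
def has_converged(previous_total, current_total, tolerance=0.03):
    """Decide whether training should stop, following step (h).

    previous_total: total prediction error of the previous round,
                    or None on the first execution of step (g).
    Assumption: the difference is taken relative to the previous total.
    """
    if previous_total is None:
        return False  # first execution: always continue to step (i)
    if previous_total == 0:
        return current_total == 0
    change = abs(current_total - previous_total) / previous_total
    return change <= tolerance
```

A driver loop would call this after each round of steps (e)-(g) and go to step (k) once it returns true.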
The step of using a decision tree to test the predictive performance of the neural networks comprises:
(k) divide the websites in the test set obtained in step (d) into 1000 groups by network IP address, numbered 1 to 1000; if n neural networks were finally used in steps (e)-(j), record for each website the number, between 1 and n, of the neural network that was last used to train on it when steps (e)-(j) finished;
(l) build a decision tree whose input is the IP address group number, from 1 to 1000, and whose output is the neural network number, from 1 to n;
(m) train the decision tree built in step (l) with the C4.5 decision tree algorithm on the test set data obtained in step (d), and save the trained decision tree.
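Step (k) does not say how an IP address is mapped to one of the 1000 groups. One possible mapping, shown here purely as an assumption, divides the 32-bit IPv4 address space into 1000 equal-width ranges; the name `ip_group` is hypothetical.

```python
import ipaddress

GROUPS = 1000
ADDRESS_SPACE = 2 ** 32  # number of IPv4 addresses

def ip_group(ip):
    """Map an IPv4 address to a group number in 1..1000.

    Assumption: equal-width ranges over the whole address space.
    """
    value = int(ipaddress.IPv4Address(ip))
    return value * GROUPS // ADDRESS_SPACE + 1
```

Each pair (group number, neural network number) then forms one training example for the decision tree of steps (l) and (m); the C4.5 training itself is not shown here.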
The step of using the decision tree and the neural networks to predict the user's connection speed to any unknown website comprises:
(n) for any website whose connection speed is unknown, obtain its IP address group number between 1 and 1000 as described in step (k), and use the decision tree obtained in step (m) to obtain the corresponding neural network number;
(o) use the corresponding neural network to predict the user connection time and user bandwidth of the website.
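Steps (n) and (o) amount to a two-stage lookup, sketched below. The trained decision tree and neural networks are replaced by toy stand-ins, and the equal-width grouping of IP addresses is an assumption; all names are hypothetical.

```python
import ipaddress

def ip_group(ip, groups=1000):
    # Assumed grouping: equal-width ranges over the IPv4 address space.
    return int(ipaddress.IPv4Address(ip)) * groups // 2 ** 32 + 1

def predict_connection(ip, hour, choose_network, networks):
    """Return (connection_time_ms, bandwidth_kbps) for an unknown website.

    choose_network: callable, group number -> network number
                    (stands in for the trained decision tree).
    networks:       dict, network number -> callable (ip_int, hour)
                    returning the two estimates.
    """
    number = choose_network(ip_group(ip))            # step (n)
    ip_int = int(ipaddress.IPv4Address(ip))
    return networks[number](ip_int, hour)            # step (o)

# Toy stand-ins, for illustration only.
def toy_tree(group):
    return 1 if group <= 500 else 2

toy_networks = {
    1: lambda ip_int, hour: (80.0, 512.0),
    2: lambda ip_int, hour: (250.0, 64.0),
}
```

Calling `predict_connection("10.0.0.1", 9, toy_tree, toy_networks)` routes the address through the stand-in tree to network 1.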
The invention makes effective use of artificial intelligence: it combines several machine learning methods to predict the connection speed between the user and each website, improves the accuracy of network condition assessment, makes fuller use of the user's bandwidth when accessing network resources, and gives the user a better Internet experience.
Description of drawings
Fig. 1 is a flow diagram of an embodiment of the network connection speed prediction method based on machine learning;
Fig. 2 is a flow chart of the invention applied to a personalized network resource recommendation system;
Fig. 3 is a schematic diagram of the virtual network architecture used in the virtual network experiment;
Fig. 4 shows the prediction error when the number of neural networks in the artificial neural network group is limited; the abscissa is the number of neural networks in the group, and the ordinate is the prediction error when the group and the decision tree are used to predict the data in the training set; the three curves correspond to training sets of 10000, 50000 and 100000 records;
Fig. 5 plots the data obtained from experiments in the virtual network of Fig. 3, which simulates Internet conditions; panels (a)-(f) correspond to Table 1 (a)-(f), drawn as percentage performance improvement; in each panel, the abscissa is the number of neural networks in the artificial neural network group, and the ordinate is the percentage by which the data download speed improves when the group and the decision tree are used to predict the connection speed of unknown websites for the network resource recommendation system; in experiments (a)-(c), the user is placed on a computer far from the network root, simulating a dial-up user, with 10000, 50000 and 100000 user history records available in advance; in experiments (d)-(f), the user is placed on a computer close to the network root, simulating a broadband user, with 10000, 50000 and 100000 user history records available in advance; each data point is the average of 100 repetitions of the same experimental procedure, and the unit of the experimental data is milliseconds (ms);
Fig. 6 shows experimental data for an embodiment of the invention on the real Chinese Internet; panels (a)-(c) show the time users needed to download text files, PDF documents and online game installation files, respectively; in each panel, the abscissa is the number, #1 to #20, of the 20 users who took part in the experiment, and the ordinate is the time taken to download the network resource; the bars show, under otherwise identical conditions, the data without the network resource recommendation system of the embodiment, with that system, and with the Xunlei software in single-site download mode; the data of every user is shown; each data point is the average of 100 repetitions with resources of the same class; the average sizes of the three classes of resource are 10.6 KB, 3.49 MB and 784 MB respectively; the unit of the experimental data is seconds (sec.).
Embodiment
As shown in Fig. 1, the method comprises two parts, a training stage and a prediction stage. The training stage comprises user history data 10, training set 20, test set 30, artificial neural network 40, error judgment 50, training set splitting 60, artificial neural network group 70 and C4.5 decision tree 80. The prediction stage comprises unknown website 90, C4.5 decision tree 80, artificial neural network group 70 and connection speed prediction 99.
User history data 10: the user connection time recorded on each visit to a site and the user bandwidth recorded while transferring data. The user connection time of a website is defined as the interval between the user sending an access request to the website and the user receiving the website's response; if the user has visited the website several times, the mean of the user connection times from the last week or the last 10 visits is used. The user bandwidth of a website is defined as the download speed when the user downloads data from the website; if the user has visited the website several times, the mean of the user bandwidths from the last week or the last 10 visits is used.
Training set 20: the 90% of user history data 10 that remains after 10% has been randomly selected as the test set;
Test set 30: 10% of the user history data, selected at random;
Artificial neural network 40: the embodiment uses a 4-layer artificial neural network. The input layer takes the feature data of a website: the network IP address expressed as a 32-bit integer, and one integer between 0 and 23 giving the hour of the current time. The output layer gives 2 real numbers, the estimated connection time and the estimated bandwidth between the user and the website. Each neuron in the other two layers is a sigmoid function, and every pair of neurons in adjacent layers is connected. The network is trained continuously in the background on user history data 10 using the back-propagation algorithm.
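The shape of such a network can be illustrated with a forward pass in plain Python. This is a sketch under several assumptions that the text does not state: 8 neurons per hidden layer, a linear output layer, random untrained weights, and inputs rescaled to the range 0..1. Back-propagation training is not shown.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_layer(n_in, n_out, rng):
    # One weight row per neuron; the last entry of each row is the bias.
    return [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(layers, inputs):
    """Fully connected forward pass: sigmoid hidden layers, linear output."""
    activations = list(inputs)
    for index, layer in enumerate(layers):
        sums = [sum(w * a for w, a in zip(row, activations + [1.0]))
                for row in layer]
        is_output = index == len(layers) - 1
        activations = sums if is_output else [sigmoid(s) for s in sums]
    return activations

rng = random.Random(0)
HIDDEN = 8  # assumed hidden layer width
# 4 layers of neurons (input, two hidden, output) need 3 weight matrices.
layers = [make_layer(2, HIDDEN, rng),
          make_layer(HIDDEN, HIDDEN, rng),
          make_layer(HIDDEN, 2, rng)]

ip_int, hour = 3221225985, 14            # 192.0.2.1 at 14:00
scaled = [ip_int / 2 ** 32, hour / 23]   # assumed input scaling
time_estimate, bandwidth_estimate = forward(layers, scaled)
```

With untrained weights the two outputs are meaningless; the sketch only shows how two inputs flow through two sigmoid layers to two real-valued outputs.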
Error judgment 50: use the trained artificial neural network 40 to predict the user connection time and user bandwidth of each website in training set 20, and compute the error e between the predicted and actual values for the website:
e = t + Kb*b
where t is the prediction error of the user connection time, in milliseconds; b is the prediction error of the user bandwidth, in kilobits per second; and Kb is a coefficient whose value is typically between 200 and 1000. If this is not the first execution of this step and the current total prediction error differs from the previous total by no more than 3%, training of the neural networks ends and all neural networks existing at that point are saved, giving artificial neural network group 70.
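The error formula can be written out directly. In this sketch the function names are hypothetical, Kb = 500 is an arbitrary value from the stated range of 200 to 1000, and taking t and b as absolute differences is an assumption, since the text does not say whether the errors are signed.

```python
def prediction_error(time_error_ms, bandwidth_error_kbps, kb=500.0):
    """Combined error e = t + Kb * b."""
    return time_error_ms + kb * bandwidth_error_kbps

def site_error(predicted, actual, kb=500.0):
    """Error for one website from (time_ms, bandwidth_kbps) pairs.

    Assumption: t and b are absolute differences between the
    predicted and the measured values.
    """
    t = abs(predicted[0] - actual[0])
    b = abs(predicted[1] - actual[1])
    return prediction_error(t, b, kb)
```

Because Kb multiplies the bandwidth error, a bandwidth miss of 1 kilobit per second weighs as much as a connection time miss of Kb milliseconds.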
Training set splitting 60: sort the site data in training set 20 in ascending order of prediction error under artificial neural network 40, and use a k-nearest-neighbor clustering algorithm to divide the websites into m groups, where m is the integer between 1 and 5 that maximizes the difference in mean prediction error between the groups; each of the m groups then becomes a new training set 20, and execution returns to artificial neural network 40.
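The text does not give the details of the clustering algorithm, so the sketch below swaps in a simpler technique: it sorts the websites by error and cuts the sorted list at the m-1 widest gaps between neighbouring errors. This largest-gap split is a stand-in, not the patent's algorithm, and the selection of m between 1 and 5 is left to the caller. The name `split_by_error` is hypothetical.

```python
def split_by_error(site_errors, m):
    """Split (site, error) pairs into at most m groups of similar error.

    Stand-in technique: sort by error, then cut at the m-1 widest gaps
    between consecutive errors.
    """
    ordered = sorted(site_errors, key=lambda pair: pair[1])
    if m <= 1 or len(ordered) <= 1:
        return [ordered]
    m = min(m, len(ordered))
    # Indices i where a cut between ordered[i-1] and ordered[i] is widest.
    widest = sorted(range(1, len(ordered)),
                    key=lambda i: ordered[i][1] - ordered[i - 1][1],
                    reverse=True)[:m - 1]
    cuts = [0] + sorted(widest) + [len(ordered)]
    return [ordered[a:b] for a, b in zip(cuts, cuts[1:])]
```

Each returned group would then serve as its own training set for a new neural network.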
Artificial neural network group 70: in error judgment 50, if the total prediction error of all artificial neural networks over all training data differs by no more than 3% from the total before the last training set splitting 60, the artificial neural networks existing at that point form the artificial neural network group. Fig. 4 shows the prediction error when the number of neural networks in the group is limited, which demonstrates the need to build a group of neural networks and to split the training set.
C4.5 decision tree 80: all websites in test set 30 are divided into 1000 groups by network IP address, numbered 1 to 1000; if artificial neural network group 70 contains n artificial neural networks, the number, between 1 and n, of the network in group 70 that was last used to train on each website in training set 20 is recorded; the input of the decision tree is the IP address group number, from 1 to 1000, and its output is the neural network number, from 1 to n; the decision tree is trained on test set 30 with the C4.5 decision tree algorithm, and the trained tree is saved;
Unknown website 90: a website on the Internet whose connection speed is not known;
Connection speed prediction 99: for unknown website 90, obtain its IP address group number between 1 and 1000, use C4.5 decision tree 80 to obtain the corresponding neural network number, and then use the corresponding neural network in artificial neural network group 70 to predict the user connection time and user bandwidth of the website.
An important application of the invention is a personalized network resource recommendation system, whose flow structure is shown in Fig. 2. The system comprises two parts, a front end and a back end. The back end comprises custom browser 100 and resource recommendation result 700; the front end comprises user history data 200, network connection speed prediction 300 based on machine learning, general search engine 400, basic search results 500, and merging and adjustment of search results 600.
Custom browser 100: a module, embedded as a plug-in in an existing web browser such as Firefox or Internet Explorer, that records the user connection time on each visit to a site and the user bandwidth while transferring data.
User history data 200: the user connection time on each visit to a site and the user bandwidth while transferring data, as recorded by custom browser 100. The user connection time of a website is defined as the interval between the user sending an access request to the website and the user receiving the website's response; if the user has visited the website several times, the mean of the user connection times from the last week or the last 10 visits is used. The user bandwidth of a website is defined as the download speed when the user downloads data from the website; if the user has visited the website several times, the mean of the user bandwidths from the last week or the last 10 visits is used.
Network connection speed prediction 300 based on machine learning: the network connection speed prediction method of the invention is used to predict the user connection time and user bandwidth of each website in the basic search results.
General search engine 400: provides a user interface and calls a network resource search service; in this embodiment the interface is implemented in JSP; when the user submits a query, a general web search engine (such as Google) is called to obtain the search results.
Basic search results 500: after the search with general search engine 400, the result page is parsed to obtain the first 100 results; if a resource is a document, the document is downloaded and stored locally.
Merging and adjustment of search results 600: 1) Merging: if the network resource the user needs is text, the text similarity of every pair of items in the search results is computed with the "Code for estimating document similarity" from Microsoft's open source code package; if the similarity exceeds 95%, the two items are marked as having identical content. If the resource is in another form, then for every pair of items in the search results, 10 offsets are chosen at random and 1K bytes of data are compared at each offset in the two data files; if the two files are identical at all 10 positions, they are marked as having identical content. All websites with identical content are then merged into the earliest search result item in which that content appears, forming a single item. 2) Adjustment: if the resource the user needs is smaller than 100K, then in every search result item that contains two or more websites, the websites are reordered by ascending user connection time as estimated by user connection time and bandwidth prediction 60; if the resource is larger than 100K, the websites are reordered by descending user bandwidth as estimated by user connection time and bandwidth prediction 60;
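The sampling comparison for non-text resources and the size-dependent reordering can be sketched as follows. The function names and the dictionary keys are hypothetical. Treating files of different length as different content, reading "100K" as 100 × 1024 bytes, and handling a resource of exactly that size as a large one are all assumptions that the text leaves open.

```python
import random

def same_content(data_a, data_b, samples=10, chunk=1024, seed=0):
    """Compare `chunk` bytes at `samples` random offsets of two files."""
    if len(data_a) != len(data_b):
        return False  # assumption: different sizes mean different content
    if not data_a:
        return True
    rng = random.Random(seed)
    for _ in range(samples):
        offset = rng.randrange(len(data_a))
        if data_a[offset:offset + chunk] != data_b[offset:offset + chunk]:
            return False
    return True

def order_mirrors(mirrors, resource_size_bytes, threshold=100 * 1024):
    """Order the websites of one merged search result item.

    mirrors: list of dicts holding the predicted 'time_ms' and
             'bandwidth_kbps' of each website.
    Small resources are ordered by ascending connection time,
    large ones by descending bandwidth.
    """
    if resource_size_bytes < threshold:
        return sorted(mirrors, key=lambda m: m["time_ms"])
    return sorted(mirrors, key=lambda m: m["bandwidth_kbps"], reverse=True)
```

The sampled comparison is probabilistic: two files that differ only outside the sampled ranges would still be marked as identical, which is the trade-off the scheme accepts in exchange for not comparing whole files.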
Resource recommendation result 700: the user-oriented, personalized resource recommendation result obtained after merging and adjustment of search results 600; the recommendation takes full account of the user's own network conditions, makes fuller use of the user's bandwidth, and gives the user a better Internet experience.
The experimental results in Tables 1 and 2 show the advantage of the method;
Table 1 gives the data obtained from experiments in a virtual network that simulates Internet conditions; the structure of the virtual network is shown in Fig. 3. Each site sits within a multi-level network structure organized by gateways. The virtual network contains about 30000 computers, distributed across three different Internet service providers (ISPs). At each level, the closer a gateway is to the network root, the smaller its delay and the larger the bandwidth of the nearby links; the farther from the root, the larger the delay and the smaller the bandwidth. The delay of the backbone gateways between different ISPs is about 1/100 of that of a gateway inside a single ISP, and their bandwidth is about 50 times as large. In the experiment, 500 different resources were created, each copied 2000 times and distributed at random across the computers in the virtual network. For each resource query, the search engine is assumed to return 90% of the sites holding the resource, in random order. The probability that the user clicks the i-th item in the returned site list is assumed to be
Figure G2009101021269D00091
Table 1 lists the total time the user spends obtaining a required resource, with and without the network resource recommendation system that embodies the method. The columns give the time needed when the resource is 10K bytes, 1M bytes and 100M bytes in size. The rows give the data without the recommendation system, and with the recommendation system when the artificial neural network group is limited to 1, 5, 10, 50 or 100 neural networks. Each value is the average of 100 repetitions of the same experimental procedure, in milliseconds (ms). In the experiments of Table 1 (a)-(c), the user is placed on a computer far from the network root, simulating a dial-up user, with 10000, 50000 and 100000 user history records available in advance. In the experiments of Table 1 (d)-(f), the user is placed on a computer close to the network root, simulating a broadband user, with 10000, 50000 and 100000 user history records available in advance. All data of Table 1 (a)-(f) is also shown, as percentage performance improvement, in the corresponding panels of Fig. 5 (a)-(f).
Table 1

(a) Simulated dial-up network, 10,000 user history records

| Number of neural networks | 10 KB | 1 MB | 100 MB |
| --- | --- | --- | --- |
| 0 (invention not used) | 285.3 ms | 23472 ms | 2290176 ms |
| 1 | 233.7 ms | 17205 ms | 1662668 ms |
| 5 | 117.8 ms | 10445 ms | 846298 ms |
| 10 | 97.0 ms | 8098 ms | 785530 ms |
| 50 | 87.1 ms | 6957 ms | 695418 ms |
| 100 | 88.3 ms | 6329 ms | 718152 ms |

(b) Simulated dial-up network, 50,000 user history records

| Number of neural networks | 10 KB | 1 MB | 100 MB |
| --- | --- | --- | --- |
| 0 (invention not used) | 296.6 ms | 23631 ms | 2394101 ms |
| 1 | 238.8 ms | 18149 ms | 1941615 ms |
| 5 | 118.6 ms | 8791 ms | 945567 ms |
| 10 | 77.1 ms | 6940 ms | 825965 ms |
| 50 | 76.3 ms | 5334 ms | 631479 ms |
| 100 | 73.3 ms | 5404 ms | 598086 ms |

(c) Simulated dial-up network, 100,000 user history records

| Number of neural networks | 10 KB | 1 MB | 100 MB |
| --- | --- | --- | --- |
| 0 (invention not used) | 269.8 ms | 22255 ms | 2250904 ms |
| 1 | 195.6 ms | 17381 ms | 1609396 ms |
| 5 | 80.1 ms | 7077 ms | 841838 ms |
| 10 | 54.8 ms | 4874 ms | 567228 ms |
| 50 | 41.5 ms | 4015 ms | 457065 ms |
| 100 | 41.5 ms | 3387 ms | 338572 ms |

(d) Simulated broadband network, 10,000 user history records

| Number of neural networks | 10 KB | 1 MB | 100 MB |
| --- | --- | --- | --- |
| 0 (invention not used) | 135.8 ms | 3808 ms | 486680 ms |
| 1 | 133.1 ms | 4116 ms | 476460 ms |
| 5 | 123.1 ms | 3610 ms | 435579 ms |
| 10 | 124.5 ms | 3397 ms | 346516 ms |
| 50 | 102.5 ms | 3046 ms | 305148 ms |
| 100 | 98.9 ms | 2871 ms | 279206 ms |

(e) Simulated broadband network, 50,000 user history records

| Number of neural networks | 10 KB | 1 MB | 100 MB |
| --- | --- | --- | --- |
| 0 (invention not used) | 140.2 ms | 4597 ms | 369387 ms |
| 1 | 142.4 ms | 4951 ms | 361999 ms |
| 5 | 125.9 ms | 4275 ms | 340575 ms |
| 10 | 108.7 ms | 3480 ms | 271869 ms |
| 50 | 106.1 ms | 3273 ms | 253768 ms |
| 100 | 103.5 ms | 3167 ms | 249336 ms |

(f) Simulated broadband network, 100,000 user history records

| Number of neural networks | 10 KB | 1 MB | 100 MB |
| --- | --- | --- | --- |
| 0 (invention not used) | 175.8 ms | 8012 ms | 494828 ms |
| 1 | 180.9 ms | 7403 ms | 510168 ms |
| 5 | 152.2 ms | 6794 ms | 422088 ms |
| 10 | 114.8 ms | 3332 ms | 220693 ms |
| 50 | 101.8 ms | 2660 ms | 196942 ms |
| 100 | 97.2 ms | 2732 ms | 191004 ms |

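Table 1 reports raw times, while the text says Fig. 5 shows the same data as percentage improvements. The source does not spell out how that percentage is computed, so the following is only a sketch under the assumption that it is the relative time saved against the "invention not used" row. The function name `improvement_percent` and the dictionary are illustrative; the numbers are the 10 KB column of Table 1(a).

```python
# Times in ms for the 10 KB column of Table 1(a), keyed by the number of
# neural networks. Key 0 is the baseline row (invention not used).
table_1a_10kb = {0: 285.3, 1: 233.7, 5: 117.8, 10: 97.0, 50: 87.1, 100: 88.3}


def improvement_percent(baseline_ms: float, time_ms: float) -> float:
    """Relative time saved versus the baseline, in percent.

    This definition is an assumption; the source only says that Fig. 5
    shows "performance improvement percentage".
    """
    return (baseline_ms - time_ms) / baseline_ms * 100.0


baseline = table_1a_10kb[0]
improvements = {
    n: improvement_percent(baseline, t)
    for n, t in table_1a_10kb.items()
    if n != 0
}

for n, pct in improvements.items():
    print(f"{n:>3} neural networks: {pct:.1f}% less time than the baseline")
```

Under this definition a negative value would mean the recommendation system was slower than the baseline, which is what some single-network rows of Table 1(d)-(f) would give.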
Table 2 gives experimental data comparing the invention with the Xunlei (Thunder) download software in the virtual network. Table 2(a)-(c) show the data when the user history is limited to 10,000, 50,000 and 100,000 records, respectively. In each table, the columns give the time needed to obtain the resource when its size is 10 KB, 1 MB and 100 MB. The rows give the data without the system of the embodiment and with the network resource recommendation system that incorporates the embodiment. To better demonstrate the effectiveness of the method, the data for Xunlei under the same conditions (simulating its single-site download mode) are listed for comparison. Each value is the average over 100 repetitions with similar resources, in milliseconds (ms).
Table 2

(a) 10,000 user history records

| Configuration | 10 KB | 1 MB | 100 MB |
| --- | --- | --- | --- |
| Without the invention | 285.3 ms | 23472 ms | 2290176 ms |
| With the invention | 88.3 ms | 6329 ms | 718152 ms |
| With Xunlei (Thunder) | 254.4 ms | 19603 ms | 1951160 ms |

(b) 50,000 user history records

| Configuration | 10 KB | 1 MB | 100 MB |
| --- | --- | --- | --- |
| Without the invention | 296.6 ms | 23631 ms | 2394101 ms |
| With the invention | 73.3 ms | 5404 ms | 590086 ms |
| With Xunlei (Thunder) | 253.6 ms | 18858 ms | 1855428 ms |

(c) 100,000 user history records

| Configuration | 10 KB | 1 MB | 100 MB |
| --- | --- | --- | --- |
| Without the invention | 269.8 ms | 22255 ms | 2250904 ms |
| With the invention | 41.5 ms | 3387 ms | 338572 ms |
| With Xunlei (Thunder) | 204.8 ms | 15710 ms | 1482256 ms |

Fig. 6 shows experimental data for the embodiment of the invention on the real Chinese Internet. In the experiments of Fig. 6, 10 dial-up users from different regions (users #1-#10) and 10 broadband users from different regions (users #11-#20) used the network resource recommendation system that incorporates the embodiment. After two weeks of use, they accessed three typical kinds of Internet resources: text files, PDF documents and online game installation files. Fig. 6(a)-(c) show the time the users needed to download text files, PDF documents and online game installation files, respectively. In each figure, the bars show the data without and with the recommendation system. To better demonstrate the effectiveness of the method, the data for the Xunlei (Thunder) download software under the same conditions (restricted to its single-site download mode) are also shown as bars for comparison. The data for every user are shown in the figures. Each value is the average over 100 repetitions with similar resources. The average sizes of the three kinds of resources are 10.6 KB, 3.49 MB and 784 MB, respectively. The unit is seconds (s).
These experiments show that the invention makes effective use of the user's web access history and applies artificial intelligence methods to predict the connection speed between the user and each website. By taking the user's individual network conditions into account during network access, it lets network resources be used in a way that exploits the user's bandwidth more fully and gives the user a better Internet experience.
The above is only a preferred embodiment of the machine-learning-based network connection speed prediction method of the invention and is not intended to limit the scope of its substantive technical content. The substantive technical content of the method is broadly defined in the claims. Any technical entity or method completed by others that is identical to what the claims define, or is an equivalent variation of it, shall be regarded as falling within the scope of protection of this patent.

Claims (6)

1. A network connection speed prediction method based on machine learning, characterized by comprising the following steps:
1) using a customized browser to record the connection speed of the websites the user has browsed, as a training set and a test set;
2) using the recorded website connection speeds to train a neural network and predict the connection speed between the user and each website in the training set;
3) depending on how the prediction error of the neural network decreases, either executing step 4), or dividing the training set into smaller training sets and returning to step 2) for each of them;
4) using a decision tree to test the prediction performance of the neural networks;
5) using the decision tree and the neural networks to predict the connection speed between the user and any unknown website.
2. The network connection speed prediction method based on machine learning according to claim 1, characterized in that the step of using a customized browser to record the connection speed of the websites the user has browsed, as a training set and a test set, comprises:
(a) for each website the user visits, recording the time interval from the user sending an access request to the website until the user receives a response, denoted the user connection time of the website;
(b) for each website the user visits, recording the download speed each time the user downloads data from the website, denoted the user bandwidth of the website;
(c) if the user visits a website repeatedly, taking the mean user connection time over the last week or the last 10 visits as the user connection time of the website, and the mean user bandwidth over the last week or the last 10 visits as the user bandwidth of the website;
(d) randomly selecting 10% of the user history data as the test set and using the remaining 90% as the training set.
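Steps (c) and (d) of claim 2 can be illustrated with a short sketch. It is not the patented implementation: the `Visit` record, the function names, the choice to fall back from "last week" to "last 10 visits" when the week is empty, and the fixed random seed are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from statistics import mean

WEEK_SECONDS = 7 * 24 * 3600


@dataclass
class Visit:
    site_ip: str
    timestamp: float        # seconds since the epoch
    connect_ms: float       # user connection time, step (a)
    bandwidth_kbps: float   # user bandwidth, step (b)


def summarize_site(visits, now, use_last_week=True):
    """Step (c): average the recent visits to one site.

    Expects at least one visit. The claim allows either window ("last week
    or last 10 visits"); falling back to the last 10 visits when the last
    week is empty is an assumption.
    """
    ordered = sorted(visits, key=lambda v: v.timestamp)
    recent = ordered[-10:]
    if use_last_week:
        in_week = [v for v in ordered if now - v.timestamp <= WEEK_SECONDS]
        if in_week:
            recent = in_week
    return (mean(v.connect_ms for v in recent),
            mean(v.bandwidth_kbps for v in recent))


def split_history(records, test_fraction=0.1, seed=0):
    """Step (d): random 90% / 10% split into (training set, test set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

A caller would group visits by website, call `summarize_site` once per site, and pass the resulting per-site records to `split_history`.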
3. The network connection speed prediction method based on machine learning according to claim 1, characterized in that the step of using the recorded website connection speeds to train a neural network and predict the connection speed between the user and each website in the training set comprises:
(e) building an artificial neural network whose input is the feature data of a website, comprising the network IP address expressed as a 32-bit integer and an integer between 0 and 23 giving the hour of the current time, and whose output is two real numbers, namely the estimated connection time and the estimated bandwidth between the user and the website;
(f) taking the user connection time and user bandwidth history data obtained in steps (a)-(d) as the training set, training the neural network built in step (e) with the back-propagation algorithm, and saving the trained neural network.
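The input encoding of step (e) can be sketched with the Python standard library. The function names are illustrative, and the scaling to the range [0, 1] is an assumption added here because raw 32-bit integers are usually awkward as neural network inputs; the claim itself only specifies the 32-bit integer and the hour value.

```python
import ipaddress
from datetime import datetime


def encode_site_features(ip: str, when: datetime) -> tuple[int, int]:
    """Return (IPv4 address as a 32-bit integer, hour of day in 0..23)."""
    ip_int = int(ipaddress.IPv4Address(ip))  # 0 .. 2**32 - 1
    return ip_int, when.hour


def scale_features(ip_int: int, hour: int) -> tuple[float, float]:
    """Scale both inputs to [0, 1]. This normalization is an assumption."""
    return ip_int / (2**32 - 1), hour / 23


features = encode_site_features("192.168.0.1", datetime(2009, 8, 13, 15, 30))
print(features, scale_features(*features))
```

The two outputs of the network, the estimated connection time and the estimated bandwidth, would be regressed against the averaged history values from claim 2.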
4. The network connection speed prediction method based on machine learning according to claim 1, characterized in that the step of, depending on how the prediction error of the neural network decreases, either executing step 4) or dividing the training set into smaller training sets and returning to step 2) for each of them comprises:
(g) using the neural network trained in step (f) to predict the user connection time and user bandwidth of each website in the training set, and computing the error e between the predicted and actual values for the website:
e = t + Kb * b
where t is the prediction error of the user connection time, in milliseconds; b is the prediction error of the user bandwidth, in kilobits per second; and Kb is a coefficient whose usual value is 200 to 1000;
(h) if step (g) is not being executed for the first time and the total prediction error differs from the previous total prediction error by no more than 3%, jumping to step (k);
(i) sorting the websites in the training set in ascending order of their prediction error from step (g), and using the k-nearest-neighbor clustering algorithm to divide the websites into m groups, where m is the integer between 1 and 5 that maximizes the difference in mean prediction error between groups;
(j) for each group of websites from step (i), using it as the training set and jumping to step (e).
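The error measure of step (g) and the stopping rule of step (h) can be sketched as follows. Several details are assumptions rather than statements of the claim: the errors are taken as absolute values, the 3% threshold is measured relative to the previous total, and `split_at_largest_gaps` is a simple stand-in that cuts the sorted errors at their largest gaps. It is not the clustering algorithm named in step (i).

```python
def site_error(t_err_ms: float, b_err_kbps: float, kb: float = 500.0) -> float:
    """e = t + Kb * b, with absolute errors (assumed) and Kb in 200..1000."""
    return abs(t_err_ms) + kb * abs(b_err_kbps)


def has_converged(prev_total, curr_total, tolerance=0.03):
    """Step (h): stop splitting once the total error changes by at most 3%.

    prev_total is None on the first pass, which never counts as converged.
    """
    if prev_total is None:
        return False
    return abs(curr_total - prev_total) <= tolerance * prev_total


def split_at_largest_gaps(errors, m):
    """Stand-in for step (i): cut the sorted errors at the m - 1 largest gaps."""
    ordered = sorted(errors)
    if m <= 1 or len(ordered) < m:
        return [ordered]
    cuts = sorted(
        range(1, len(ordered)),
        key=lambda i: ordered[i] - ordered[i - 1],
        reverse=True,
    )[: m - 1]
    groups, start = [], 0
    for cut in sorted(cuts):
        groups.append(ordered[start:cut])
        start = cut
    groups.append(ordered[start:])
    return groups
```

In the loop described by steps (g)-(j), each resulting group would become the training set of a new neural network, and the loop would end once `has_converged` returns true.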
5. The network connection speed prediction method based on machine learning according to claim 1, characterized in that the step of using a decision tree to test the prediction performance of the neural networks comprises:
(k) dividing the websites in the test set obtained in step (d) into 1000 groups by network IP address, numbered 1 to 1000; supposing that n neural networks were finally used in steps (e)-(j), recording for each website the number, between 1 and n, of the neural network last used to train it when steps (e)-(j) finished;
(l) building a decision tree whose input is the network IP address group number, with a value from 1 to 1000, and whose output is the neural network number, with a value from 1 to n;
(m) using the test set data obtained in step (d) to train the decision tree built in step (l) with the C4.5 decision tree algorithm, and saving the trained decision tree.
6. The network connection speed prediction method based on machine learning according to claim 1, characterized in that the step of using the decision tree and the neural networks to predict the connection speed between the user and any unknown website comprises:
(n) for any website whose connection speed is unknown, obtaining its network IP address group number, between 1 and 1000, as described in step (k), and using the decision tree obtained in step (m) to obtain the number of its corresponding neural network;
(o) using that neural network to predict the user connection time and user bandwidth of the website.
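The routing described in claims 5 and 6 can be sketched end to end. This is a simplified illustration, not the claimed method. The claims do not say how IP addresses are divided into 1000 groups, so equal-width buckets over the 32-bit address space are assumed. The C4.5 decision tree is replaced by a plainly simpler stand-in, a majority vote per group. The neural networks are represented by arbitrary callables. All names are hypothetical.

```python
import ipaddress
from collections import Counter, defaultdict

NUM_GROUPS = 1000


def ip_group(ip: str) -> int:
    """Map an IPv4 address to a group number in 1..1000 (equal-width buckets)."""
    return int(ipaddress.IPv4Address(ip)) * NUM_GROUPS // 2**32 + 1


def fit_group_lookup(labelled_sites):
    """Stand-in for the C4.5 tree of steps (l)-(m).

    labelled_sites is an iterable of (ip, network_number) pairs; the result
    maps each seen group number to its most frequent network number.
    """
    votes = defaultdict(Counter)
    for ip, network_number in labelled_sites:
        votes[ip_group(ip)][network_number] += 1
    return {group: counts.most_common(1)[0][0] for group, counts in votes.items()}


def predict_site(ip, hour, lookup, networks, default_network=1):
    """Steps (n)-(o): choose a network by IP group, then query it.

    networks maps a network number to a callable taking (ip_int, hour) and
    returning (connect_ms, bandwidth_kbps). Using default_network for groups
    never seen during fitting is an assumption.
    """
    network_number = lookup.get(ip_group(ip), default_network)
    return networks[network_number](int(ipaddress.IPv4Address(ip)), hour)
```

A real decision tree could generalize across neighboring group numbers, which this lookup table cannot; the sketch only shows how the group number selects which predictor answers for an unknown website.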
CN2009101021269A 2009-08-13 2009-08-13 Network connection speed predicting method based on machine learning Expired - Fee Related CN101634995B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009101021269A CN101634995B (en) 2009-08-13 2009-08-13 Network connection speed predicting method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009101021269A CN101634995B (en) 2009-08-13 2009-08-13 Network connection speed predicting method based on machine learning

Publications (2)

Publication Number Publication Date
CN101634995A true CN101634995A (en) 2010-01-27
CN101634995B CN101634995B (en) 2011-09-21

Family

ID=41594185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009101021269A Expired - Fee Related CN101634995B (en) 2009-08-13 2009-08-13 Network connection speed predicting method based on machine learning

Country Status (1)

Country Link
CN (1) CN101634995B (en)

Also Published As

Publication number Publication date
CN101634995B (en) 2011-09-21

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110921

Termination date: 20130813