CN101634995B - Network connection speed predicting method based on machine learning - Google Patents
- Publication number
- CN101634995B
- Authority
- CN
- China
- Prior art keywords
- user
- website
- neural network
- training
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a method for predicting network connection speed based on machine learning. The method comprises the following steps: (1) a customized browser records the connection speed between the user and each website the user has browsed, and the records are used as a training set and a test set; (2) a neural network is trained on the recorded connection speeds and used to predict the connection speed between the user and each website in the training set; (3) depending on how much the prediction error of the neural networks has decreased, either step (4) is executed, or the training set is divided into smaller training sets and step (2) is executed again on each of them; (4) a decision tree is used to test the prediction performance of the neural networks; and (5) the decision tree and the neural networks are used to predict the connection speed between the user and an unknown website. By applying artificial intelligence techniques and machine learning methods to predict the connection speed between the user and each website, the invention improves the accuracy of network condition assessment and makes fuller use of the user's bandwidth, giving the user a better Internet experience.
Description
Technical field
The present invention relates to the field of computer search and web technology, and in particular to a method for predicting network connection speed based on machine learning.
Background technology
In recent years a series of research efforts has studied personalized, user-oriented search engines and algorithms, for example the article "A large-scale evaluation and analysis of personalized search strategies", published in 2007 at the 16th International Conference on World Wide Web (WWW '07: Proceedings of the 16th International Conference on World Wide Web). In the article "A user-oriented webpage ranking algorithm based on user attention time", presented at the 23rd AAAI Conference on Artificial Intelligence in 2008, the authors likewise propose a personalized solution for building a user-oriented web search engine. The present invention is specifically aimed at optimizing the network connection conditions of an individual user. It studies how to select the best network connection for a given user, a topic that earlier research and inventions have rarely addressed.
Because quality of service is critical for web browsing and many other kinds of network access, any method that improves it has great commercial value. A number of solutions have been proposed, and some are in commercial use. The most successful large-scale commercial software among them relies on a simple idea: automatically opening connections to several content providers and downloading or accessing them in parallel. One example is Xunlei (Thunder, http://www.xunlei.com/), one of the most popular software products in China. This kind of approach, however, severely affects the websites of the content providers: the pages are fetched by an automated program rather than by the end user, so the online advertisements on those pages lose their value. The problem has led to several court cases between companies offering such quality-of-service improvements and website content providers. The present invention proposes a data-mining-based method for predicting the user's network conditions. The prediction can serve as an important factor in building a personalized website recommendation system and thus in providing the best quality of service to an individual user.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a method for predicting network connection speed based on machine learning.
The method for predicting network connection speed based on machine learning comprises the following steps:
1) using a customized browser, record the connection speed of the websites the user has browsed, and use the records as a training set and a test set;
2) using the recorded website connection speeds, train a neural network and use it to predict the connection speed between the user and each website in the training set;
3) depending on how much the prediction error of the neural network has decreased, either execute step 4), or divide the training set into smaller training sets and return to step 2) for each of them;
4) use a decision tree to test the prediction performance of the neural networks;
5) use the decision tree and the neural networks to predict the connection speed between the user and any unknown website.
The step of using a customized browser to record the connection speed of the websites the user has browsed, as a training set and a test set, comprises:
(a) for each website the user visits, record the interval between the moment the user sends an access request to the website and the moment the user receives the response, and take it as the user connection time of the website;
(b) for each website the user visits, record the download speed each time the user downloads data from the website, and take it as the user bandwidth of the website;
(c) if the user has visited a website several times, take the mean user connection time over the past week or over the 10 most recent visits as the user connection time of the website, and the mean user bandwidth over the past week or over the 10 most recent visits as the user bandwidth of the website;
(d) randomly select 10% of the user's historical data as the test set and use the remaining 90% as the training set.
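Steps (c) and (d) can be sketched as follows. This is only an illustration under stated assumptions, not the patent's implementation: the `(timestamp, value)` record layout, the function names `recent_average` and `split_history`, and the rule of falling back to the 10 most recent samples when none fall within the past week are all hypothetical, since the text does not say how to choose between the two averaging windows.

```python
import random
from statistics import mean

WEEK_SECONDS = 7 * 24 * 3600

def recent_average(samples, now, last_n=10):
    """Average the values of (timestamp, value) samples from the past week.

    Assumption: if no sample falls inside the past week, fall back to the
    last_n most recent samples.
    """
    in_week = [v for t, v in samples if now - t <= WEEK_SECONDS]
    if in_week:
        return mean(in_week)
    latest = sorted(samples)[-last_n:]
    return mean(v for _, v in latest)

def split_history(records, test_fraction=0.10, seed=0):
    """Randomly hold out test_fraction of the records as the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)
```

The same `recent_average` helper would be applied once to the connection-time samples and once to the bandwidth samples of each website.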
The step of using the recorded website connection speeds to train a neural network and predict the connection speed between the user and each website in the training set comprises:
(e) build an artificial neural network whose input is the feature data of a website, namely the network IP address expressed as a 32-bit integer and an integer between 0 and 23 giving the hour of the current time; its output is 2 real numbers, the estimated connection time and the estimated bandwidth between the user and the website;
(f) using the user connection time and user bandwidth history obtained in steps (a)-(d) as the training set, train the neural network built in step (e) with the back-propagation algorithm, and save the trained neural network.
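The input encoding of step (e) can be illustrated with Python's standard `ipaddress` module. The function name `website_features` is hypothetical, and the sketch assumes IPv4 addresses, which matches the 32-bit integer representation named in the text.

```python
import ipaddress

def website_features(ip: str, hour: int) -> tuple[int, int]:
    """Return (IPv4 address as a 32-bit integer, hour of day in 0..23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    return int(ipaddress.IPv4Address(ip)), hour
```

In practice both values would probably be rescaled before being fed to a network with sigmoid units; the text does not describe any such scaling.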
The step of either executing step 4) or dividing the training set into smaller training sets and returning to step 2) for each of them, depending on how much the prediction error of the neural network has decreased, comprises:
(g) use the neural network trained in step (f) to predict the user connection time and user bandwidth of each website in the training set, and compute the error e between the predicted and actual values for the website:
e = t + Kb*b
where t is the prediction error of the user connection time, in milliseconds; b is the prediction error of the user bandwidth, in kilobits per second; and Kb is a coefficient that usually takes a value between 200 and 1000;
(h) if step (g) is not being executed for the first time and the total prediction error differs from the previous total prediction error by no more than 3%, jump to step (k);
(i) sort the websites in the training set in ascending order of their prediction error from step (g), and use the k-nearest-neighbor clustering algorithm to divide them into m groups, where m is an integer between 1 and 5 chosen to maximize the difference in mean prediction error between the groups;
(j) for each group of websites obtained in step (i), take the group as a training set and jump back to step (e).
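The error measure of step (g) and the stopping rule of step (h) can be written down directly. The sketch below makes several assumptions that the text leaves open: t and b are treated as absolute errors, the default `kb=500.0` is just an arbitrary value inside the stated 200 to 1000 range, and the 3% threshold is measured relative to the previous total. The names `combined_error` and `has_converged` are hypothetical.

```python
def combined_error(t_err_ms: float, b_err_kbps: float, kb: float = 500.0) -> float:
    """e = t + Kb * b, with t in milliseconds and b in kilobits per second."""
    return t_err_ms + kb * b_err_kbps

def has_converged(previous_total: float, current_total: float, tol: float = 0.03) -> bool:
    """True when the total error changed by at most tol relative to the previous pass."""
    return abs(current_total - previous_total) <= tol * previous_total
```

When `has_converged` returns true the procedure would move on to step (k); otherwise the training set is split as in steps (i) and (j), which this sketch does not cover.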
The step of using a decision tree to test the prediction performance of the neural networks comprises:
(k) divide the websites in the test set obtained in step (d) into 1000 groups by network IP address, numbered 1 to 1000; if n neural networks were finally used in steps (e)-(j), record for each website the number, between 1 and n, of the neural network last used to train on it when steps (e)-(j) finished;
(l) build a decision tree whose input is the network IP address group number, from 1 to 1000, and whose output is the neural network number, from 1 to n;
(m) using the test set data obtained in step (d), train the decision tree built in step (l) with the C4.5 decision tree algorithm, and save the trained decision tree.
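The text does not say how IP addresses are assigned to the 1000 groups of step (k). One simple possibility, shown here purely as an assumption, is to cut the 32-bit address space into 1000 equal-width ranges. The function name `ip_group` is hypothetical, and the C4.5 training of step (m) is not shown.

```python
import ipaddress

def ip_group(ip: str, groups: int = 1000) -> int:
    """Map an IPv4 address to a group number in 1..groups (equal-width ranges)."""
    value = int(ipaddress.IPv4Address(ip))
    return value * groups // 2**32 + 1
```

Each training example for the decision tree would then be a pair of group number and neural network number.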
The step of using the decision tree and the neural networks to predict the connection speed between the user and any unknown website comprises:
(n) for any website whose connection speed is unknown, obtain its network IP address group number, between 1 and 1000, as described in step (k), and use the decision tree obtained in step (m) to obtain the number of the corresponding neural network;
(o) use that neural network to predict the user connection time and user bandwidth of the website.
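The two prediction steps (n) and (o) chain together as a lookup followed by a network evaluation. In the sketch below a plain mapping stands in for the trained decision tree and plain callables stand in for the trained neural networks; these stand-ins, the equal-width IP grouping, and the name `predict_unknown_site` are all assumptions made for illustration.

```python
import ipaddress
from typing import Callable, Mapping, Sequence, Tuple

Prediction = Tuple[float, float]  # (connection time in ms, bandwidth in kbit/s)

def predict_unknown_site(
    ip: str,
    hour: int,
    group_to_network: Mapping[int, int],
    networks: Sequence[Callable[[int, int], Prediction]],
    groups: int = 1000,
) -> Prediction:
    ip_int = int(ipaddress.IPv4Address(ip))
    group = ip_int * groups // 2**32 + 1           # assumed equal-width grouping
    network_id = group_to_network[group]           # stands in for the decision tree
    return networks[network_id - 1](ip_int, hour)  # networks are numbered 1..n
```

With a real decision tree, `group_to_network[group]` would be replaced by a call to the tree's prediction routine.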
The present invention makes effective use of artificial intelligence techniques, applying several machine learning methods to predict the connection speed between the user and each website. This improves the accuracy of network condition assessment and lets the user's bandwidth be used more fully, giving the user a better Internet experience.
Description of drawings
Fig. 1 is a flow chart of an embodiment of the network connection speed prediction method based on machine learning;
Fig. 2 is a flow chart of the present invention applied to a personalized network resource recommendation system;
Fig. 3 is a schematic diagram of the virtual network architecture used in the virtual network experiments;
Fig. 4 shows the prediction error when the number of neural networks in the artificial neural network group is limited; the horizontal axis is the number of neural networks in the group, and the vertical axis is the prediction error when the group and the decision tree are used to predict the data in the training set; the three curves show the prediction error for training sets of 10000, 50000 and 100000 records;
Fig. 5 plots the data obtained from experiments in the virtual network of Fig. 3, which simulates Internet conditions; panels (a)-(f) correspond to Tables 1(a)-(f) and are drawn as percentage performance improvement; in each panel the horizontal axis is the number of neural networks in the artificial neural network group, and the vertical axis is the percentage by which the data download speed improves when the group and the decision tree are used to predict the connection speed of unknown websites within the network resource recommendation system; in experiments (a)-(c) the user is placed on a computer far from the network root, simulating a dial-up user, with 10000, 50000 and 100000 user history records available in advance; in experiments (d)-(f) the user is placed on a computer close to the network root, simulating a broadband user, with 10000, 50000 and 100000 user history records available in advance; each data point is the average of 100 repetitions of the same experimental procedure, and the experimental data are in milliseconds (ms);
Fig. 6 shows experimental data for the embodiment on the real Chinese Internet; panels (a)-(c) show the time users needed to download text files, PDF documents and online game installation files, respectively; in each panel the horizontal axis gives the numbers #1 to #20 of the 20 users who took part in the experiment, and the vertical axis is the time the user spent downloading the network resource; the bars show the data obtained without the network resource recommendation system of the embodiment, with that system, and with the Xunlei software in single-site download mode under the same conditions; the data for every user are listed in the figure; each data point is the average of 100 repetitions with similar resources; the average sizes of the three kinds of resources are 10.6 KB, 3.49 MB and 784 MB, respectively; the experimental data are in seconds (sec.).
Embodiment
As shown in Fig. 1, the method comprises a training stage and a prediction stage. The training stage comprises user historical data 10, training set 20, test set 30, artificial neural network 40, error judgment 50, training set splitting 60, artificial neural network group 70 and C4.5 decision tree 80. The prediction stage comprises unknown website 90, C4.5 decision tree 80, artificial neural network group 70 and connection speed prediction 99.
User historical data 10: the user connection time of each visit to a site and the user bandwidth during data transfer. The user connection time of a website is defined as the interval between the user sending an access request to the website and the user receiving the website's response; if the user has visited the website several times, the mean user connection time over the past week or over the 10 most recent visits is used. The user bandwidth of a website is defined as the download speed when the user downloads data from the website; if the user has visited the website several times, the mean user bandwidth over the past week or over the 10 most recent visits is used.
Training set 20: after 10% of user historical data 10 has been randomly selected as the test set, the remaining 90% is used as the training set;
Test set 30: 10% of the user historical data, selected at random;
Artificial neural network 40: the embodiment uses a 4-layer artificial neural network. The input layer takes the feature data of a website: the network IP address expressed as a 32-bit integer and an integer between 0 and 23 giving the hour of the current time. The output layer yields 2 real numbers, the estimated connection time and the estimated bandwidth between the user and the website. Each neuron in the other two layers is a sigmoid function, and every pair of neurons in adjacent layers is connected. The network is trained continuously in the background on user historical data 10 with the back-propagation algorithm.
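The layer structure described for artificial neural network 40 can be sketched as a forward pass in plain Python. Several details here are assumptions rather than facts from the text: the hidden layer width of 8, the random initial weights, the scaling of the inputs to [0, 1], and the linear (non-sigmoid) output units. Back-propagation training is not shown, so the outputs of this untrained network carry no meaning.

```python
import math
import random

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: every input feeds every output unit."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs

def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(features, layers):
    """Input layer -> two sigmoid hidden layers -> linear 2-unit output layer."""
    h = features
    for i, (weights, biases) in enumerate(layers):
        last = i == len(layers) - 1
        h = dense(h, weights, biases, None if last else sigmoid)
    return h

rng = random.Random(0)
sizes = [2, 8, 8, 2]  # hidden widths of 8 are an assumption
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
# Inputs scaled to [0, 1]: IP integer / 2**32 and hour / 23 (scaling is an assumption)
time_est, bandwidth_est = forward([3232235521 / 2**32, 13 / 23], layers)
```

The four layers of the text correspond to the four entries of `sizes`, which give three weight matrices.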
Error judgment 50: use the trained artificial neural network 40 to predict the user connection time and user bandwidth of each website in training set 20, and compute the error e between the predicted and actual values for the website:
e = t + Kb*b
where t is the prediction error of the user connection time, in milliseconds; b is the prediction error of the user bandwidth, in kilobits per second; and Kb is a coefficient that usually takes a value between 200 and 1000. If this step is not being executed for the first time and the current total prediction error differs from the previous total prediction error by no more than 3%, training of the neural networks ends, all neural networks existing at that point are saved, and they form artificial neural network group 70.
Training set splitting 60: sort the websites in training set 20 in ascending order of their prediction error under artificial neural network 40, and use the k-nearest-neighbor clustering algorithm to divide them into m groups, where m is an integer between 1 and 5 chosen to maximize the difference in mean prediction error between the groups; each of the m groups is then used as a new training set 20, and execution jumps back to artificial neural network 40.
Artificial neural network group 70: in error judgment 50, if the total prediction error of all artificial neural networks over all training data differs by no more than 3% from the total prediction error before the last training set splitting 60, all artificial neural networks existing at that point constitute the artificial neural network group. Fig. 4 shows the prediction error when the number of neural networks in the group is limited, which demonstrates the need to build a group of neural networks and to split the training set.
C4.5 decision tree 80: divide all websites in test set 30 into 1000 groups by network IP address, numbered 1 to 1000; if artificial neural network group 70 contains n artificial neural networks in total, record for each website in training set 20 the number, between 1 and n, of the neural network in artificial neural network group 70 last used to train on it; the input of the decision tree is the network IP address group number, from 1 to 1000, and its output is the neural network number, from 1 to n; using test set 30, train the decision tree with the C4.5 decision tree algorithm and save the trained decision tree;
Unknown website 90: a website on the Internet whose connection speed is unknown;
Connection speed prediction 99: for unknown website 90, obtain its network IP address group number, between 1 and 1000, use C4.5 decision tree 80 to obtain the number of the corresponding neural network, and then use that neural network in artificial neural network group 70 to predict the user connection time and user bandwidth of the website.
An important application of the present invention is a personalized network resource recommendation system, whose flow structure is shown in Fig. 2. The personalized recommendation system comprises a front end and a back end. The back end comprises customized browser 100 and resource recommendation result 700; the front end comprises user historical data 200, machine-learning-based network connection speed prediction 300, general search engine 400, basic search results 500, and merging and adjustment of search results 600.
Customized browser 100: a plug-in module embedded in an existing web browser such as Firefox or Internet Explorer that records the user connection time of each visit to a site and the user bandwidth during data transfer.
User historical data 200: the user connection time of each visit to a site and the user bandwidth during data transfer, as recorded by customized browser 100. The user connection time of a website is defined as the interval between the user sending an access request to the website and the user receiving the website's response; if the user has visited the website several times, the mean user connection time over the past week or over the 10 most recent visits is used. The user bandwidth of a website is defined as the download speed when the user downloads data from the website; if the user has visited the website several times, the mean user bandwidth over the past week or over the 10 most recent visits is used.
Machine-learning-based network connection speed prediction 300: uses the machine-learning-based network connection speed prediction method of the present invention to predict the user connection time and user bandwidth of each website in the basic search results.
General search engine 400: provides a user interface that calls a network resource search service; in this embodiment the interface is implemented in JSP; when the user submits a query, a general web search engine (such as Google) is called to obtain the search results.
Basic search results 500: after a search with general search engine 400, the result page is parsed to obtain the first 100 results returned; if a resource is a document, the document is downloaded and stored locally.
Merging and adjustment of search results 600: 1) Merging: if the network resource the user needs is a text file, then for every pair of items in the search results, the text similarity between them is computed with "Code for estimating document similarity" from Microsoft's open source code package; if the similarity exceeds 95%, the two items are marked as having identical content. If the network resource the user needs is in another format, then for every pair of items in the search results, 10 offsets are chosen at random and 1 KB of data is compared at each offset in the two data files; if the two files agree at all 10 positions, they are marked as having identical content. All websites with identical content are then merged into the search result item in which that content first appears, forming a single search result item. 2) Adjustment: if the network resource the user needs is smaller than 100K, then for every search result item that contains two or more websites, the websites are reordered by the user connection time estimated by network connection speed prediction 300, from shortest to longest; if the network resource the user needs is larger than 100K, then for every search result item that contains two or more websites, the websites are reordered by the user bandwidth estimated by network connection speed prediction 300, from highest to lowest;
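The sampled comparison for non-text resources and the reordering rule can be sketched as below. The names `same_content` and `rank_sites`, the tuple layout of a site entry, the treatment of files of different length as different, and the reading of "100K" as 100 × 1024 bytes are assumptions made for the illustration; the text-similarity branch that relies on the Microsoft package is not shown.

```python
import random

def same_content(a: bytes, b: bytes, samples: int = 10, chunk: int = 1024, seed: int = 0) -> bool:
    """Compare `chunk` bytes at `samples` random offsets of two files."""
    if len(a) != len(b):
        return False  # assumption: files of different size count as different
    rng = random.Random(seed)
    last_start = max(len(a) - chunk, 0)
    for _ in range(samples):
        offset = rng.randint(0, last_start)
        if a[offset:offset + chunk] != b[offset:offset + chunk]:
            return False
    return True

def rank_sites(sites, resource_size_bytes, threshold=100 * 1024):
    """Order the mirror sites of one search result item.

    sites: list of (name, predicted_time_ms, predicted_bandwidth_kbps).
    """
    if resource_size_bytes < threshold:
        return sorted(sites, key=lambda s: s[1])            # shortest connection time first
    return sorted(sites, key=lambda s: s[2], reverse=True)  # highest bandwidth first
```

The rule reflects the idea that connection latency dominates the retrieval time of small resources while bandwidth dominates for large ones.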
Resource recommendation result 700: the user-oriented, personalized resource recommendation obtained after merging and adjustment of search results 600; the recommendation takes full account of the user's own network conditions, lets the user's bandwidth be used more fully, and gives the user a better Internet experience.
The experimental results in Tables 1 and 2 clearly demonstrate the superiority of the method;
Table 1 gives data from experiments in a virtual network that simulates Internet conditions; the structure of the virtual network is shown in Fig. 3. Each website sits within a multi-level network structure organized by a number of gateways. The virtual network contains about 30000 computers distributed among three different Internet service providers (ISPs). For gateways at every level, the closer a gateway is to the network root, the smaller its delay and the larger the bandwidth of its nearby links; the farther it is from the network root, the larger its delay and the smaller the bandwidth of its nearby links. The delay of the backbone gateways between different ISPs is about 1/100 of the delay of a gateway inside a single ISP, and their bandwidth is about 50 times as large. In our experiments we created 500 different resources, made 2000 copies of each, and distributed the copies at random among the computers in the virtual network. For each resource query by the user, we assume that the search engine returns 90% of the websites holding the resource, in random order. We assume that the probability that the user clicks the i-th item in the returned site list is
Table 1 lists the total time the user needs to obtain a required resource before and after using the network resource recommendation system that incorporates the embodiment of the method. The columns give the time needed to obtain the resource when its size is 10 KB, 1 MB and 100 MB. The rows give the data without the network resource recommendation system of the embodiment, and with that system when the artificial neural network group is limited to 1, 5, 10, 50 and 100 neural networks. Each value is the average of 100 repetitions of the same experimental procedure, in milliseconds (ms). In the experiments of Tables 1(a)-(c) the user is placed on a computer far from the network root, simulating a dial-up user, with 10000, 50000 and 100000 user history records available in advance. In the experiments of Tables 1(d)-(f) the user is placed on a computer close to the network root, simulating a broadband user, with 10000, 50000 and 100000 user history records available in advance. All the data in Tables 1(a)-(f) are also plotted, as percentage performance improvement, in the corresponding panels (a)-(f) of Fig. 5.
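The text does not give the formula behind the percentages plotted in Fig. 5. One plausible reading, shown here only as an assumption, is the relative reduction in retrieval time against the baseline without the system. The function name `time_reduction_percent` is hypothetical; the two input values are taken from the 10 KB column of Table 1(a).

```python
def time_reduction_percent(baseline_ms: float, improved_ms: float) -> float:
    """Percentage by which the retrieval time drops relative to the baseline."""
    return (baseline_ms - improved_ms) / baseline_ms * 100.0

# 10 KB column of Table 1(a): 285.3 ms without the system, 87.1 ms with 50 networks
reduction = time_reduction_percent(285.3, 87.1)
```

A speed-based definition (baseline time divided by improved time, minus one) would give larger percentages for the same data, so the choice of definition matters when reading Fig. 5.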
Table 1
(a) Simulated dial-up network, 10000 user history records
Number of neural networks | 10K bytes | 1M bytes | 100M bytes |
---|---|---|---|
0 (invention not used) | 285.3ms | 23472ms | 2290176ms |
1 | 233.7ms | 17205ms | 1662668ms |
5 | 117.8ms | 10445ms | 846298ms |
10 | 97.0ms | 8098ms | 785530ms |
50 | 87.1ms | 6957ms | 695418ms |
100 | 88.3ms | 6329ms | 718152ms |
(b) Simulated dial-up network, 50000 user history records
Number of neural networks | 10K bytes | 1M bytes | 100M bytes |
---|---|---|---|
0 (invention not used) | 296.6ms | 23631ms | 2394101ms |
1 | 238.8ms | 18149ms | 1941615ms |
5 | 118.6ms | 8791ms | 945567ms |
10 | 77.1ms | 6940ms | 825965ms |
50 | 76.3ms | 5334ms | 631479ms |
100 | 73.3ms | 5404ms | 598086ms |
(c) Simulated dial-up network, 100000 user history records
Number of neural networks | 10K bytes | 1M bytes | 100M bytes |
---|---|---|---|
0 (invention not used) | 269.8ms | 22255ms | 2250904ms |
1 | 195.6ms | 17381ms | 1609396ms |
5 | 80.1ms | 7077ms | 841838ms |
10 | 54.8ms | 4874ms | 567228ms |
50 | 41.5ms | 4015ms | 457065ms |
100 | 41.5ms | 3387ms | 338572ms |
(d) Simulated broadband network, 10000 user history records
Number of neural networks | 10K bytes | 1M bytes | 100M bytes |
---|---|---|---|
0 (invention not used) | 135.8ms | 3808ms | 486680ms |
1 | 133.1ms | 4116ms | 476460ms |
5 | 123.1ms | 3610ms | 435579ms |
10 | 124.5ms | 3397ms | 346516ms |
50 | 102.5ms | 3046ms | 305148ms |
100 | 98.9ms | 2871ms | 279206ms |
(e) Simulated broadband network, 50000 user history records
Number of neural networks | 10K bytes | 1M bytes | 100M bytes |
---|---|---|---|
0 (invention not used) | 140.2ms | 4597ms | 369387ms |
1 | 142.4ms | 4951ms | 361999ms |
5 | 125.9ms | 4275ms | 340575ms |
10 | 108.7ms | 3480ms | 271869ms |
50 | 106.1ms | 3273ms | 253768ms |
100 | 103.5ms | 3167ms | 249336ms |
(f) Simulated broadband network, 100000 user history records
Number of neural networks | 10K bytes | 1M bytes | 100M bytes |
---|---|---|---|
0 (invention not used) | 175.8ms | 8012ms | 494828ms |
1 | 180.9ms | 7403ms | 510168ms |
5 | 152.2ms | 6794ms | 422088ms |
10 | 114.8ms | 3332ms | 220693ms |
50 | 101.8ms | 2660ms | 196942ms |
100 | 97.2ms | 2732ms | 191004ms |
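Fig. 5 presents the Table 1 data as percentage performance improvements, but the text does not spell out how that percentage is computed. The following is a minimal sketch that assumes it is the relative reduction in retrieval time against the row where the invention is not used. The names `TABLE_1A_MS`, `improvement_percent` and `improvements` are illustrative only, and the numbers are copied from Table 1 (a).

```python
# Values copied from Table 1 (a): simulated dial-up network, 10000 history records.
# Key: number of neural networks (0 = invention not used).
# Value: retrieval time in ms for 10K-byte, 1M-byte and 100M-byte resources.
TABLE_1A_MS = {
    0: (285.3, 23472, 2290176),
    1: (233.7, 17205, 1662668),
    5: (117.8, 10445, 846298),
    10: (97.0, 8098, 785530),
    50: (87.1, 6957, 695418),
    100: (88.3, 6329, 718152),
}
SIZES = ("10K bytes", "1M bytes", "100M bytes")


def improvement_percent(baseline_ms, measured_ms):
    """Assumed definition: relative reduction in retrieval time, in percent."""
    return (baseline_ms - measured_ms) / baseline_ms * 100.0


def improvements(table):
    """Return {network_count: (pct_10k, pct_1m, pct_100m)} against the 0-network row."""
    baseline = table[0]
    return {
        count: tuple(improvement_percent(b, v) for b, v in zip(baseline, row))
        for count, row in table.items()
        if count != 0
    }


for count, row in improvements(TABLE_1A_MS).items():
    formatted = ", ".join(f"{size}: {pct:.1f}%" for size, pct in zip(SIZES, row))
    print(f"{count:>3} networks -> {formatted}")
```

Under this assumed definition, a negative value means the configuration was slower than the baseline, as happens for some 1-network rows in the broadband tables.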
Table 2 lists experimental data comparing the present invention with the Xunlei (Thunder) download software in the virtual network. Table 2 (a)-(c) show the data when the user history is limited to 10000, 50000 and 100000 records, respectively. In each table, the columns give the time needed to obtain a resource of 10K bytes, 1M bytes and 100M bytes, and the rows give the data without the system of the embodiment and with the network resource recommendation system of the embodiment. To better show the effectiveness of the method, the data for Xunlei under the same conditions (simulating its single-site download mode) are listed for comparison. Each value is the average over 100 repeated trials on resources of the same kind, in milliseconds (ms).
Table 2
(a) 10000 user history records
Configuration | 10K bytes | 1M bytes | 100M bytes |
---|---|---|---|
Invention not used | 285.3ms | 23472ms | 2290176ms |
Invention used | 88.3ms | 6329ms | 718152ms |
Xunlei (Thunder) used | 254.4ms | 19603ms | 1951160ms |
(b) 50000 user history records
Configuration | 10K bytes | 1M bytes | 100M bytes |
---|---|---|---|
Invention not used | 296.6ms | 23631ms | 2394101ms |
Invention used | 73.3ms | 5404ms | 590086ms |
Xunlei (Thunder) used | 253.6ms | 18858ms | 1855428ms |
(c) 100000 user history records
Configuration | 10K bytes | 1M bytes | 100M bytes |
---|---|---|---|
Invention not used | 269.8ms | 22255ms | 2250904ms |
Invention used | 41.5ms | 3387ms | 338572ms |
Xunlei (Thunder) used | 204.8ms | 15710ms | 1482256ms |
Fig. 6 shows experimental data for the embodiment of the invention on the real Internet in China. In the experiment of Fig. 6, 10 dial-up users from different regions (users #1~#10) and 10 broadband users from different regions (users #11~#20) used the network resource recommendation system of the embodiment. After two weeks of use, they accessed three typical kinds of Internet resources: text, PDF documents and online game installation files. Fig. 6 (a)-(c) show the time the users needed to download text, PDF documents and online game installation files, respectively. In each figure, the bars give the data without the network resource recommendation system of the embodiment and with it. To better show the effectiveness of the method, the data for the Xunlei (Thunder) software under the same conditions (restricted to its single-site download mode) are also shown as bars for comparison. The data of every user are shown in the figures. Each value is the average over 100 repeated trials on resources of the same kind. The average sizes of the three kinds of resources are 10.6K bytes, 3.49M bytes and 784M bytes, respectively. The data are in seconds (sec.).
The above experiments show that the present invention makes effective use of the user's historical web access records and uses artificial intelligence methods to predict the connection speed between the user and each website. By bringing the user's individual network conditions into the network access process, it lets the use of network resources exploit more of the user's bandwidth and gives the user a better Internet experience.
The above is only a preferred embodiment of the machine-learning-based network connection speed prediction method of the present invention and is not intended to limit the scope of the substantive technical content of the invention. The substantive technical content of the method is broadly defined in the claims. Any technical entity or method completed by others that is identical to what is defined in the claims, or is an equivalent change thereof, shall be regarded as covered by the scope of this patent protection.
Claims (1)
1. A network connection speed prediction method based on machine learning, characterized by comprising the following steps:
1) using a custom browser, record the connection speeds of the websites the user has browsed, as a training set and a test set;
2) using the recorded website connection speeds, train a neural network and use it to predict the connection speed between the user and each website in the training set;
3) depending on how the prediction error of the neural network decreases, either proceed to step 4), or divide the training set into smaller training sets and return to step 2) for each of them;
4) use a decision tree to test the prediction performance of the neural network;
5) use the decision tree and the neural network to predict the connection speed between the user and any unknown website;
The step of using a custom browser to record the connection speeds of the websites the user has browsed, as a training set and a test set, comprises:
(a) for each website the user visits, record the interval between the user sending an access request to the website and the user receiving a response, denoted the user connection time of the website;
(b) for each website the user visits, record the download speed when the user downloads data from the website, denoted the user bandwidth of the website;
(c) if the user visits a website repeatedly, take the mean of the user connection times over the most recent week or the most recent 10 visits as the user connection time of the website, and the mean of the user bandwidths over the most recent week or the most recent 10 visits as the user bandwidth of the website;
(d) randomly select 10% of the user history data as the test set and use the remaining 90% as the training set;
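Steps (a)-(d) can be sketched as follows. This is an illustrative sketch, not the claimed implementation: the class `SiteHistory`, the function `split_history`, and the rule "use samples from the last week if there are any, otherwise the last 10 visits" are assumptions, since the claim leaves open how the two averaging windows are combined.

```python
import random
from dataclasses import dataclass, field

WEEK_SECONDS = 7 * 24 * 3600


@dataclass
class SiteHistory:
    """Samples for one website: (timestamp_s, connect_time_ms, bandwidth_kbps)."""
    samples: list = field(default_factory=list)

    def record(self, timestamp_s, connect_time_ms, bandwidth_kbps):
        # Steps (a) and (b): store one measurement per visit.
        self.samples.append((timestamp_s, connect_time_ms, bandwidth_kbps))

    def summary(self, now_s):
        # Step (c): average recent measurements for a repeatedly visited site.
        # Assumed rule: prefer the last week, otherwise fall back to the last 10 visits.
        if not self.samples:
            raise ValueError("no samples recorded for this website")
        recent = [s for s in self.samples if now_s - s[0] <= WEEK_SECONDS]
        if not recent:
            recent = sorted(self.samples)[-10:]
        connect = sum(s[1] for s in recent) / len(recent)
        bandwidth = sum(s[2] for s in recent) / len(recent)
        return connect, bandwidth


def split_history(records, test_fraction=0.1, seed=0):
    # Step (d): random 10% test set, remaining 90% training set.
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


history = SiteHistory()
now = 1_000_000.0
history.record(now - 10, 100.0, 50.0)
history.record(now - 20, 200.0, 150.0)
history.record(now - 2 * WEEK_SECONDS, 900.0, 5.0)  # older than a week, ignored
print(history.summary(now))
train_set, test_set = split_history(range(100))
print(len(train_set), len(test_set))
```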
The step of using the recorded website connection speeds to train a neural network and predict the connection speed between the user and each website in the training set comprises:
(e) build an artificial neural network whose input is the features of a website: a network IP address expressed as a 32-bit integer, and an integer between 0 and 23 representing the hour of the current time; its output is 2 real numbers, representing the estimated connection time and the estimated bandwidth between the user and the website, respectively;
(f) using the user connection time and user bandwidth history data obtained in steps (a)-(d) as the training set, train the neural network built in step (e) with the back-propagation algorithm, and save the trained neural network;
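The claim fixes only the inputs (a 32-bit IP address and an hour from 0 to 23), the two real-valued outputs, and the use of back-propagation. The sketch below fills in the rest with assumptions that are not from the source: a single hidden layer of 8 tanh units, linear outputs, squared-error loss, plain stochastic gradient descent, inputs scaled to [0, 1], and targets pre-scaled to a small range. `TinyMLP` and `train` are hypothetical names.

```python
import math
import random


class TinyMLP:
    """2 inputs -> `hidden` tanh units -> 2 linear outputs (sizes are assumptions)."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(2)]
        self.b2 = [0.0, 0.0]

    @staticmethod
    def features(ip, hour):
        # Scale the 32-bit address and the hour (0..23) into [0, 1].
        return [ip / 0xFFFFFFFF, hour / 23.0]

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hi for w, hi in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def predict(self, ip, hour):
        """Return [connect_time_estimate, bandwidth_estimate] in scaled units."""
        return self.forward(self.features(ip, hour))[1]

    def train_step(self, x, target, lr):
        h, y = self.forward(x)
        dy = [yi - ti for yi, ti in zip(y, target)]  # gradient of 0.5 * squared error
        # Back-propagate to the hidden layer before the output weights change.
        dh = [(1.0 - hj * hj) * sum(dy[k] * self.w2[k][j] for k in range(2))
              for j, hj in enumerate(h)]
        for k in range(2):
            for j in range(len(h)):
                self.w2[k][j] -= lr * dy[k] * h[j]
            self.b2[k] -= lr * dy[k]
        for j in range(len(h)):
            for i in range(2):
                self.w1[j][i] -= lr * dh[j] * x[i]
            self.b1[j] -= lr * dh[j]
        return 0.5 * sum(d * d for d in dy)


def train(net, samples, epochs=200, lr=0.05):
    """samples: list of ((ip, hour), (connect_time, bandwidth)), targets pre-scaled."""
    losses = []
    for _ in range(epochs):
        losses.append(sum(net.train_step(net.features(ip, hour), target, lr)
                          for (ip, hour), target in samples))
    return losses


# Made-up toy data purely to exercise the code; not measurements from the patent.
toy = [((0x0A000001, 3), (0.2, 0.8)), ((0xC8010101, 15), (0.6, 0.3))]
net = TinyMLP()
history = train(net, toy)
print(history[0], history[-1])
```

Scaling the targets matters in practice: raw values in milliseconds and kilobits per second would make this fixed learning rate unstable.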
The step of, depending on how the prediction error of the neural network decreases, either proceeding to step 4) or dividing the training set into smaller training sets and returning to step 2) for each of them, comprises:
(g) use the neural network trained in step (f) to predict the user connection time and user bandwidth of each website in the training set, and compute the prediction error e of each website:
e=t+Kb*b
where t is the prediction error of the user connection time, in milliseconds; b is the prediction error of the user bandwidth, in kilobits per second; and Kb is a coefficient with a value between 200 and 1000;
(h) if this is not the first execution of step (g), and the sum of the prediction errors e differs from the previous sum by no more than 3%, jump to step (k);
(i) sort the websites in the training set in ascending order of their prediction error e from step (g), and use the k-nearest-neighbor clustering algorithm to divide the websites into m groups, where m is an integer between 1 and 5 chosen to maximize the difference in average prediction error between groups;
(j) for each group of websites from step (i), use it as a training set and jump to step (e);
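A minimal sketch of the error measure in step (g), the stopping test in step (h), and a simplified stand-in for the grouping in step (i). Several points are assumptions: t and b are treated as non-negative error magnitudes, the 3% threshold is taken relative to the previous error sum, and `split_by_error` only considers a two-way split of the sorted errors. It is not the clustering algorithm named in the claim, which allows up to 5 groups.

```python
def site_error(t_err_ms, b_err_kbps, kb=500.0):
    """e = t + Kb*b. Kb must lie in 200..1000; 500 is an arbitrary choice here."""
    if not 200 <= kb <= 1000:
        raise ValueError("Kb must be between 200 and 1000")
    return t_err_ms + kb * b_err_kbps


def converged(previous_total, current_total, tolerance=0.03):
    """Step (h): True when the summed error changed by no more than 3%."""
    if previous_total is None:  # first execution of step (g): nothing to compare
        return False
    return abs(current_total - previous_total) <= tolerance * previous_total


def split_by_error(errors):
    """Simplified stand-in for step (i): best two-way cut of the sorted errors.

    Picks the cut that maximizes the gap between the mean errors of both groups.
    """
    ordered = sorted(errors)
    best_cut, best_gap = None, -1.0
    for cut in range(1, len(ordered)):
        low, high = ordered[:cut], ordered[cut:]
        gap = sum(high) / len(high) - sum(low) / len(low)
        if gap > best_gap:
            best_cut, best_gap = cut, gap
    if best_cut is None:  # fewer than two websites: nothing to split
        return [ordered]
    return [ordered[:best_cut], ordered[best_cut:]]


print(site_error(120, 2))
print(converged(1000.0, 980.0))
print(split_by_error([10, 1, 3, 11, 2]))
```

In the loop described by steps (e)-(j), each returned group would become the training set for a new neural network until `converged` reports that further splitting no longer reduces the total error by more than 3%.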
The step of using a decision tree to test the prediction performance of the neural network comprises:
(k) divide the websites in the test set obtained in step (d) into 1000 groups by network IP address, numbered 1 to 1000; if n neural networks were ultimately used during steps (e)-(j), record for each website the number of the neural network last used to train it when steps (e)-(j) finished, numbered 1 to n;
(l) build a decision tree whose input is the network IP address group number, with a value from 1 to 1000, and whose output is the neural network number, with a value from 1 to n;
(m) using the test set data obtained in step (d), train the decision tree built in step (l) with the C4.5 decision tree algorithm, and save the trained decision tree;
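The claim does not say how IP addresses are divided into 1000 groups. The sketch below assumes equal-width ranges over the 32-bit address space. It also replaces the C4.5 tree with a much simpler majority vote per group, purely to show the shape of the mapping from group number to neural network number. `ip_to_group` and `fit_group_router` are hypothetical names, and the majority vote is a stand-in, not C4.5.

```python
import ipaddress
from collections import Counter, defaultdict

GROUPS = 1000


def ip_to_group(ip):
    """Map an IPv4 address to a group number 1..1000 (equal-width ranges, assumed)."""
    value = int(ipaddress.IPv4Address(ip))
    return value * GROUPS // 2**32 + 1


def fit_group_router(samples):
    """samples: iterable of (ip, network_number) pairs from the test set.

    Returns {group_number: network_number} by majority vote within each group.
    This is a simplified stand-in for the C4.5 decision tree of step (m).
    """
    votes = defaultdict(Counter)
    for ip, network in samples:
        votes[ip_to_group(ip)][network] += 1
    return {group: counts.most_common(1)[0][0] for group, counts in votes.items()}


# Made-up addresses and network numbers, only to exercise the code.
router = fit_group_router([
    ("10.0.0.1", 1), ("10.0.0.2", 1), ("10.0.0.3", 2), ("200.1.1.1", 3),
])
print(ip_to_group("10.0.0.9"), router)
```

With a single integer input, a trained decision tree reduces to a set of thresholds on the group number, which is why a per-group lookup conveys the same interface.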
The step of using the decision tree and the neural network to predict the connection speed between the user and any unknown website comprises:
(n) for any website whose connection speed is unknown, obtain its network IP address group number between 1 and 1000 as described in step (k), and use the decision tree obtained in step (m) to obtain its corresponding neural network number;
(o) use the corresponding neural network to predict the user connection time and user bandwidth of the website.
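Steps (n)-(o) amount to a two-stage lookup: address group, then neural network number, then prediction. A minimal sketch follows, in which every component is passed in as a parameter. The function `predict_site`, the fallback to a default network for groups never seen in training, and the toy predictors are assumptions for illustration.

```python
def predict_site(ip, hour, group_of, router, networks, default_network=1):
    """Return (connect_time, bandwidth) for a website with unknown speed.

    group_of: callable mapping an IP address to a group number (step (k)).
    router:   mapping from group number to network number (trained in step (m)).
    networks: mapping from network number to a predictor callable (ip, hour).
    """
    group = group_of(ip)
    # Falling back to a default network for unseen groups is an assumption.
    number = router.get(group, default_network)
    return networks[number](ip, hour)


# Toy stand-ins so the sketch runs; real predictors would be the trained networks.
def slow_sites(ip, hour):
    return (120.0, 512.0)


def fast_sites(ip, hour):
    return (40.0, 4096.0)


def two_halves(ip):
    # Hypothetical grouping: lower half of the address space is group 1.
    return 1 if ip < 2**31 else 2


networks = {1: slow_sites, 2: fast_sites}
router = {1: 1, 2: 2}
print(predict_site(0x0A000001, 9, two_halves, router, networks))
print(predict_site(0xC8010101, 21, two_halves, router, networks))
```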
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009101021269A CN101634995B (en) | 2009-08-13 | 2009-08-13 | Network connection speed predicting method based on machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101634995A CN101634995A (en) | 2010-01-27 |
CN101634995B true CN101634995B (en) | 2011-09-21 |
Family
ID=41594185
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2009101021269A Expired - Fee Related CN101634995B (en) | 2009-08-13 | 2009-08-13 | Network connection speed predicting method based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101634995B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103116708A (en) * | 2013-03-05 | 2013-05-22 | 星云融创(北京)信息技术有限公司 | Device and method for evaluating website experience |
CN107958695B (en) * | 2017-11-17 | 2021-12-14 | 桂林电子科技大学 | High-precision medicine quantification method based on machine learning |
CN108011780B (en) | 2017-12-01 | 2019-01-22 | 北京百度网讯科技有限公司 | A kind of message transmission rate measurement method, device, equipment and computer-readable medium |
CN110362772B (en) * | 2019-06-11 | 2022-04-01 | 北京邮电大学 | Real-time webpage quality evaluation method and system based on deep neural network |
CN110445653B (en) * | 2019-08-12 | 2022-03-29 | 灵长智能科技(杭州)有限公司 | Network state prediction method, device, equipment and medium |
CN113033783B (en) * | 2021-04-07 | 2024-04-26 | 苏州瑞立思科技有限公司 | Bandwidth adjustment method based on BP neural network |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004029884A (en) * | 2002-06-21 | 2004-01-29 | Kengo Fujiwara | Ranking analyzing device |
CN1710560A (en) * | 2005-06-22 | 2005-12-21 | 浙江大学 | Individual searching engine method based on linkage analysis |
CN1744529A (en) * | 2004-08-31 | 2006-03-08 | 英业达股份有限公司 | Automatic test method for multi web site performance |
CN101018164A (en) * | 2007-02-28 | 2007-08-15 | 西南科技大学 | A TCP/IP network performance evaluation prediction method |
Also Published As
Publication number | Publication date |
---|---|
CN101634995A (en) | 2010-01-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20110921 Termination date: 20130813 |