CN107292519B - Browsing service perception index prediction method based on multi-label learning

Publication number
CN107292519B
Authority
CN
China
Legal status
Active
Application number
CN201710493097.8A
Other languages
Chinese (zh)
Other versions
CN107292519A (en)
Inventor
李克
徐小龙
王海
Current Assignee
Beijing Union University
Original Assignee
Beijing Union University
Priority date
Filing date
Publication date
Application filed by Beijing Union University
Priority to CN201710493097.8A
Publication of CN107292519A
Application granted
Publication of CN107292519B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0639: Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393: Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40: Business processes related to the transportation industry

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Educational Administration (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method for predicting the perception indexes of browsing-class services based on multi-label learning. It addresses the problem of predicting, promptly and accurately, the KQI indexes of a user's web browsing service from the scenario the user is in. Using massive historical data on user service perception, namely how good or bad the service perception indexes were in different scenarios, the method predicts the user's quality of service experience in a specific scenario and issues early warnings. Service experience problems can therefore be found as early as possible and remedied in time, which effectively reduces the complaint rate and the off-network (churn) rate.

Description

Browsing service perception index prediction method based on multi-label learning
Technical Field
The invention belongs to the technical field of network services, and particularly relates to a browsing-type service perception index prediction method based on multi-label learning.
Background
When a mobile network user uses an OTT service (e.g., web browsing or video playback), the quality of the service experience can generally be evaluated with a set of KQIs (key quality indicators), such as web page opening delay and download rate. The experience is affected by many factors, including the quality of the terminal, the quality of the mobile network over which the service is used, the quality of the App, and the bandwidth and load of the SP's website server cluster.
As the provider of the transmission channel for all kinds of services and a key link in guaranteeing service experience, a telecom operator needs to safeguard users' service experience as far as possible; otherwise users may complain or even leave the network.
At present, telecom operators generally rely on their network operation and optimization departments to guarantee network quality through routine network optimization. However, network quality and users' service experience can still differ greatly: good network quality does not necessarily guarantee a good service experience, because the experience is the combined effect of the factors above. The customer service department usually discovers a service experience problem only when it receives a user complaint, and then coordinates with the network operation and optimization department to troubleshoot and resolve it, which is a passive approach.
If users' service experience can be monitored continuously during daily network operation, and the experience in a specific scenario can be predicted and flagged early from massive historical data on user service perception (how good or bad the service perception indexes were in different scenarios), then service experience problems can be found as early as possible, remedied in time, and the complaint rate and off-network rate can be effectively reduced.
Disclosure of Invention
The invention provides a browsing service perception index prediction method based on multi-label learning, and aims to solve the problem of timely and accurately predicting a KQI index of a webpage browsing service of a user according to a scene where the user is located.
In order to achieve the purpose, the invention adopts the following technical scheme:
A browsing-class service perception index prediction method based on multi-label learning comprises the following steps:
Step S1: constructing a training sample set from the browsing service perception sample data set;
Step S2: constructing a k-nearest-neighbor sample set for each training sample;
Step S3: calculating the prior probabilities and the normalized frequency matrices
For each label item $y_j$, $j = 1, \dots, q$, the prior probabilities $P(H_j)$ and $P(\bar{H}_j)$ are calculated according to formula (1):
[Formula (1), image not reproduced]
where $H_j$ and $\bar{H}_j$ denote the events that a newly acquired unlabeled sample $x$ has and does not have the label item $y_j$, respectively, $P(H_j)$ and $P(\bar{H}_j)$ are the prior probabilities that $H_j$ and $\bar{H}_j$ hold, and $s$ is a control parameter.
The normalized frequency matrices $[f_j[r]]_{k \times q}$ and $[\tilde{f}_j[r]]_{k \times q}$ are calculated according to formulas (2) and (3):
[Formulas (2) and (3), images not reproduced]
where $\delta_j(x_i)$ denotes the number of samples with label $y_j$ among the nearest-neighbor samples of training sample $x_i$, and $[\cdot]$ denotes rounding; $f_j[r]$ is the number of training samples that have label $y_j$ and whose nearest-neighbor samples have label $y_j$ in the normalized proportion indexed by $r$; $\tilde{f}_j[r]$ is the number of training samples that do not have label $y_j$ and whose nearest-neighbor samples have label $y_j$ in the normalized proportion indexed by $r$;
Step S4: constructing the k-nearest-neighbor sample set of the unknown sample x
For the unknown sample $x$, its k-nearest-neighbor sample set $N(x)$ is constructed in the training sample set by the method of step S2; the actual number of nearest-neighbor samples is $k_x$, where $k_x \le k$;
Step S5: calculating the same-label statistics of the unknown sample x
For each label item $y_j$, $j = 1, \dots, q$, the number of samples in $N(x)$ that have this label item is counted according to formula (4); the result, $C_j$, is called the same-label statistic of the unknown sample $x$ in its set of $k_x$ nearest neighbors:
[Formula (4), image not reproduced]
Step S6: calculating the likelihood probabilities of the unknown sample x
The likelihood probabilities $P(C_j \mid H_j)$ and $P(C_j \mid \bar{H}_j)$ are calculated according to formulas (5) and (6):
[Formulas (5) and (6), images not reproduced]
where $P(C_j \mid H_j)$ is the likelihood that, when the unknown sample $x$ has label $y_j$, its nearest-neighbor samples have label $y_j$ in the normalized proportion given by $C_j$;
Step S7: estimating the label values of the unknown sample x
The estimate $\{y_1, y_2\}$ of the label set $Y$ of the unknown sample $x$ is calculated from formulas (7) and (8), namely:
[Formula (7), image not reproduced]
Because the first-packet delay and the page-opening delay are strongly correlated, and in particular the first-packet delay influences the page-opening delay, whether $y_2$, the label item for page-opening delay, holds (that is, whether its label value is 1) is estimated as follows:
[Formula (8), image not reproduced]
Preferably, step S1 includes the following steps:
Step S1a: selecting the attribute items of the training sample set
From all fields of the sample, a subset is selected, namely {date, time, longitude, latitude, large-area number, cell number, field strength, signal quality, website name, website IP, DNS IP, user identification, terminal model}, and used as the attribute set of the training samples, $x = \{x_1, x_2, \dots, x_d\}$, where $d$ is the dimension of the attribute set. Among these, the attribute fields {date, time, longitude, latitude, field strength, signal quality} are numerical data, and the attribute fields {large-area number, cell number, website name, website IP, DNS IP, user identification, terminal model} are nominal data;
Step S1b: selecting the label items of the training sample set
From all fields of the sample, a subset is selected, namely {first-packet delay, page-opening delay}, and used as the label set of the training samples, $Y = \{y_1, y_2, \dots, y_q\}$, where $q$ is the dimension of the label set; the label fields {first-packet delay, page-opening delay} are Boolean data;
Step S1c: selecting the training samples
According to the attribute set and the label set selected in steps S1a and S1b, $m$ samples are selected at random from the browsing service perception sample set as the training sample set $D$, i.e. $D = \{(x_i, Y_i) \mid 1 \le i \le m\}$;
Step S1d: converting the attribute values and label values of the training samples
If the original date and time values in the training samples are not numerical data, they are converted: a chosen reference date is assigned the value 0, and each date is represented by the number of days since that reference date; time is expressed in minutes, with midnight (00:00) as the reference point.
All numerical data in the training samples are normalized using the following formula:
$$x_i = \frac{x_i^{*} - x_i^{\min}}{x_i^{\max} - x_i^{\min}}$$
where $x_i^{*}$ denotes the true value of attribute $i$, and $x_i^{\min}$ and $x_i^{\max}$ denote the minimum and maximum values of that attribute in the training sample set.
For each label field of the training samples, {first-packet delay $y_1$, page-opening delay $y_2$}, the numerical values recorded in the original browsing service perception sample set are converted into Boolean data by comparison with preset poor-perception-quality decision thresholds $\{T_1, T_2\}$ according to formula (9), namely:
[Formula (9), image not reproduced]
where the function $[c]$ returns 1 when the condition $c$ holds and 0 otherwise.
Preferably, in step S2, for each sample vector $x_i$, $i = 1, \dots, m$, in the training sample set, at most $k$ nearest-neighbor samples of that vector are sought in the training sample set to form its k-nearest-neighbor sample set $N(x_i)$; the actual number of nearest-neighbor samples in this set is $k_i$, with $k_i \le k$. The specific method is as follows:
Step S2a: for the sample vector $x_i = \{x_{il}\}$, $l = 1, \dots, d$, search the training sample set for all samples whose date attribute differs from $x_{i1}$ by less than a set threshold $T_d$ (default value 10); these form the initial nearest-neighbor sample set.
Step S2b: in the initial nearest-neighbor sample set, find the samples that satisfy at least one of the following conditions: the fifth attribute is the same as $x_{i5}$, or the Euclidean distance to $x_i$ computed from longitude and latitude is less than a set threshold $T_{dis}$; these form the intermediate nearest-neighbor sample set.
Step S2c: compute the weighted Euclidean distance between each sample vector in the intermediate nearest-neighbor sample set and the sample vector $x_i$, sort the samples in ascending order of distance, and take at most the first $k$ samples as the k-nearest-neighbor sample set $N(x_i)$.
The invention has the following beneficial effects:
Based on massive historical data on user service perception (how good or bad the service perception indexes were in different scenarios), the method predicts the user's service experience in a specific scenario and issues early warnings, so that service experience problems can be found early and remedied in time, effectively reducing the complaint rate and the off-network rate.
Drawings
FIG. 1 is a flow chart of a prediction method of the present invention;
FIG. 2 is a flow chart for constructing a training sample set.
Detailed Description
As shown in fig. 1 and 2, the invention provides a browsing-class service perception index prediction method based on multi-label learning, which comprises the following steps:
step S1: constructing a training sample set
In a known local mobile network (such as the LTE network in Beijing), when a user on a smart terminal uses a web browsing App (such as UCweb or QQ browser) to browse a page from a predefined set of target web pages (such as a surf homepage, a search homepage and the like), a "web browsing service perception sample" for that moment is obtained, for example by a data-collection App deployed on the user terminal. All samples collected from a large number of user terminals within a given time range form the browsing service perception sample set.
The information (i.e. sample fields) contained in a web browsing service perception sample includes at least: date, time, network type, cell identification, current longitude and latitude of the terminal, field strength (named differently in different network types: RxLev in GSM, RSRP in LTE, etc.), signal quality (named differently in different network types: C/I, SINR, RSRQ, etc.), user identification (IMSI), terminal identification (IMEI or MEID), terminal model, browser App name, browsed website URL, browsed website IP, DNS IP, first-packet delay, page-opening delay, DNS resolution delay, TCP connection delay, GET request delay, and receive-response delay.
Wherein: the cell identification is a combination of identification parameters that uniquely identifies a cell, and generally consists of a large-area number plus a cell number. Different networks name these parameters differently; for example, GSM, WCDMA and TD-SCDMA networks use LAC + CI, and LTE uses TAC + ECI.
Wherein: the "first-packet delay" is defined as the time from the user initiating the web browsing request until the first HTTP 200 OK packet of the target server's response is received. First-packet delay = DNS resolution delay + TCP connection delay + GET request delay.
Wherein: the "page-opening delay" is defined as the time from the user initiating the browsing request until the whole HTTP page has been downloaded (page text content only, excluding secondary loading of resources). Page-opening delay = first-packet delay + receive-response delay.
Wherein: "DNS resolution delay" is the delay from the terminal initiating a DNS resolution request to the completion of DNS resolution; "TCP connection delay" is the delay from the end of DNS resolution to the completion of TCP connection establishment (three-way handshake); "GET request delay" is the delay from issuing the GET request to receiving the first TCP packet (containing HTTP 200 OK); "receive-response delay" is the delay from receiving the first response packet to the terminal sending FIN, ACK (i.e., reception is complete).
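The two additive relations above can be checked with a small sketch. The class name `DelayComponents`, the field names, and the use of milliseconds are illustrative assumptions, not part of the patent text.

```python
from dataclasses import dataclass


@dataclass
class DelayComponents:
    """Delay components of one browsing sample (milliseconds assumed)."""
    dns_resolution: float
    tcp_connection: float
    get_request: float
    receive_response: float

    @property
    def first_packet_delay(self) -> float:
        # First-packet delay = DNS resolution + TCP connection + GET request.
        return self.dns_resolution + self.tcp_connection + self.get_request

    @property
    def page_open_delay(self) -> float:
        # Page-opening delay = first-packet delay + receive-response delay.
        return self.first_packet_delay + self.receive_response


sample = DelayComponents(
    dns_resolution=40.0,
    tcp_connection=60.0,
    get_request=150.0,
    receive_response=500.0,
)
print(sample.first_packet_delay, sample.page_open_delay)
```

Because the page-opening delay contains the first-packet delay as a summand, the two indexes are correlated by construction, which is the relationship step S7 later relies on.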
Step S1a: selecting the attribute items of the training sample set
From all fields of the sample, a subset is selected, namely {date, time, longitude, latitude, large-area number, cell number, field strength, signal quality, website name, website IP, DNS IP, user identification, terminal model}, and used as the attribute set of the training samples, $x = \{x_1, x_2, \dots, x_d\}$, where $d$ is the dimension of the attribute set; here $d = 13$. Among these, the attribute fields {date, time, longitude, latitude, field strength, signal quality} are numerical data, and the attribute fields {large-area number, cell number, website name, website IP, DNS IP, user identification, terminal model} are nominal data;
Step S1b: selecting the label items of the training sample set
From all fields of the sample, a subset is selected, namely {first-packet delay, page-opening delay}, and used as the label set of the training samples, $Y = \{y_1, y_2, \dots, y_q\}$, where $q$ is the dimension of the label set; here $q = 2$. The label fields {first-packet delay, page-opening delay} are Boolean data;
Step S1c: selecting the training samples
According to the attribute set and the label set selected in steps S1a and S1b, $m$ samples are selected at random from the browsing service perception sample set as the training sample set $D$, i.e. $D = \{(x_i, Y_i) \mid 1 \le i \le m\}$;
Step S1d: converting the attribute values and label values of the training samples
If the original date and time values in the training samples are not numerical data, they are converted: a chosen reference date (such as January 1, 2015) is assigned the value 0, and each date is represented by the number of days since that reference date. Time is expressed in minutes, with midnight (00:00) as the reference point.
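A minimal sketch of this date and time encoding follows. The function names are hypothetical; the reference date is the example given in the text.

```python
from datetime import date, datetime

# Example reference date from the text; any fixed date would serve.
REFERENCE_DATE = date(2015, 1, 1)


def encode_date(d: date, reference: date = REFERENCE_DATE) -> int:
    """Number of days elapsed since the reference date (reference maps to 0)."""
    return (d - reference).days


def encode_time(t: datetime) -> int:
    """Minutes elapsed since midnight, i.e. minute granularity."""
    return t.hour * 60 + t.minute


stamp = datetime(2015, 1, 11, 8, 30)
print(encode_date(stamp.date()), encode_time(stamp))
```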
All numerical data in the training samples are normalized by formula (1), i.e.:
$$x_i = \frac{x_i^{*} - x_i^{\min}}{x_i^{\max} - x_i^{\min}} \tag{1}$$
where $x_i^{*}$ denotes the true value of attribute $i$, and $x_i^{\min}$ and $x_i^{\max}$ denote the minimum and maximum values of that attribute in the training sample set.
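The normalization is an ordinary min-max rescaling of each numerical attribute to [0, 1]. The sketch below illustrates it for one attribute column; the function name and the handling of a constant column (mapped to 0.0) are assumptions the text does not specify.

```python
def min_max_normalize(values):
    """Rescale one numerical attribute column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information, map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical field-strength readings (RSRP, dBm) from three samples.
rsrp = [-110.0, -95.0, -80.0]
normalized = min_max_normalize(rsrp)
print(normalized)
```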
For each label field of the training samples, {first-packet delay $y_1$, page-opening delay $y_2$}, the numerical values recorded in the original browsing service perception sample set are converted into Boolean data by comparison with preset poor-perception-quality decision thresholds $\{T_1, T_2\}$ according to the following formula:
[Formula, image not reproduced]
where the function $[c]$ returns 1 when the condition $c$ holds and 0 otherwise.
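The conversion formula itself is not legible in this text, so the following is only a sketch under an explicit assumption: a label is set to 1 (poor perceived quality) when the measured delay strictly exceeds its threshold. The threshold values and the function name are hypothetical.

```python
def to_boolean_labels(first_packet_ms, page_open_ms, t1_ms, t2_ms):
    """Turn two measured delays into Boolean labels (y1, y2).

    Assumption: label = 1 means "poor quality", i.e. delay > threshold.
    """
    y1 = int(first_packet_ms > t1_ms)
    y2 = int(page_open_ms > t2_ms)
    return y1, y2


# Hypothetical thresholds T1 = 1000 ms and T2 = 3000 ms.
labels = to_boolean_labels(first_packet_ms=1200.0, page_open_ms=2500.0,
                           t1_ms=1000.0, t2_ms=3000.0)
print(labels)
```

Whether equality with the threshold counts as poor quality is not recoverable from the text; the strict comparison here is a choice made for the sketch.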
Step S2: constructing the k-nearest-neighbor sample set of each training sample
For each sample vector $x_i$, $i = 1, \dots, m$, in the training sample set, at most $k$ nearest-neighbor samples of that vector are sought in the training sample set to form its k-nearest-neighbor sample set $N(x_i)$; the actual number of nearest-neighbor samples in this set is $k_i$ ($k_i \le k$). The specific method is as follows:
Step S2a: for the sample vector $x_i = \{x_{il}\}$, $l = 1, \dots, d$, search the training sample set for all samples (other than $x_i$ itself) whose date attribute differs from $x_{i1}$ by less than a set threshold $T_d$ (default value 10); these form the initial nearest-neighbor sample set.
Step S2b: in the initial nearest-neighbor sample set, find the samples that satisfy at least one of the following conditions: the large-area number is the same as $x_{i5}$, or the Euclidean distance to $x_i$ computed from longitude and latitude is less than a set threshold $T_{dis}$ (default value 2000 m); these form the intermediate nearest-neighbor sample set.
Step S2c: compute the weighted Euclidean distance between each sample vector in the intermediate nearest-neighbor sample set and the sample vector $x_i$, sort the samples in ascending order of distance, and take at most the first $k$ samples as the k-nearest-neighbor sample set $N(x_i)$.
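The three-stage neighbor search can be sketched as below. Several details are assumptions rather than statements of the patent: dates are compared as raw day counts, the longitude/latitude distance is converted to metres with an equirectangular approximation, the attribute weights are supplied by the caller, and the dictionary keys (`day`, `area`, `lon`, `lat`, `rsrp`) are hypothetical.

```python
import math

T_D = 10        # date threshold (default value from the text; days assumed)
T_DIS = 2000.0  # distance threshold in metres (default value from the text)


def planar_distance_m(lon1, lat1, lon2, lat2):
    """Approximate ground distance in metres (equirectangular assumption)."""
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat) * 6371000.0
    dy = math.radians(lat2 - lat1) * 6371000.0
    return math.hypot(dx, dy)


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance over the attributes named in `weights`."""
    return math.sqrt(sum(w * (a[key] - b[key]) ** 2 for key, w in weights.items()))


def k_nearest(query, training, k, weights):
    # Stage a: keep samples whose date is close enough to the query's date.
    stage_a = [s for s in training
               if s is not query and abs(s["day"] - query["day"]) < T_D]
    # Stage b: same large-area number OR geographically close.
    stage_b = [s for s in stage_a
               if s["area"] == query["area"]
               or planar_distance_m(s["lon"], s["lat"],
                                    query["lon"], query["lat"]) < T_DIS]
    # Stage c: ascending weighted distance, keep at most k samples.
    stage_b.sort(key=lambda s: weighted_distance(s, query, weights))
    return stage_b[:k]


query = {"id": "q", "day": 100, "area": "A1", "lon": 116.40, "lat": 39.90, "rsrp": 0.5}
training = [
    {"id": "s1", "day": 101, "area": "A1", "lon": 116.40, "lat": 39.90, "rsrp": 0.6},
    {"id": "s2", "day": 120, "area": "A1", "lon": 116.40, "lat": 39.90, "rsrp": 0.5},
    {"id": "s3", "day": 99, "area": "B2", "lon": 116.41, "lat": 39.90, "rsrp": 0.9},
    {"id": "s4", "day": 98, "area": "B2", "lon": 117.00, "lat": 39.90, "rsrp": 0.5},
    {"id": "s5", "day": 102, "area": "A1", "lon": 116.40, "lat": 39.90, "rsrp": 0.5},
]
neighbors = k_nearest(query, training, k=2, weights={"rsrp": 1.0})
print([s["id"] for s in neighbors])
```

Because the first two stages only filter, a sample can end up with fewer than $k$ neighbors, which is why the text distinguishes the actual count $k_i$ from $k$.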
Step S3: calculating the prior probabilities and the normalized frequency matrices
For each label item $y_j$, $j = 1, \dots, q$, the prior probabilities $P(H_j)$ and $P(\bar{H}_j)$ are calculated according to formula (2):
[Formula (2), image not reproduced]
where $H_j$ and $\bar{H}_j$ denote the events that a newly acquired unlabeled sample $x$ (called an "unknown sample", i.e. one with attribute information only and no label information) has and does not have the label item $y_j$ (i.e., the label item $y_j$ takes the value 1 and 0), respectively, $P(H_j)$ and $P(\bar{H}_j)$ are the prior probabilities that $H_j$ and $\bar{H}_j$ hold, and $s$ is the control parameter (typically taken to be 1).
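The prior formula is not legible in this text. As a hedged illustration, the sketch below uses the smoothed prior of the standard ML-kNN algorithm, $P(H_j) = (s + n_j)/(2s + m)$, where $n_j$ is the number of training samples with label $y_j$; that the patent uses exactly this form is an assumption, and the function name is hypothetical.

```python
def label_priors(label_matrix, s=1.0):
    """Smoothed priors (P(H_j), P(not H_j)) for each label column.

    Assumption: standard ML-kNN form (s + n_j) / (2s + m).
    """
    m = len(label_matrix)
    q = len(label_matrix[0])
    priors = []
    for j in range(q):
        positives = sum(row[j] for row in label_matrix)
        p_has = (s + positives) / (2.0 * s + m)
        priors.append((p_has, 1.0 - p_has))
    return priors


# Four training samples, two labels (first-packet delay, page-opening delay).
labels = [[1, 0], [1, 1], [0, 0], [1, 0]]
priors = label_priors(labels, s=1.0)
print(priors)
```

With $s = 1$ this is Laplace smoothing, so neither prior is ever exactly 0 or 1 even when a label is absent from the training set.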
Then, the normalized frequency matrices $[f_j[r]]_{k \times q}$ and $[\tilde{f}_j[r]]_{k \times q}$ are calculated according to formulas (3) and (4):
[Formulas (3) and (4), images not reproduced]
where $\delta_j(x_i)$ denotes the number of samples with label $y_j$ among the nearest-neighbor samples of training sample $x_i$, and $[\cdot]$ denotes rounding. Then $f_j[r]$ is the number of training samples that have label $y_j$ (i.e., the label value is 1) and whose nearest-neighbor samples have label $y_j$ in the normalized proportion indexed by $r$, while $\tilde{f}_j[r]$ is the number of training samples that do not have label $y_j$ (i.e., the label value is 0) and whose nearest-neighbor samples have label $y_j$ in the normalized proportion indexed by $r$.
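The exact normalization in these formulas is not legible here. The sketch below shows one plausible reading, stated as an assumption: since each training sample may have only $k_i \le k$ neighbors, its neighbor count $\delta_j(x_i)$ is rescaled to the common range $0..k$ as $r = [k \cdot \delta_j(x_i) / k_i]$ before being tallied. The function name, the round-half-up rule, and the use of $k + 1$ bins are likewise assumptions.

```python
import math


def frequency_arrays(labels, neighbor_lists, j, k):
    """Tally normalized neighbor-label counts for label j.

    labels[i][j] is the Boolean label of training sample i;
    neighbor_lists[i] holds the indices of its (at most k) neighbors.
    """
    f_has = [0] * (k + 1)  # samples that have label j
    f_not = [0] * (k + 1)  # samples that do not have label j
    for i, neighbors in enumerate(neighbor_lists):
        if not neighbors:
            continue  # assumption: samples without neighbors are skipped
        delta = sum(labels[n][j] for n in neighbors)
        # Rescale to 0..k and round half up.
        r = math.floor(k * delta / len(neighbors) + 0.5)
        if labels[i][j] == 1:
            f_has[r] += 1
        else:
            f_not[r] += 1
    return f_has, f_not


labels = [[1], [1], [0], [0]]
neighbor_lists = [[1, 2], [0], [3, 0], [2]]
f_has, f_not = frequency_arrays(labels, neighbor_lists, j=0, k=2)
print(f_has, f_not)
```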
Step S4: constructing the k-nearest-neighbor sample set of the unknown sample x
For the unknown sample $x$, its k-nearest-neighbor sample set $N(x)$ is constructed in the training sample set by the method of step S2; the actual number of nearest-neighbor samples is $k_x$ ($k_x \le k$);
Step S5: calculating the same-label statistics of the unknown sample x
For each label item $y_j$, $j = 1, \dots, q$, the number of samples in $N(x)$ that have this label item (i.e., whose value is 1) is counted according to formula (5); the result, $C_j$, is called the same-label statistic of the unknown sample $x$ in its set of $k_x$ nearest neighbors:
[Formula (5), image not reproduced]
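As described in words, the same-label statistic is a per-label count over the neighbors of the unknown sample. A minimal sketch, with a hypothetical function name and made-up neighbor labels:

```python
def same_label_counts(neighbor_labels, q):
    """C_j for j = 0..q-1: how many neighbors carry label j."""
    return [sum(row[j] for row in neighbor_labels) for j in range(q)]


# Labels (y1, y2) of the k_x = 3 neighbors found for an unknown sample.
neighbor_labels = [[1, 0], [1, 1], [0, 0]]
counts = same_label_counts(neighbor_labels, q=2)
print(counts)
```

If the formula in the patent additionally rescales the count by $k_x$, that step is not shown here because it cannot be confirmed from the text.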
Step S6: calculating the likelihood probabilities of the unknown sample x
The likelihood probabilities $P(C_j \mid H_j)$ and $P(C_j \mid \bar{H}_j)$ are calculated according to formulas (6) and (7):
[Formulas (6) and (7), images not reproduced]
$P(C_j \mid H_j)$ is the likelihood that, when the unknown sample $x$ has label $y_j$, its nearest-neighbor samples have label $y_j$ in the normalized proportion given by $C_j$.
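The likelihood formulas are not legible in this text. The sketch below uses the smoothed likelihood of standard ML-kNN, $(s + f_j[r]) / (s(k+1) + \sum_{r'} f_j[r'])$, evaluated at the bin $r$ that corresponds to the unknown sample's statistic; both the form and the $k + 1$ bin count are assumptions, and the function name is hypothetical.

```python
def likelihoods(f_has, f_not, r, s=1.0):
    """Return (P(C_j | H_j), P(C_j | not H_j)) for frequency bin r.

    Assumption: standard ML-kNN smoothing over k + 1 bins.
    """
    k = len(f_has) - 1
    p_given_has = (s + f_has[r]) / (s * (k + 1) + sum(f_has))
    p_given_not = (s + f_not[r]) / (s * (k + 1) + sum(f_not))
    return p_given_has, p_given_not


# Frequency arrays for one label with k = 2 (three bins).
f_has = [0, 1, 1]
f_not = [1, 1, 0]
p_has, p_not = likelihoods(f_has, f_not, r=2, s=1.0)
print(p_has, p_not)
```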
Step S7: estimating the label values of the unknown sample x
Based on the results of the previous steps, the estimate $\{y_1, y_2\}$ of the label set $Y$ of the unknown sample $x$ can be calculated by formulas (8) and (9), where:
[Formula (8), image not reproduced]
Because the first-packet delay and the page-opening delay are strongly correlated, and in particular the first-packet delay influences the page-opening delay, whether $y_2$, the label item for page-opening delay, holds (that is, whether its label value is 1) is estimated as follows:
[Formula (9), image not reproduced]
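For orientation only, the sketch below shows the maximum a posteriori decision used by standard ML-kNN: a label is predicted as 1 when prior times likelihood is larger for "has the label" than for "does not have the label". It is an assumption that formula (8) has this form, and the sketch deliberately does not attempt the modified rule for $y_2$ that couples it to the first-packet delay, since that formula cannot be recovered from this text. The function name and the numbers are hypothetical.

```python
def map_label(prior_has, prior_not, like_has, like_not):
    """Standard ML-kNN MAP decision for a single label (assumed form)."""
    return int(prior_has * like_has > prior_not * like_not)


# Hypothetical priors and likelihoods for an unknown sample.
y1 = map_label(prior_has=0.6, prior_not=0.4, like_has=0.4, like_not=0.2)
y_other = map_label(prior_has=0.3, prior_not=0.7, like_has=0.2, like_not=0.4)
print(y1, y_other)
```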

Claims (3)

1. A browsing-class service perception index prediction method based on multi-label learning, characterized by comprising the following steps:
Step S1: constructing a training sample set from the browsing service perception sample data set;
Step S2: constructing a k-nearest-neighbor sample set for each training sample;
Step S3: calculating the prior probabilities and the normalized frequency matrices
For each label item $y_j$, $j = 1, \dots, q$, the prior probabilities $P(H_j)$ and $P(\bar{H}_j)$ are calculated according to formula (1):
[Formula (1), image not reproduced]
wherein $H_j$ and $\bar{H}_j$ denote the events that a newly acquired unlabeled sample $x$ has and does not have the label item $y_j$, respectively, $P(H_j)$ and $P(\bar{H}_j)$ are the prior probabilities that $H_j$ and $\bar{H}_j$ hold, $s$ is the control parameter, $q$ is the dimension of the label set, and $m$ is the number of samples,
the normalized frequency matrices $[f_j[r]]_{k \times q}$ and $[\tilde{f}_j[r]]_{k \times q}$ are calculated according to formulas (2) and (3):
[Formulas (2) and (3), images not reproduced]
wherein $\delta_j(x_i)$ denotes the number of samples with label $y_j$ among the nearest-neighbor samples of training sample $x_i$, and $[\cdot]$ denotes rounding; $f_j[r]$ is the number of training samples that have label $y_j$ and whose nearest-neighbor samples have label $y_j$ in the normalized proportion indexed by $r$; $\tilde{f}_j[r]$ is the number of training samples that do not have label $y_j$ and whose nearest-neighbor samples have label $y_j$ in the normalized proportion indexed by $r$; $k_i$ is the actual number of nearest-neighbor samples of a sample set, and $r$ is the sample set;
Step S4: constructing the k-nearest-neighbor sample set of the unknown sample x
For the unknown sample $x$, its k-nearest-neighbor sample set $N(x)$ is constructed in the training sample set by the method of step S2; the actual number of nearest-neighbor samples is $k_x$, wherein $k_x \le k$;
Step S5: calculating the same-label statistics of the unknown sample x
For each label item $y_j$, $j = 1, \dots, q$, the number of samples in $N(x)$ that have this label item is counted according to formula (4); the result, $C_j$, is called the same-label statistic of the unknown sample $x$ in its set of $k_x$ nearest neighbors:
[Formula (4), image not reproduced]
Step S6: calculating the likelihood probabilities of the unknown sample x
The likelihood probabilities $P(C_j \mid H_j)$ and $P(C_j \mid \bar{H}_j)$ are calculated according to formulas (5) and (6):
[Formulas (5) and (6), images not reproduced]
wherein $P(C_j \mid H_j)$ is the likelihood that, when the unknown sample $x$ has label $y_j$, its nearest-neighbor samples have label $y_j$ in the normalized proportion given by $C_j$, and $s$ is a control parameter;
Step S7: estimating the label values of the unknown sample x
The estimate $\{y_1, y_2\}$ of the label set $Y$ of the unknown sample $x$ is calculated from formulas (7) and (8), namely:
[Formula (7), image not reproduced]
Because the first-packet delay and the page-opening delay are strongly correlated, and in particular the first-packet delay influences the page-opening delay, whether $y_2$, the label item for page-opening delay, holds (that is, whether its label value is 1) is estimated as follows:
[Formula (8), image not reproduced]
wherein $H_1$ and $H_2$ respectively denote the unknown sample sets.
2. The browsing-class service awareness index prediction method based on multi-label learning according to claim 1, wherein the step 1 comprises the following steps:
step S1a, selecting attribute items of training sample set
Selecting a subset of the samples from all fields of the samples, namely { date, time, longitude, latitude, large area number, cell number, field strength, signal quality, website name, website IP, DNS IP, user identification and terminal model }, and using the selected subset as an attribute set x ═ x { x ═ of the training samples }1,x2,...,xdD is the dimension of the attribute set; the system comprises attribute fields, a server and a server, wherein the attribute fields { date, time, longitude, latitude, field intensity and signal quality } are numerical data, and the attribute fields { major district number, cell number, website name, website IP, DNS IP, user identification and terminal model } are name data;
step S1b, selecting label items of training sample set
From a sample stationThere are fields to choose their subset, i.e., { first packet delay, page open delay }, as the training sample's token set Y ═ Y1,y2,...,yqQ is the dimension of a mark set, wherein a mark field { first packet time delay, page opening time delay } is Boolean data;
step S1c, selection of training sample
Randomly selecting m samples from the browsing traffic perception sample set as a training sample set D according to the attribute set and the mark set selected in the steps 1a and 1b, namely D { (x)i,Yi)|1≤i≤m};
Step S1d, converting the attribute values and label values of the training samples
If the original date and time values in the training samples are not numerical data, they are converted: a chosen reference date is assigned the value 0, and each date is represented by its number of days from the reference date; each time is represented with minute granularity, taking midnight (00:00) as the reference point.
All numerical data in the training samples are normalized using the following formula:
Figure FDA0002639496290000031
where
Figure FDA0002639496290000032
denotes the true value of attribute $i$, and
Figure FDA0002639496290000033
and
Figure FDA0002639496290000034
denote the minimum and maximum values of that attribute in the training sample set.
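The date and time conversion and the normalization of step S1d can be sketched as below. The normalization formula itself is available here only as a figure; the sketch assumes ordinary min-max scaling, which matches the description of a true value together with the attribute's minimum and maximum. All function names are hypothetical.

```python
from datetime import date, time


def date_to_days(d: date, reference: date) -> int:
    """Number of days from the reference date (the reference maps to 0)."""
    return (d - reference).days


def time_to_minutes(t: time) -> int:
    """Minutes elapsed since midnight, i.e. minute granularity."""
    return t.hour * 60 + t.minute


def min_max_normalize(values):
    """Scale a column of numerical values to [0, 1] (assumed min-max form)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column is mapped to 0 to avoid dividing by 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For example, field strengths of -110, -90 and -70 would be scaled to 0.0, 0.5 and 1.0 under this assumption.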
For the label fields of the training samples {first packet delay $y_1$, page opening delay $y_2$}, which are numerical data in the original browsing service perception sample set, the original values are denoted respectively as
Figure FDA0002639496290000035
According to the preset poor perception quality judgment thresholds $\{T_1, T_2\}$, they are converted into Boolean data according to formula (9), namely:
Figure FDA0002639496290000036
where the function
Figure FDA0002639496290000037
returns 1 when the condition $c$ is satisfied and 0 otherwise.
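A minimal sketch of this label conversion follows. Formula (9) is available here only as a figure, so the direction of the comparison (a delay strictly above its threshold counts as poor quality and yields 1) is an assumption, as are the function names.

```python
def indicator(condition: bool) -> int:
    """Return 1 when the condition holds and 0 otherwise."""
    return 1 if condition else 0


def to_boolean_labels(first_packet_delay, page_open_delay, t1, t2):
    """Convert the two raw delay values into Boolean labels (y1, y2).

    Assumption: a delay strictly greater than its threshold is treated
    as poor perceived quality and mapped to 1.
    """
    return (
        indicator(first_packet_delay > t1),
        indicator(page_open_delay > t2),
    )
```

With thresholds of 0.5 and 3.0 in the same unit as the delays, a sample with delays 0.8 and 2.0 would be labeled (1, 0) under this assumption.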
3. The method according to claim 1, wherein in step S2, for each sample vector $x_i$ ($i = 1, \ldots, m$) in the training sample set, at most $k$ nearest neighbor samples of the sample vector are searched for in the training sample set to form the k nearest neighbor sample set of the sample vector
Figure FDA0002639496290000038
the actual number of nearest neighbor samples in this set is $k_i$, where $k_i \le k$; the specific method comprises the following steps:
Step S2a, for the sample vector $x_i = \{x_{il} \mid l = 1, \ldots, d\}$, all samples in the training sample set whose distance to $x_{i1}$ on the date attribute is less than a set threshold $T_d$ (default value 10) are found and form the initial nearest neighbor sample set
Figure FDA0002639496290000039
Step S2b, in the initial nearest neighbor sample set
Figure FDA00026394962900000310
samples are sought that satisfy at least one of the following conditions: the attribute value is identical to $x_{i5}$, or the Euclidean distance to $x_i$ computed from longitude and latitude is smaller than a set threshold $T_{dis}$; these samples form the intermediate nearest neighbor sample set
Figure FDA00026394962900000311
Step S2c, calculating the weighted Euclidean distance between each sample vector in the intermediate nearest neighbor sample set
Figure FDA0002639496290000041
and the sample vector $x_i$, sorting the samples in ascending order of distance, and taking at most the first $k$ samples as the k nearest neighbor sample set
Figure FDA0002639496290000042
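The three-stage search of steps S2a to S2c can be sketched as below. This is an illustration under several assumptions that the claim text does not fix: each sample is a dictionary with hypothetical keys `date` (in days), `lon`, `lat`, `area` (the large area number) and `vec` (the normalized numerical vector); the sample itself is excluded from its own neighbor set; the weighted Euclidean distance has the form $\sqrt{\sum_l w_l (a_l - b_l)^2}$; and the default value of `t_dis` is arbitrary.

```python
import math


def k_nearest_neighbors(samples, i, k, weights, t_d=10, t_dis=0.01):
    """Return indices of at most k neighbors of samples[i]."""
    target = samples[i]

    # S2a: keep samples whose date is within t_d days of the target.
    initial = [
        j for j, s in enumerate(samples)
        if j != i and abs(s["date"] - target["date"]) < t_d
    ]

    # S2b: keep samples in the same large area, or geographically close.
    def geo_distance(s):
        return math.hypot(s["lon"] - target["lon"], s["lat"] - target["lat"])

    intermediate = [
        j for j in initial
        if samples[j]["area"] == target["area"]
        or geo_distance(samples[j]) < t_dis
    ]

    # S2c: rank by weighted Euclidean distance and keep the first k.
    def weighted_distance(j):
        return math.sqrt(sum(
            w * (a - b) ** 2
            for w, a, b in zip(weights, samples[j]["vec"], target["vec"])
        ))

    return sorted(intermediate, key=weighted_distance)[:k]
```

Because the two filtering stages can leave fewer than $k$ candidates, the returned list may be shorter than $k$, which corresponds to the case $k_i < k$.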
CN201710493097.8A 2017-06-26 2017-06-26 Browsing service perception index prediction method based on multi-label learning Active CN107292519B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710493097.8A CN107292519B (en) 2017-06-26 2017-06-26 Browsing service perception index prediction method based on multi-label learning

Publications (2)

Publication Number Publication Date
CN107292519A CN107292519A (en) 2017-10-24
CN107292519B true CN107292519B (en) 2020-11-03

Family

ID=60098311

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710493097.8A Active CN107292519B (en) 2017-06-26 2017-06-26 Browsing service perception index prediction method based on multi-label learning

Country Status (1)

Country Link
CN (1) CN107292519B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108133387B (en) * 2017-12-21 2021-11-12 北京联合大学 Multi-label K nearest neighbor algorithm based on soft information
CN110049129A (en) * 2019-04-20 2019-07-23 北京联合大学 A kind of mobile Internet business qualitative forecasting method based on feature selecting
CN112671573B (en) * 2020-12-17 2023-05-16 北京神州泰岳软件股份有限公司 Method and device for identifying potential off-network users in broadband service

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008111027A3 (en) * 2007-03-13 2008-12-11 Alcatel Lucent Quality of service admission control network
CN102647280A (en) * 2012-02-28 2012-08-22 山东大学 Embedded Internet connection device for community information system and realization method of embedded Internet connection device
CN104899596A (en) * 2015-03-16 2015-09-09 景德镇陶瓷学院 Multi-label classification method and apparatus thereof
CN105656692A (en) * 2016-03-14 2016-06-08 南京邮电大学 Multi-instance multi-label learning based area monitoring method used in wireless sensor network
CN105791034A (en) * 2016-05-15 2016-07-20 北京联合大学 Browse type service perception analysis method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
《多标签分类法在电能质量复合扰动分类中的应用》;周雒维,管春,卢伟国;《中国电机工程学报》;20110205;第31卷(第4期);全文 *

Also Published As

Publication number Publication date
CN107292519A (en) 2017-10-24

Similar Documents

Publication Publication Date Title
CN107292519B (en) Browsing service perception index prediction method based on multi-label learning
WO2023045829A1 (en) Service abnormality prediction method and device, storage medium, and electronic device
US9986366B2 (en) Controlling data collection interval of M2M device
CN109391513B (en) Network perception intelligent early warning and improving method based on big data
CN111294819B (en) Network optimization method and device
CN105791034A (en) Browse type service perception analysis method
CN102711162A (en) Method for monitoring network quality and optimizing user experience in mobile internet
CN105657738A (en) Method, device and system for positioning problem of poor mobile phone service aware quality
CN111405585B (en) Neighbor relation prediction method based on convolutional neural network
CN109302714A (en) Realize that base station location is studied and judged and area covered knows method for distinguishing based on user data
US20230217308A1 (en) Traffic flow prediction in a wireless network using heavy-hitter encoding and machine learning
CN114118748B (en) Service quality prediction method and device, electronic equipment and storage medium
CN102158886A (en) Method and device for maintaining network running through user-sensing information
CN110049129A (en) A kind of mobile Internet business qualitative forecasting method based on feature selecting
Zhohov et al. One step further: Tunable and explainable throughput prediction based on large-scale commercial networks
CN107371183A (en) A kind of output intent and device of network quality report
JP2014107825A (en) Communication path identification device
JP6167052B2 (en) Communication traffic prediction apparatus and program
CN110120883B (en) Method and device for evaluating network performance and computer readable storage medium
Zhang et al. Cellular QoE prediction for video service based on causal structure learning
JP2017208717A (en) Analysis system for radio communication network
WO2016065759A1 (en) Method and apparatus for optimizing neighbour cell list
WO2024057063A1 (en) Operational anomaly detection and isolation in multi-domain communication networks
Soos et al. Analyzing group behavior patterns in a cellular mobile network for 5G use‐cases
TWI724784B (en) Method for focusing on problem area of mobile user

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant