WO2022001480A1 - Popular application identification method, network system, network device and storage medium - Google Patents

Popular application identification method, network system, network device and storage medium

Info

Publication number
WO2022001480A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
application
target
popular
stream data
Prior art date
Application number
PCT/CN2021/095422
Other languages
English (en)
French (fr)
Inventor
江舟
连超
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2022001480A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2483 Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]

Definitions

  • The present application relates to the field of communication technologies, and in particular to a popular application identification method, a network system, a network device and a storage medium.
  • Traffic management is a collection of concepts, strategies and actions built on the intelligent pipeline (physical network) and the aggregation platform (commercial network). Its business direction is to expand traffic scale, raise traffic level and enrich traffic connotation, with the aim of releasing traffic value.
  • An important piece of basic data for traffic management analysis is the popularity of applications. It can be used to formulate marketing strategies for applications of different popularity, which helps improve revenue and competitiveness.
  • DPI: Deep Packet Inspection.
  • Identifying the application type with DPI relies on a signature database.
  • The signature database is mainly updated by system maintainers who manually discover and add new application data. Because the number of applications grows substantially every day, the update speed of the signature database cannot keep pace with the growth in the number of applications. When the number of applications grows rapidly, manual discovery and addition by maintainers alone cannot yield the corresponding application types in time, so application popularity statistics lag behind, which hinders the timely discovery of new popular applications.
  • The embodiments of the present application provide a popular application identification method, a network system, a network device and a storage medium.
  • In a first aspect, an embodiment of the present application provides a method for identifying popular applications, the method comprising: acquiring merged stream data from an Internet Protocol Detail Record (IPDR), wherein the merged stream data includes user subscription data and user session data; screening out, from the merged stream data, target stream data corresponding to the same application, and determining a target application according to the target stream data; and obtaining a traffic change trend of the target application, and identifying the target application as a popular application when the traffic change trend is an increasing trend.
  • In a second aspect, an embodiment of the present application further provides a method for identifying popular applications, the method comprising: acquiring user subscription data and user session data; merging the user subscription data and the user session data to obtain merged stream data; and sending the merged stream data to an NWDAF, so that the NWDAF screens out, from the merged stream data, target stream data corresponding to the same application, determines a target application according to the target stream data, obtains a traffic change trend of the target application, and identifies the target application as a popular application when the traffic change trend is an increasing trend.
  • In a third aspect, an embodiment of the present application further provides a network system, including an NWDAF and an IPDR, wherein: the IPDR is configured to acquire user subscription data and user session data, and to merge the user subscription data and the user session data to obtain merged stream data; the NWDAF is connected to the IPDR and is configured to acquire the merged stream data from the IPDR, screen out target stream data corresponding to the same application from the merged stream data, determine a target application according to the target stream data, obtain a traffic change trend of the target application, and identify the target application as a popular application when the traffic change trend is an increasing trend.
  • In a fourth aspect, an embodiment of the present application further provides a network device, including the network system described in the third aspect; or including at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to execute the popular application identification method described in the first aspect or the second aspect.
  • In a fifth aspect, embodiments of the present application further provide a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to cause a computer to execute the popular application identification method described in the first aspect or the second aspect.
  • FIG. 1 is a schematic diagram of an existing 5G network architecture;
  • FIG. 2 is a schematic diagram of a 5G network architecture provided by an embodiment of the present application;
  • FIG. 3 is a flowchart of a method for identifying popular applications on the IPDR side provided by an embodiment of the present application;
  • FIG. 4 is a flowchart of the specific steps of merging user subscription data and user session data to obtain merged stream data provided by an embodiment of the present application;
  • FIG. 5 is a flowchart of a method for identifying popular applications on the NWDAF side provided by an embodiment of the present application;
  • FIG. 6 is a flowchart of the specific steps of screening out target stream data corresponding to the same application from merged stream data provided by an embodiment of the present application;
  • FIG. 8 is a flowchart of the specific steps of obtaining the traffic change trend of the target application and identifying the target application as a popular application when the traffic change trend is an increasing trend, provided by an embodiment of the present application;
  • FIG. 9 is a flowchart of a specific example of a popular application identification method provided by an embodiment of the present application;
  • FIG. 10 is a schematic structural diagram of a network device provided by an embodiment of the present application.
  • "Multiple" means two or more; "greater than", "less than", "exceeding" and the like are understood as excluding the stated number; "above", "below", "within" and the like are understood as including the stated number. Descriptions such as "first" and "second" serve only to distinguish technical features, and cannot be understood as indicating or implying relative importance, the number of the indicated technical features, or the sequence of the indicated technical features.
  • NWDAF (Network Data Analytics Function) collects specific data from an NF, from OAM (Operation Administration and Maintenance, network management system) or from an AF (Application Function), obtains analysis results through big data analysis, and sends the results to a specific NF, OAM or AF. NWDAF is a key network function for realizing network intelligence.
  • IPDR: Internet Protocol Detail Record (IP information record), an IP traceability system.
  • FIG. 1 is a schematic diagram of an existing 5G network architecture.
  • FIG. 2 is a schematic diagram of the architecture in which the network system provided by the embodiment of the present application is applied to a 5G network. The network system includes an NWDAF and an IPDR. Based on the popular application identification method provided by the present application, the embodiment adds, on top of the original 5G network architecture, the capability for the IPDR to receive user subscription data from the UDM and the capability for the NWDAF to receive aggregated data from external network elements, that is, to receive the data stream from the IPDR. The NWDAF and the IPDR can be deployed on different network devices, or on the same network device.
  • The embodiments of the present application provide a popular application identification method, a network system, a network device and a storage medium, which can improve the timeliness of identifying popular applications.
  • An embodiment of the present application provides a method for identifying popular applications, applied to the IPDR, including but not limited to the following steps 301 to 303:
  • Step 301: Acquire user subscription data and user session data.
  • User subscription data can be obtained from the UDM (Unified Data Management). The UDM is the permanent storage location of user subscription data and is located in the home network with which the user has contracted.
  • The user subscription data may include, but is not limited to, IMSI, age, gender, region, occupation, and the like.
  • User session data can be obtained from the SMF (Session Management Function), a functional unit of the 5G service-based architecture.
  • PDU: Protocol Data Unit.
  • UPF: User Plane Function.
  • The user session data may include, but is not limited to, SEID (Session Endpoint Identifier), N4IP address, private network address, private network port, protocol type, destination IP address, destination port, number of packets, duration, known application type, URL, IMSI, etc.
  • Step 302: Merge the user subscription data and the user session data to obtain merged stream data.
  • Merging the user subscription data and the user session data into merged stream data makes the data easier for the NWDAF to acquire, makes the acquired data stream clearer, and simplifies the NWDAF's subsequent data analysis.
  • Step 303: Send the merged stream data to the NWDAF, so that the NWDAF screens out target stream data corresponding to the same application from the merged stream data, determines the target application according to the target stream data, obtains the traffic change trend of the target application, and identifies the target application as a popular application when the traffic change trend is an increasing trend.
  • The NWDAF can screen out the target stream data corresponding to the same application from the merged stream data to determine the target application, then obtain the traffic change trend of the target application and perform popular application identification on the target application according to that trend. Popular applications can thus be identified without relying on the signature database, which avoids the lag caused by manual maintenance, improves the timeliness of popular application identification, and allows new popular applications to be discovered in time.
  • Merging the user subscription data and the user session data to obtain merged stream data may specifically include the following steps 401 to 402:
  • Step 401: Merge the user session data obtained from the SMF and the UPF through the session endpoint identifier (SEID) or the N4IP address.
  • The user session data provided by the SMF and by the UPF both include the SEID and the N4IP address, so the data streams can be merged through SEID and N4IP, for example by using them as a merge key.
  • Step 402: Merge the merged user session data with the user subscription data through the International Mobile Subscriber Identity (IMSI) to obtain merged stream data.
  • The merged user session data of the SMF and the UPF and the user subscription data of the UDM both include the IMSI, so they can be further merged through the IMSI to obtain the merged stream data.
  • The final merged stream data can include IMSI, age, gender, region, occupation, SEID, N4IP address, private network address, private network port, protocol type, destination IP address, destination port, number of packets, duration, known application type, URL, etc.
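  • Steps 401 and 402 amount to two key-based joins. The following is a minimal sketch, not the patented implementation: the record layout, the field names (`seid`, `n4ip`, `imsi`, `dst_ip`, ...) and the sample values are hypothetical, and joining on the (SEID, N4IP) pair is one possible reading of "SEID or N4IP address".

```python
def merge_session_records(smf_records, upf_records):
    """Step 401: join SMF and UPF session records on the (SEID, N4IP) pair."""
    smf_by_key = {(r["seid"], r["n4ip"]): r for r in smf_records}
    merged = []
    for upf in upf_records:
        smf = smf_by_key.get((upf["seid"], upf["n4ip"]))
        if smf is not None:
            # Fields from both sources end up in one session record.
            merged.append({**smf, **upf})
    return merged


def merge_with_subscription(session_records, udm_records):
    """Step 402: join session records with UDM subscription data on IMSI."""
    udm_by_imsi = {r["imsi"]: r for r in udm_records}
    return [
        {**udm_by_imsi[s["imsi"]], **s}
        for s in session_records
        if s["imsi"] in udm_by_imsi
    ]


# Hypothetical sample records, not taken from the source.
smf = [{"seid": 1, "n4ip": "10.0.0.1", "imsi": "001010000000001"}]
upf = [{"seid": 1, "n4ip": "10.0.0.1", "dst_ip": "203.0.113.5",
        "dst_port": 443, "packets": 120}]
udm = [{"imsi": "001010000000001", "age": 24, "occupation": "student"}]

merged_stream = merge_with_subscription(merge_session_records(smf, upf), udm)
print(merged_stream)
```

  • Records without a matching key are dropped in this sketch; a real deployment might instead keep them with empty fields.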
  • An embodiment of the present application also provides a method for identifying popular applications, applied to the NWDAF, including but not limited to the following steps 501 to 503:
  • Step 501: Obtain merged stream data from the IPDR, wherein the merged stream data includes user subscription data and user session data.
  • In step 501, the user subscription data, the user session data, and the way they are merged into merged stream data have been explained in the previous embodiments and are not repeated here.
  • Step 502: Screen out target stream data corresponding to the same application from the merged stream data, and determine the target application according to the target stream data.
  • Target stream data corresponding to the same application can be screened out of the merged stream data according to the destination IP address. For example, data streams with the same destination IP address can be regarded as target stream data corresponding to the same application, and the application corresponding to that destination IP address can be taken as the target application.
  • The merged stream data may include stream data of multiple applications, and the stream data of each application includes corresponding user subscription data and user session data.
  • The destination IP addresses corresponding to the stream data of different applications are generally different.
  • The specific structure of the merged stream data can be:
  • First stream data: first user subscription data and first user session data, wherein the destination IP address in the first user session data is IP1;
  • Second stream data: second user subscription data and second user session data, wherein the destination IP address in the second user session data is IP1;
  • Third stream data: third user subscription data and third user session data, wherein the destination IP address in the third user session data is IP2;
  • Fourth stream data: fourth user subscription data and fourth user session data, wherein the destination IP address in the fourth user session data is IP2;
  • Fifth stream data: fifth user subscription data and fifth user session data, wherein the destination IP address in the fifth user session data is IP3.
  • When screening by destination IP address, the first stream data and the second stream data above are the target stream data corresponding to a first application, so the first application can be taken as a target application; the third stream data and the fourth stream data above are the target stream data corresponding to a second application, so the second application can be taken as another target application.
  • The above is only a schematic illustration of how to screen out the target stream data corresponding to the same application from the merged stream data. In practice, the total volume of the merged stream data may be large, and the volume of target stream data corresponding to the same application may also be large.
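  • The five-stream example can be sketched as a grouping by destination IP address. This is only an illustration: the field names `name` and `dst_ip` are hypothetical, and IP1 to IP3 are the placeholders used in the text.

```python
from collections import defaultdict


def group_by_destination(merged_stream):
    """Group stream records by destination IP; each group is treated as one application."""
    groups = defaultdict(list)
    for record in merged_stream:
        groups[record["dst_ip"]].append(record["name"])
    return dict(groups)


# The five streams from the example above.
streams = [
    {"name": "first", "dst_ip": "IP1"},
    {"name": "second", "dst_ip": "IP1"},
    {"name": "third", "dst_ip": "IP2"},
    {"name": "fourth", "dst_ip": "IP2"},
    {"name": "fifth", "dst_ip": "IP3"},
]

groups = group_by_destination(streams)
for dst_ip, names in groups.items():
    print(dst_ip, names)
```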
  • Step 503: Obtain the traffic change trend of the target application, and identify the target application as a popular application when the traffic change trend is an increasing trend.
  • Popular application identification is performed on the target application according to the traffic change trend, so the identification does not rely on the signature database. This avoids the lag caused by manual maintenance, improves the timeliness of popular application identification, and allows new popular applications to be discovered in time.
  • When the traffic change trend of the target application is an increasing trend, that is, the traffic of the target application keeps growing, the target application is a new popular application.
  • The traffic change trend of the target application may be obtained based on the total traffic accessing the target application, the total number of links, or the total number of users of the target application.
  • The traffic change trend can be obtained by using a trend prediction algorithm. This embodiment of the present application does not limit the specific algorithm; for example, the autoregressive integrated moving average (ARIMA) model can be used.
  • Screening out the target stream data corresponding to the same application from the merged stream data may specifically include the following steps 601 to 602:
  • Step 601: Divide the merged stream data into user groups.
  • The merged stream data includes many kinds of stream data and covers multiple user groups.
  • The merged stream data can be divided into user groups along a single dimension. For example, with age as the dimension, the user group can be divided into young people, middle-aged people, the elderly, etc.; with occupation as the dimension, the user group can be divided into doctors, students, white-collar workers, etc.
  • The user group may also be divided according to other dimensions.
  • The merged stream data can also be divided into user groups by using a clustering algorithm that directly calculates the characteristic behavior distance of the user group, wherein the attributes of the user group can be divided into ordered attributes and unordered attributes.
  • The Minkowski distance can be used to calculate the characteristic behavior distance, specifically:
  • Here p represents the power, which can be 1, 2, 3, etc., corresponding to the first-order distance, the second-order distance, the third-order distance, etc.
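  • The formula itself did not survive in this text. For reference, the standard Minkowski distance between two samples is given below; the symbols x_i, x_j (two samples), n (the number of ordered attributes) and the attribute index u are notation assumed here, not taken from the source.

```latex
\mathrm{dist}_{mk}(x_i, x_j) = \left( \sum_{u=1}^{n} \left| x_{iu} - x_{ju} \right|^{p} \right)^{\frac{1}{p}}
```

  • With p = 1 this is the Manhattan distance, and with p = 2 the Euclidean distance.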
  • With VDM (Value Difference Metric), the VDM distance between two discrete values a and b on attribute u (that is, the characteristic behavior distance to be calculated) is:
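  • The formula is also missing here. The commonly used definition of VDM is reproduced below as a reference, under assumed notation: m_{u,a} is the number of samples whose attribute u takes value a, m_{u,a,i} is the number of those samples that fall in the i-th cluster, and k is the number of clusters. None of these symbols appear in the source.

```latex
\mathrm{VDM}_{p}(a, b) = \sum_{i=1}^{k} \left| \frac{m_{u,a,i}}{m_{u,a}} - \frac{m_{u,b,i}}{m_{u,b}} \right|^{p}
```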
  • Data with the closest characteristic behavior distance belong to the same user group. By iterating the above calculation, the user group division of the merged stream data is eventually completed.
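  • As a rough illustration of the two distance measures, the sketch below implements the standard textbook definitions of the Minkowski distance and VDM. The function names, argument layout and sample values are hypothetical, and the clustering iteration itself is not shown.

```python
from collections import Counter


def minkowski(x, y, p=2):
    """Minkowski distance of order p between two numeric feature vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def vdm(values, clusters, a, b, p=2):
    """VDM distance between discrete values a and b of one attribute.

    values[i] is the attribute value of sample i and clusters[i] is the
    cluster currently assigned to sample i. Both a and b are assumed to
    occur at least once in values.
    """
    count_a = values.count(a)
    count_b = values.count(b)
    per_cluster = Counter(zip(values, clusters))
    total = 0.0
    for c in set(clusters):
        share_a = per_cluster[(a, c)] / count_a
        share_b = per_cluster[(b, c)] / count_b
        total += abs(share_a - share_b) ** p
    return total


# Hypothetical data: ordered features (age, hours online), unordered occupation.
print(minkowski([20, 1], [23, 5], p=1))  # first-order distance
print(minkowski([20, 1], [23, 5], p=2))  # second-order distance
occupations = ["student", "student", "doctor", "doctor"]
assigned = [0, 0, 1, 1]
print(vdm(occupations, assigned, "student", "doctor"))
```

  • In a mixed setting, the two measures are usually combined, with the Minkowski term applied to ordered attributes and the VDM term to unordered ones.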
  • Step 602: Screen out the stream data generated by the same user group accessing the same application to obtain the target stream data.
  • The specific structure of the merged stream data can be:
  • First stream data: first user subscription data and first user session data, wherein the destination IP address in the first user session data is IP1, corresponding to the first user group;
  • Second stream data: second user subscription data and second user session data, wherein the destination IP address in the second user session data is IP1, corresponding to the first user group;
  • Third stream data: third user subscription data and third user session data, wherein the destination IP address in the third user session data is IP2, corresponding to the first user group;
  • Fourth stream data: fourth user subscription data and fourth user session data, wherein the destination IP address in the fourth user session data is IP2, corresponding to the second user group;
  • Fifth stream data: fifth user subscription data and fifth user session data, wherein the destination IP address in the fifth user session data is IP3, corresponding to the second user group.
  • In this example, the first stream data and the second stream data are stream data generated by the same user group accessing the same application.
  • In this way, the accuracy of screening out the target stream data corresponding to the same application can be improved, thereby improving the accuracy and reliability of subsequent popular application identification.
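  • Step 602 can be pictured as grouping on the pair (user group, destination IP address). The sketch below replays the five-stream example; the field names `user_group`, `dst_ip` and `name` are hypothetical.

```python
from collections import defaultdict


def flows_by_group_and_app(merged_stream):
    """Bucket streams by (user group, destination IP)."""
    buckets = defaultdict(list)
    for record in merged_stream:
        key = (record["user_group"], record["dst_ip"])
        buckets[key].append(record["name"])
    return dict(buckets)


# The five streams from the example above, with their user groups.
streams = [
    {"name": "first", "dst_ip": "IP1", "user_group": 1},
    {"name": "second", "dst_ip": "IP1", "user_group": 1},
    {"name": "third", "dst_ip": "IP2", "user_group": 1},
    {"name": "fourth", "dst_ip": "IP2", "user_group": 2},
    {"name": "fifth", "dst_ip": "IP3", "user_group": 2},
]

buckets = flows_by_group_and_app(streams)
print(buckets[(1, "IP1")])
```

  • Only the first and second streams share both the user group and the destination, which matches the conclusion drawn in the text.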
  • Screening out the stream data generated by the same user group accessing the same application may specifically include the following steps 701 to 702:
  • Step 701: Identify the application type of the application according to the access behavior data of the same user group to the same application.
  • Based on the access behavior data of the same user group to the same application, the application type of the application can be estimated.
  • The access behavior data may include, but is not limited to, access time, access duration, access traffic distribution, access traffic size, source port, and the like.
  • For example, when the access time is always at night, the user group may be students and the application may be a game application or a video application; when the access duration is long, the application may be a game application; when the access traffic is evenly distributed and the traffic volume is very large, it may be a video application; when the access traffic is evenly distributed but the traffic volume is small, it may be a chat application or a game application. In addition, if the source port of the access is fixed, the application type can also be identified through the source port.
  • To sum up, if there is a potential inherent regularity in the access behavior data of the same user group to the same application, the application can be regarded as a target application for popular application identification.
  • The access behavior data includes data of multiple dimensions.
  • The simple variance or statistical distribution of a single dimension can be used to determine whether the data characteristics of that dimension follow a fixed distribution.
  • The relationship between multiple dimensions can be judged by feature identification methods such as the covariance matrix, the divergence (scatter) matrix or PCA. If multiple feature dimensions are correlated, the application type can be identified through those feature dimensions. For example, game applications have a long access duration, low access traffic and a large number of access links; short video applications have an intermittent access time distribution, access traffic that peaks at intervals, and a small number of access links.
  • Step 702: Screen out the stream data corresponding to the application according to the application type.
  • The stream data corresponding to the application is screened out as the target stream data, and the application type can be used to judge whether the corresponding application needs to be taken as a target application, thereby further improving the accuracy and reliability of subsequent popular application identification.
  • The merged stream data may be quantified first; for example, the occupation field may be filled with a numerical code and the region field with latitude and longitude, which facilitates subsequent analysis of the merged stream data. The above quantification method is only exemplary, and an appropriate method can be selected according to the actual application.
  • Feature analysis may also be performed on the merged stream data to obtain the weight of each dimension in the merged stream data, and the data corresponding to dimensions whose weight is less than a first preset threshold is excluded.
  • Exemplarily, an algorithm such as PCA (Principal Component Analysis) can be used to identify the main dimensions, reduce the dimensionality of the merged stream data, and eliminate the data corresponding to unimportant dimensions. For example, if age and occupation have the largest weights in the merged stream data and region has a small weight, the region dimension data can be ignored.
  • The first preset threshold may be 2%, 5%, 10%, etc., and can be freely set according to actual conditions.
  • For example, if the first preset threshold is 5% and the weight of region in the merged stream data is only 3%, the data of the region dimension is excluded.
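  • The threshold rule can be sketched as follows, assuming the per-dimension weights have already been produced by some feature analysis (the PCA step is not shown). The 5% threshold and the 3% region weight come from the example above; the age and occupation weights, field names and record values are hypothetical.

```python
def drop_low_weight_dimensions(records, weights, threshold=0.05):
    """Remove every weighted dimension whose weight is below the threshold.

    Fields that have no weight entry (for example the destination IP)
    are left untouched.
    """
    excluded = {dim for dim, w in weights.items() if w < threshold}
    return [
        {k: v for k, v in record.items() if k not in excluded}
        for record in records
    ]


weights = {"age": 0.55, "occupation": 0.42, "region": 0.03}  # hypothetical
records = [{"age": 24, "occupation": 3, "region": (31.2, 121.5), "dst_ip": "IP1"}]

reduced = drop_low_weight_dimensions(records, weights, threshold=0.05)
print(reduced)
```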
  • Before screening out the stream data generated by the same user group accessing the same application, known data may be obtained from a preset feature database, wherein the known data is obtained according to at least one of the uniform resource locator (URL), the destination port, the protocol type and the application type; after the known data is obtained, the stream data corresponding to the known data is eliminated from the merged stream data.
  • The following are a few common methods for judging known data:
  • The URL field records a specific well-known website. Such records are WEB services, and the URL can be used to judge whether the corresponding stream data is generated by accessing a known application;
  • The destination port records a well-known HTTP WEB service port such as 80 or 443. The destination port can be used to judge whether the corresponding stream data is generated by accessing a known application;
  • The protocol field records a well-known protocol such as DNS, SSH or FTP. The protocol can be used to judge whether the corresponding stream data is generated by accessing a known application;
  • A known application type has a corresponding identification field, which can be used to judge whether the corresponding stream data is generated by accessing a known application.
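  • The four checks can be combined into a single predicate. This is a sketch only: the field names, the sample URL and the sample flows are hypothetical, and the port and protocol sets contain just the examples named in the text rather than the contents of any real feature database.

```python
WELL_KNOWN_PORTS = {80, 443}                  # ports named in the text
WELL_KNOWN_PROTOCOLS = {"DNS", "SSH", "FTP"}  # protocols named in the text


def is_known(record, known_urls):
    """Return True if any of the four checks marks the flow as known."""
    return (
        record.get("url") in known_urls
        or record.get("dst_port") in WELL_KNOWN_PORTS
        or record.get("protocol") in WELL_KNOWN_PROTOCOLS
        or record.get("app_type") is not None
    )


def eliminate_known(merged_stream, known_urls):
    """Keep only the flows that none of the checks recognises."""
    return [r for r in merged_stream if not is_known(r, known_urls)]


known_urls = {"www.example.com"}  # hypothetical feature database entry
flows = [
    {"name": "a", "url": "www.example.com", "dst_port": 8080, "protocol": "TCP"},
    {"name": "b", "dst_port": 443, "protocol": "TCP"},
    {"name": "c", "dst_port": 9000, "protocol": "DNS"},
    {"name": "d", "dst_port": 9000, "protocol": "UDP", "app_type": "video"},
    {"name": "e", "dst_port": 9000, "protocol": "UDP"},
]

unknown = eliminate_known(flows, known_urls)
print([r["name"] for r in unknown])
```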
  • Obtaining the traffic change trend of the target application and identifying the target application as a popular application when the traffic change trend is an increasing trend specifically includes the following steps 801 to 802:
  • Step 801: Obtain the traffic of the target application within a first preset duration, and predict the traffic change trend of the target application according to that traffic.
  • To obtain the traffic of the target application within the first preset duration, the traffic of the target application can be monitored at a certain frequency.
  • For example, the first preset duration can be 3 days, and the daily traffic of the target application can be obtained.
  • A trend prediction algorithm is used to obtain the traffic change trend of the target application within a second preset duration; for example, the second preset duration may be 7 days.
  • The prediction algorithm can be, for example, the autoregressive integrated moving average (ARIMA) model. By predicting the traffic change trend of the target application, the time required for identifying popular applications can be shortened and the efficiency of identification improved.
  • The first preset duration and the second preset duration can be freely set according to the actual situation. For example, they can also be measured in hours: the first preset duration can be 3 hours and the second preset duration can be 24 hours.
  • Step 802: When the traffic change trend is an increasing trend within the second preset duration and the growth rate exceeds a second preset threshold, identify the target application as a popular application.
  • By additionally requiring the growth rate to exceed the second preset threshold, the accuracy of identification can be improved.
  • The second preset threshold may be 30%, 40%, 50%, and so on. It can be understood that the second preset threshold can also be freely set according to the actual situation.
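  • Steps 801 and 802 can be sketched as "forecast, then compare growth against a threshold". To stay dependency-free, the sketch below swaps in a least-squares straight line for ARIMA, so it is a stand-in rather than the algorithm named in the text. The definition of growth rate (forecast at the end of the horizon relative to the last observation), the function names and the sample traffic values are all assumptions.

```python
def linear_forecast(history, horizon):
    """Fit a least-squares line to the history and extrapolate `horizon` steps.

    Requires at least two observations.
    """
    n = len(history)
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * (n + k) for k in range(horizon)]


def is_popular(history, horizon=7, growth_threshold=0.3):
    """Step 802: predicted growth over the horizon must exceed the threshold.

    With a non-negative threshold this also implies an increasing trend.
    Assumes the last observation is positive.
    """
    forecast = linear_forecast(history, horizon)
    growth = (forecast[-1] - history[-1]) / history[-1]
    return growth > growth_threshold


# Hypothetical daily traffic over a 3-day first preset duration.
print(is_popular([100, 150, 200]))  # steadily growing traffic
print(is_popular([100, 100, 100]))  # flat traffic
```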
  • The target stream data is sent to the user plane function (UPF).
  • The UPF can then automatically capture the stream data and automatically export it to the OAM (Operation Administration and Maintenance).
  • The OAM can perform detailed application analysis based on the stream data sent by the UPF, import the result into the application signature database, and deliver the database to the UPF. When the IPDR subsequently receives stream data of the application again, the application is a known application and no longer needs popular application identification, thus forming a closed loop.
  • Step 901: The NWDAF subscribes to the stream data of unknown applications from the IPDR;
  • Step 902: The UPF subscribes to the stream data of popular unknown applications from the NWDAF;
  • Step 903: The IPDR subscribes to user subscription data from the UDM;
  • Step 904: The user equipment (UE) goes online, and the SMF creates a PDU session;
  • Step 905: The IPDR collects control-plane creation data from the SMF;
  • Step 906: The UE uses the network service, and after the service session ends, the stream data is released;
  • Step 907: The IPDR collects the user session data of this segment of stream data from the UPF;
  • Step 908: The IPDR merges the data of the SMF, UPF, and UDM according to fields such as SEID, N4IP, and IMSI to obtain the corresponding merged stream data;
  • Step 909: The IPDR pushes the stream data of unknown applications in the merged stream data to the NWDAF;
  • Step 910: The NWDAF continuously receives data from the IPDR, picks out the merged stream data corresponding to the same unknown application, and obtains the traffic change trend of that unknown application; when the traffic change trend exceeds the second preset threshold, it identifies the unknown application as a popular unknown application;
  • Step 911: The NWDAF sends the corresponding merged stream data to the UPF;
  • Step 912: The UPF receives the merged stream data corresponding to this popular unknown application and starts to automatically capture the stream data subsequently generated by it;
  • Step 913: The UPF pushes the stream data subsequently generated by the popular unknown application to the OAM;
  • Step 914: The OAM analyzes the unknown application using the stream data of the popular unknown application, imports the result into the application signature database, and delivers it to the UPF.
  • In step 901, the NWDAF can also subscribe to the stream data of all applications from the IPDR and then process the data with the known-data elimination method of the previous embodiment to obtain the stream data of unknown applications.
  • The IPDR sends the merged stream data of unknown applications to the NWDAF. The NWDAF filters out from the merged stream data the target stream data corresponding to the same unknown application, obtains the traffic change trend of that unknown application, and performs popular application identification on it according to the traffic change trend.
  • As a result, popular application identification does not need to rely on the signature database, which avoids the lag caused by manual maintenance, improves the timeliness of popular application identification, and allows popular unknown applications to be discovered in time.
  • The merged stream data corresponding to the popular unknown application is then sent to the UPF, and the UPF starts to automatically capture the stream data of the popular unknown application and push it to the OAM.
  • The OAM analyzes the popular unknown application, imports the result into the application signature database, and delivers it to the UPF, so that the popular unknown application becomes a known application, forming a closed loop.
  • In addition, an embodiment of the present application also provides a network system including an NWDAF and an IPDR, where the IPDR is configured to acquire user subscription data and user session data and to merge them to obtain merged stream data.
  • The NWDAF is connected to the IPDR so as to obtain the merged stream data from the IPDR, filter out from it the target stream data corresponding to the same application, determine the target application according to the target stream data, obtain the traffic change trend of the target application, and perform popular application identification on the target application according to the traffic change trend.
  • The above-mentioned network system and the above-mentioned method for identifying popular applications are based on the same inventive concept, so that new popular applications can be discovered in time.
  • an embodiment of the present application also provides a network device, including the above-mentioned network system.
  • the above-mentioned network device and the above-mentioned method for identifying popular applications are based on the same inventive concept, so that new popular applications can be discovered in time.
  • the network device may set only one of NWDAF and IPDR, or set both NWDAF and IPDR.
  • FIG. 10 shows a network device 1000 provided by an embodiment of the present application.
  • the network device 1000 includes: a memory 1001, a processor 1002, and a computer program stored in the memory 1001 and executable on the processor 1002, and the computer program is configured to execute the above-mentioned popular application identification method when running.
  • the processor 1002 and the memory 1001 may be connected by a bus or other means.
  • the memory 1001 can be configured to store non-transitory software programs and non-transitory computer-executable programs, such as the method for identifying popular applications described in the embodiments of the present application.
  • the processor 1002 executes the non-transitory software programs and instructions stored in the memory 1001 to implement the above-mentioned method for identifying popular applications.
  • The memory 1001 can include a program storage area and a data storage area, where the program storage area can store an operating system and an application program required by at least one function, and the data storage area can store data used in executing the above-mentioned popular application identification method. Additionally, the memory 1001 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 1001 may include memory located remotely from the processor 1002, and such remote memory may be connected to the network device 1000 through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • The non-transitory software programs and instructions required to implement the above-mentioned popular application identification method are stored in the memory 1001 and, when executed by one or more processors 1002, carry out that method, for example, method steps 301 to 303 in FIG. 3, method steps 401 to 402 in FIG. 4, method steps 501 to 503 in FIG. 5, method steps 601 to 602 in FIG. 6, method steps 701 to 702 in FIG. 7, method steps 801 to 802 in FIG. 8, and method steps 901 to 914 in FIG. 9.
  • Embodiments of the present application further provide a computer-readable storage medium storing computer-executable instructions, where the computer-executable instructions are configured to execute the above-mentioned method for identifying popular applications.
  • In an embodiment, the computer-readable storage medium stores computer-executable instructions that are executed by one or more control processors 1002, for example, by one of the processors 1002 in the network device 1000 described above.
  • This causes the one or more processors 1002 to execute the above-mentioned popular application identification method, for example, method steps 301 to 303 in FIG. 3, method steps 401 to 402 in FIG. 4, method steps 501 to 503 in FIG. 5, method steps 601 to 602 in FIG. 6, method steps 701 to 702 in FIG. 7, method steps 801 to 802 in FIG. 8, and method steps 901 to 914 in FIG. 9.
  • The embodiments of the present application include: acquiring merged stream data from the Internet Protocol Detail Record (IPDR), where the merged stream data includes user subscription data and user session data; filtering out from the merged stream data the target stream data corresponding to the same application, and determining the target application according to the target stream data; and obtaining the traffic change trend of the target application and, when the traffic change trend is an increasing trend, identifying the target application as a popular application.
  • By acquiring the merged stream data collected by the IPDR, filtering out the target stream data corresponding to the same application to determine the target application, obtaining the traffic change trend of the target application, and performing popular application identification according to that trend, the identification of popular applications does not need to rely on the signature database. This avoids the lag caused by manual maintenance, improves the timeliness of identifying popular applications, and allows new popular applications to be discovered in time.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be configured to store the desired information and that can be accessed by a computer.
  • In addition, as is well known to those of ordinary skill in the art, communication media typically include computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and can include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer And Data Communications (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

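A popular application identification method, a network system, a network device, and a storage medium. The popular application identification method includes: acquiring merged stream data from an IPDR, filtering out from the merged stream data the target stream data corresponding to the same application, and determining a target application according to the target stream data; and acquiring the traffic change trend of the target application and performing popular application identification on the target application according to the traffic change trend. By acquiring the merged stream data collected by the IPDR, filtering out the target stream data corresponding to the same application to determine the target application, and then acquiring the traffic change trend of that target application, popular application identification is performed on the target application according to the traffic change trend.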

Description

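Popular application identification method, network system, network device, and storage medium
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 202010599481.8 filed on June 28, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of communication technologies, and in particular to a popular application identification method, a network system, a network device, and a storage medium.
Background
With the development of the mobile Internet, traffic operation has gradually become an important line of business. Traffic operation is a collection of concepts, strategies, and actions that is built on smart pipes (the physical network) and aggregation platforms (the business network), that aims to expand the scale of traffic, raise the level of traffic, and enrich the content of traffic, and whose purpose is to release the value of traffic. An important piece of basic data for traffic operation analysis is the popularity of applications, which is used to formulate marketing strategies for applications of different popularity and helps improve revenue and competitiveness.
At present, popular application identification relies on application popularity statistics, which are mainly produced by DPI (Deep Packet Inspection). Collecting popularity statistics with DPI first requires DPI's application type identification results, and identifying application types with DPI depends on a signature database. The signature database is currently updated mainly by system maintenance personnel who manually discover and add new application data. Because the number of applications grows substantially every day, the update speed of the signature database cannot fully keep pace with that growth. When the number of applications grows quickly, manual discovery and addition alone cannot obtain the corresponding application types in time, so popularity statistics cannot be collected; the statistics lag behind, which hinders the timely discovery of new popular applications.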
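Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
Embodiments of this application provide a popular application identification method, a network system, a network device, and a storage medium.
In a first aspect, an embodiment of this application provides a popular application identification method, the method including: acquiring merged stream data from an Internet Protocol Detail Record (IPDR), where the merged stream data includes user subscription data and user session data; filtering out, from the merged stream data, target stream data corresponding to the same application, and determining a target application according to the target stream data; and acquiring a traffic change trend of the target application and, when the traffic change trend is an increasing trend, identifying the target application as a popular application.
In a second aspect, an embodiment of this application further provides a popular application identification method, the method including: acquiring user subscription data and user session data; merging the user subscription data and the user session data to obtain merged stream data; and sending the merged stream data to an NWDAF, so that the NWDAF filters out, from the merged stream data, target stream data corresponding to the same application, determines a target application according to the target stream data, acquires a traffic change trend of the target application, and, when the traffic change trend is an increasing trend, identifies the target application as a popular application.
In a third aspect, an embodiment of this application further provides a network system including an NWDAF and an IPDR, where: the IPDR is configured to acquire user subscription data and user session data and to merge the user subscription data and the user session data to obtain merged stream data; and the NWDAF is connected to the IPDR so as to acquire the merged stream data from the IPDR, filter out from the merged stream data the target stream data corresponding to the same application, determine a target application according to the target stream data, acquire a traffic change trend of the target application, and, when the traffic change trend is an increasing trend, identify the target application as a popular application.
In a fourth aspect, an embodiment of this application further provides a network device that includes the network system of the third aspect, or that includes at least one processor and a memory configured to be communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the popular application identification method of the first aspect or the second aspect.
In a fifth aspect, an embodiment of this application further provides a computer-readable storage medium storing computer-executable instructions, where the computer-executable instructions are configured to cause a computer to perform the popular application identification method of the first aspect or the second aspect.
Other features and advantages of this application will be set forth in the description that follows and will in part become apparent from the description or be understood by practicing this application. The objectives and other advantages of this application can be realized and obtained through the structures particularly pointed out in the description, the claims, and the drawings.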
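Brief Description of the Drawings
The drawings are provided for further understanding of the technical solutions of this application and form part of the specification. Together with the embodiments of this application they serve to explain the technical solutions of this application and do not limit them.
FIG. 1 is a schematic diagram of a 5G network architecture in some cases, provided by an embodiment of this application;
FIG. 2 is a schematic diagram of a 5G network architecture provided by an embodiment of this application;
FIG. 3 is a flowchart of the popular application identification method on the IPDR side provided by an embodiment of this application;
FIG. 4 is a flowchart of the specific steps of merging user subscription data and user session data to obtain merged stream data, provided by an embodiment of this application;
FIG. 5 is a flowchart of the popular application identification method on the NWDAF side provided by an embodiment of this application;
FIG. 6 is a flowchart of the specific steps of filtering out, from the merged stream data, the target stream data corresponding to the same application, provided by an embodiment of this application;
FIG. 7 is a flowchart of the specific steps of filtering out the stream data generated by the same user group accessing the same application, provided by an embodiment of this application;
FIG. 8 is a flowchart of the specific steps of acquiring the traffic change trend of the target application and, when the traffic change trend is an increasing trend, identifying the target application as a popular application, provided by an embodiment of this application;
FIG. 9 is a flowchart of a specific example of the popular application identification method provided by an embodiment of this application;
FIG. 10 is a schematic structural diagram of a network device provided by an embodiment of this application.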
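Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain this application and are not intended to limit it.
It should be understood that, in the description of the embodiments of this application, "multiple" (or "a plurality of") means two or more; "greater than", "less than", "exceeding", and the like are understood as excluding the stated number; and "above", "below", "within", and the like are understood as including the stated number. Terms such as "first" and "second" are used only to distinguish technical features and should not be understood as indicating or implying relative importance, the number of the indicated technical features, or their order.
In the eNA (Study of Enablers for Network Automation for 5G) topic of 3GPP R16, the concept of the NWDAF (Network Data Analytics Function) was proposed. Specifically, the NWDAF can collect specific data from specific NFs (Network Functions), the OAM (Operation Administration and Maintenance, the network management system), or AFs (Application Functions), derive analysis results through big-data analysis, and send those results on demand to specific NFs, the OAM, or AFs. The NWDAF is a key network function for making the network intelligent. The IPDR (Internet Protocol Detail Record) is an IP tracing system, configured to perform IP tracing.
FIG. 1 is a schematic diagram of the existing 5G network architecture, and FIG. 2 is a schematic diagram of the architecture when the network system provided by an embodiment of this application is applied to a 5G network. The network system provided by the embodiment includes an NWDAF and an IPDR. Based on the popular application identification method provided by this application, the embodiment builds on the original 5G network architecture by adding to the IPDR the capability to receive user subscription information from the UDM, and adding to the NWDAF the capability to receive aggregated data from external network elements, that is, to receive the data stream from the IPDR. The NWDAF and the IPDR may be deployed on different network devices, or both may be deployed on the same network device.
Based on the network architecture shown in FIG. 2, embodiments of this application provide a popular application identification method, a network system, a network device, and a storage medium that can improve the timeliness of popular application identification.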
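Referring to FIG. 3, an embodiment of this application provides a popular application identification method, applied to the IPDR, including but not limited to the following steps 301 to 303:
Step 301: Acquire user subscription data and user session data;
In an embodiment, based on the network architecture shown in FIG. 2, the user subscription data can be acquired from the UDM (Unified Data Manager), where the UDM is the permanent storage location of user subscription data and is located in the home network to which the user subscribes.
In an embodiment, the user subscription data may include, but is not limited to, IMSI, age, gender, region, occupation, and so on.
In an embodiment, based on the network architecture shown in FIG. 2, the user session data can be acquired from the SMF (Session Management Function), where the SMF is a functional unit of the 5G service-based architecture. The SMF is mainly responsible for interacting with the separated data plane, creating, updating, and deleting PDU (Protocol Data Unit) sessions, and managing the session context with the UPF (User Plane Function). The UPF is mainly responsible for routing and forwarding PDUs and for enforcing policies on packet data.
In an embodiment, the user session data may include, but is not limited to, SEID (Session Endpoint Identifier), N4IP address, private network address, private network port, protocol type, destination IP address, destination port, packet count, duration, known application type, URL, IMSI, and so on.
Step 302: Merge the user subscription data and the user session data to obtain merged stream data;
In an embodiment, merging the user subscription data and the user session data into merged stream data makes it easier for the NWDAF to acquire the data, gives the NWDAF a clearer data stream, and facilitates the NWDAF's subsequent data analysis.
Step 303: Send the merged stream data to the NWDAF, so that the NWDAF filters out from the merged stream data the target stream data corresponding to the same application, determines the target application according to the target stream data, acquires the traffic change trend of the target application, and, when the traffic change trend is an increasing trend, identifies the target application as a popular application.
The merged stream data is sent to the NWDAF so that the NWDAF can filter out the target stream data corresponding to the same application, determine the target application, acquire its traffic change trend, and perform popular application identification on it according to that trend. Popular application identification therefore does not need to rely on the signature database, which avoids the lag caused by manual maintenance, improves the timeliness of popular application identification, and allows new popular applications to be discovered in time.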
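Referring to FIG. 4, in an embodiment, merging the user subscription data and the user session data to obtain merged stream data in step 302 may specifically include the following steps 401 to 402:
Step 401: Merge the user session data acquired from the SMF and the UPF by the session endpoint identifier (SEID) or the N4IP address;
In an embodiment, because the user session data provided by the SMF and by the UPF both include the SEID and the N4IP, the data streams can be merged via the SEID and the N4IP; for example, these fields can be used as merge keys.
Step 402: Merge the merged user session data with the user subscription data by the International Mobile Subscriber Identity (IMSI) to obtain the merged stream data.
In an embodiment, because both the user session data merged from the SMF and the UPF and the user subscription data from the UDM include the IMSI, the merged user session data and the user subscription data can be further merged by the IMSI to obtain the merged stream data. The resulting merged stream data may include IMSI, age, gender, region, occupation, SEID, N4IP address, private network address, private network port, protocol type, destination IP address, destination port, packet count, duration, known application type, URL, and so on.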
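The two-stage merge of steps 401 and 402 can be sketched as a pair of keyed joins. This is a minimal illustration, not the patent's implementation: the function name `merge_flow_records`, the dictionary field names (`seid`, `imsi`, `dst_ip`, and so on), and the choice to carry the IMSI on the SMF record are all assumptions made for the example.

```python
def merge_flow_records(smf_rows, upf_rows, udm_rows):
    """Join SMF and UPF session rows on SEID, then attach UDM data by IMSI.

    All field names are hypothetical; the source only names the join keys.
    """
    smf_by_seid = {row["seid"]: row for row in smf_rows}
    udm_by_imsi = {row["imsi"]: row for row in udm_rows}
    merged = []
    for upf in upf_rows:
        smf = smf_by_seid.get(upf["seid"])
        if smf is None:
            continue  # no control-plane record for this session
        session = {**smf, **upf}  # step 401: session data joined on SEID
        subscriber = udm_by_imsi.get(session.get("imsi"))
        if subscriber is None:
            continue  # no subscription record for this IMSI
        merged.append({**subscriber, **session})  # step 402: joined on IMSI
    return merged


smf_rows = [{"seid": 1, "n4ip": "10.0.0.1", "imsi": "460001234567890"}]
upf_rows = [{"seid": 1, "n4ip": "10.0.0.1", "dst_ip": "203.0.113.7",
             "dst_port": 443, "packets": 120}]
udm_rows = [{"imsi": "460001234567890", "age": 21, "occupation": "student"}]
merged = merge_flow_records(smf_rows, upf_rows, udm_rows)
```

Rows that lack a matching partner are dropped here for simplicity; a real collector might instead buffer them until the matching record arrives.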
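Referring to FIG. 5, an embodiment of this application further provides a popular application identification method, applied to the NWDAF, including but not limited to the following steps 501 to 503:
Step 501: Acquire merged stream data from the IPDR, where the merged stream data includes user subscription data and user session data;
The user subscription data and user session data in step 501 have been explained in the preceding embodiments and are not described again here; how they are merged into merged stream data has also been explained and is likewise not repeated.
Step 502: Filter out, from the merged stream data, the target stream data corresponding to the same application, and determine the target application according to the target stream data;
In an embodiment, the target stream data corresponding to the same application can be filtered out of the merged stream data by destination IP address. For example, data streams with the same destination IP address can be regarded as target stream data corresponding to the same application, and the application corresponding to that destination IP address can then be taken as the target application.
The merged stream data may include the stream data of multiple applications, and the stream data of each application includes the corresponding user subscription data and user session data. The destination IP addresses of different applications' stream data generally differ. For example, the merged stream data may be structured as follows:
First stream data: first user subscription data and first user session data, where the destination IP address in the first user session data is IP1;
Second stream data: second user subscription data and second user session data, where the destination IP address in the second user session data is IP1;
Third stream data: third user subscription data and third user session data, where the destination IP address in the third user session data is IP2;
Fourth stream data: fourth user subscription data and fourth user session data, where the destination IP address in the fourth user session data is IP2;
Fifth stream data: fifth user subscription data and fifth user session data, where the destination IP address in the fifth user session data is IP3;
On this basis, the target stream data corresponding to the same application can be filtered out of the merged stream data by destination IP address: the first and second stream data are the target stream data of a first application, so the first application can be taken as a target application; the third and fourth stream data are the target stream data of a second application, so the second application can be taken as another target application.
Of course, filtering by destination IP address is only one option. Other dimensions can also be used, as long as the dimension uniquely identifies an application.
It can be understood that the above is only an illustration of how to filter out, from the merged stream data, the target stream data corresponding to the same application. In practice the total volume of merged stream data may be large, and so may the amount of target stream data corresponding to the same application.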
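The five-record illustration above amounts to grouping flows by destination IP address. A minimal sketch follows; the record layout and the function name `group_by_destination` are assumptions for the example, and any other dimension that uniquely identifies an application could replace `dst_ip` as the key.

```python
from collections import defaultdict


def group_by_destination(flows):
    """Group flow records by destination IP; each group is one candidate application."""
    groups = defaultdict(list)
    for flow in flows:
        groups[flow["dst_ip"]].append(flow)
    return dict(groups)


# The five example flows: two to IP1, two to IP2, one to IP3.
flows = [
    {"id": 1, "dst_ip": "IP1"},
    {"id": 2, "dst_ip": "IP1"},
    {"id": 3, "dst_ip": "IP2"},
    {"id": 4, "dst_ip": "IP2"},
    {"id": 5, "dst_ip": "IP3"},
]
targets = group_by_destination(flows)
```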
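Step 503: Acquire the traffic change trend of the target application and, when the traffic change trend is an increasing trend, identify the target application as a popular application.
In an embodiment, popular application identification is performed on the target application according to the traffic change trend, so that the identification does not need to rely on the signature database. This avoids the lag caused by manual maintenance, improves the timeliness of popular application identification, and allows new popular applications to be discovered in time.
In an embodiment, when the traffic change trend of the target application is an increasing trend, that is, when the application is being accessed more and more, the target application is a new popular application.
In an embodiment, the traffic change trend of the target application can be derived from the total traffic of accesses to the target application, the total number of connections, or the total number of users of the target application.
In an embodiment, the traffic change trend can be obtained with a trend prediction algorithm. The embodiments of this application do not restrict the specific algorithm; for example, the autoregressive integrated moving average (ARIMA) model can be used.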
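Referring to FIG. 6, in an embodiment, filtering out from the merged stream data the target stream data corresponding to the same application in step 502 may specifically include the following steps 601 to 602:
Step 601: Divide the merged stream data into user groups;
In an embodiment, the merged stream data contains many kinds of stream data and covers multiple user groups. The division into user groups can be made along a single dimension. For example, with age as the dimension, users can be divided into young, middle-aged, and elderly people; with occupation as the dimension, users can be divided into doctors, students, white-collar workers, and so on. These are only examples of how user groups may be divided in the embodiments of this application; other dimensions can also be used.
In addition to the division described above, in an embodiment a clustering algorithm can be used to divide the merged stream data into user groups by directly computing the feature behavior distance between users, where user features can be classified as ordered indicators and unordered indicators.
For ordered indicators, that is, data that can be sorted, such as age, the Minkowski distance can be used to compute the feature behavior distance. Specifically:
Given samples $x_i = (x_{i1}; x_{i2}; \ldots; x_{in})$ and $x_j = (x_{j1}; x_{j2}; \ldots; x_{jn})$, the feature behavior distance is: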
$$\mathrm{dist}_{mk}(x_i, x_j) = \left( \sum_{u=1}^{n} \left| x_{iu} - x_{ju} \right|^{p} \right)^{1/p}$$
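Here p denotes the exponent and can take the values 1, 2, 3, and so on, corresponding to the first-order distance, second-order distance, third-order distance, and so on.
For unordered indicators, that is, data that cannot be sorted, such as occupation, a mixed distance formula can be used, for example with the VDM (Value Difference Metric). Combining the Minkowski distance with the VDM handles mixed attributes. Assume there are $n_c$ ordered attributes and $n - n_c$ unordered attributes and, without loss of generality, let the ordered attributes come before the unordered ones; then: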
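$$\mathrm{MinkovDM}_p(x_i, x_j) = \left( \sum_{u=1}^{n_c} \left| x_{iu} - x_{ju} \right|^{p} + \sum_{u=n_c+1}^{n} \mathrm{VDM}_p(x_{iu}, x_{ju}) \right)^{1/p}$$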
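Let $m_{u,a}$ denote the number of samples taking value $a$ on attribute $u$, $m_{u,a,i}$ the number of samples in the $i$-th sample cluster taking value $a$ on attribute $u$, $k$ the number of sample clusters, and $p$ the exponent. The VDM distance between two discrete values $a$ and $b$ on attribute $u$ (that is, the feature behavior distance to be computed) is then: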
$$\mathrm{VDM}_p(a, b) = \sum_{i=1}^{k} \left| \frac{m_{u,a,i}}{m_{u,a}} - \frac{m_{u,b,i}}{m_{u,b}} \right|^{p}$$
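Data whose feature behavior distances are closest belong to the same user group. By iterating the above computation repeatedly, the division of the merged stream data into user groups is eventually completed.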
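As a hedged illustration of the two distance measures named above, the sketch below assumes the standard textbook definitions of the Minkowski distance and the VDM. The function names, argument layout, and sample values are hypothetical and chosen only for the example.

```python
def minkowski(x, y, p=2):
    """Minkowski distance of order p between two ordered-feature vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def vdm(values, clusters, a, b, p=2):
    """VDM between categorical values a and b of one attribute.

    values[i] is the attribute value of sample i and clusters[i] is the
    index of the cluster that sample i currently belongs to.
    Both a and b must occur in values, otherwise a division by zero occurs.
    """
    k = max(clusters) + 1          # number of clusters
    m_a = values.count(a)          # samples with value a
    m_b = values.count(b)          # samples with value b
    total = 0.0
    for i in range(k):
        m_a_i = sum(1 for v, c in zip(values, clusters) if v == a and c == i)
        m_b_i = sum(1 for v, c in zip(values, clusters) if v == b and c == i)
        total += abs(m_a_i / m_a - m_b_i / m_b) ** p
    return total


# Ordered features, for example (age, quantized region code).
age_distance = minkowski([20, 1], [23, 5], p=2)

# One unordered feature (occupation) with the current cluster assignment.
occupations = ["student", "student", "doctor", "doctor"]
clusters = [0, 0, 0, 1]
occupation_distance = vdm(occupations, clusters, "student", "doctor", p=2)
```

Because the VDM depends on the current cluster assignment, it has to be recomputed as the clustering iterates, which matches the iterative computation the description mentions.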
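Step 602: Filter out the stream data generated by the same user group accessing the same application to obtain the target stream data.
For example, after the division into user groups, the merged stream data may be structured as follows:
First stream data: first user subscription data and first user session data, where the destination IP address in the first user session data is IP1, corresponding to the first user group;
Second stream data: second user subscription data and second user session data, where the destination IP address in the second user session data is IP1, corresponding to the first user group;
Third stream data: third user subscription data and third user session data, where the destination IP address in the third user session data is IP2, corresponding to the first user group;
Fourth stream data: fourth user subscription data and fourth user session data, where the destination IP address in the fourth user session data is IP2, corresponding to the second user group;
Fifth stream data: fifth user subscription data and fifth user session data, where the destination IP address in the fifth user session data is IP3, corresponding to the second user group;
On this basis, the first stream data and the second stream data are the stream data generated by the same user group accessing the same application.
Again, the above is only an illustration. In practice the total volume of merged stream data may be large, and so may the amount of target stream data corresponding to the same application.
Dividing the merged stream data into user groups improves the accuracy of filtering out the target stream data corresponding to the same application, and thereby improves the accuracy and reliability of the subsequent popular application identification.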
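Referring to FIG. 7, in an embodiment, after the merged stream data has been divided into user groups, a further supplementary filtering can be performed to improve accuracy. On this basis, filtering out the stream data generated by the same user group accessing the same application in step 602 may specifically include the following steps 701 to 702:
Step 701: Identify the application type of the application according to the access behavior data of the same user group for the same application;
In an embodiment, if the access behavior data of the same user group for the same application is similar, the application type of this application can be estimated. The access behavior data may include, but is not limited to, access time, access duration, access traffic distribution, access traffic volume, source port, and so on. For example, when all access times are in the evening, the user group may be students and the application may be a game or video application; a very long access duration suggests a game application; an even traffic distribution with large traffic volume suggests a video application; an even traffic distribution with small traffic volume suggests a chat or game application. In addition, if the source port used for access is fixed, the application type can also be estimated from the source port. In summary, if the access behavior data of the same user group for the same application shows an underlying inherent pattern, the application can be regarded as a target application for popular application identification.
In an embodiment, the access behavior data contains data in multiple dimensions. For a single dimension, a simple variance or the statistical distribution of that dimension can be used to judge whether its data feature has a stable distribution. For relationships among multiple dimensions, feature identification methods such as the covariance matrix, the scatter matrix, or PCA can be used to judge whether the dimensions are correlated; if multiple feature dimensions are correlated, the application type can be identified from them. For example, game applications have long access durations, small traffic volume, and many connections; short-video applications show intermittent access times, traffic volume with intermittent peaks, and few connections.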
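The single-dimension and cross-dimension checks described above can be sketched with plain statistics. This is only an illustration under assumed data: the sample values and the helper `pearson` are hypothetical, and the Pearson correlation is used here as a simple stand-in for inspecting a full covariance matrix.

```python
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical access behavior of one user group toward one application.
durations = [55, 60, 58, 62, 57]            # minutes per session
volumes = [11.0, 12.0, 11.6, 12.4, 11.4]    # megabytes per session

duration_variance = pvariance(durations)        # small value: stable dimension
duration_volume_corr = pearson(durations, volumes)  # near 1: dimensions move together
```

A low variance in one dimension and a strong correlation between dimensions are the kind of "inherent pattern" that could then feed an application-type estimate.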
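Step 702: Filter out the stream data corresponding to the application according to the application type.
The stream data corresponding to the application is filtered out according to the application type and used as the target stream data. The application type makes it possible to judge whether the corresponding application needs to be treated as a target application, which further improves the accuracy and reliability of the subsequent popular application identification.
In an embodiment, before the target stream data corresponding to the same application is filtered out of the merged stream data, the merged stream data can first be quantized. For example, the occupation field can be filled with numeric codes and the region field with latitude and longitude, which facilitates the subsequent analysis of the merged stream data. Note that this quantization is only an example; in practice a suitable quantization can be chosen according to the situation.
In an embodiment, before the target stream data corresponding to the same application is filtered out of the merged stream data, feature analysis can also be performed on the merged stream data to obtain the weight of each dimension, and the data corresponding to dimensions whose weight is below the first preset threshold can be removed. For example, an algorithm such as PCA (Principal Component Analysis) can be used to identify the main dimensions and reduce the dimensionality of the merged stream data, removing the data of unimportant dimensions. For instance, if age and occupation carry the largest weights in the merged stream data and region carries a small weight, the region dimension can be ignored. The first preset threshold may be, for example, 2%, 5%, or 10%, and can be set freely according to the actual situation; with a first preset threshold of 5%, if region has a weight of only 3% in the merged stream data, the data of the region dimension is removed. Reducing the dimensionality of the merged stream data lessens the effect of unimportant dimensions on the accuracy of the analysis, and improves both the efficiency and the accuracy of the analysis.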
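The thresholding step in the 5% example can be sketched as follows. The weights are assumed to have been computed already (for instance by PCA, which is not shown here); the function name `drop_low_weight_dimensions`, the field names, and the weight values are hypothetical.

```python
def drop_low_weight_dimensions(records, weights, threshold):
    """Remove every dimension whose weight is below the threshold."""
    kept = [dim for dim, weight in weights.items() if weight >= threshold]
    return [{dim: rec[dim] for dim in kept if dim in rec} for rec in records]


# Assumed weights: region contributes only 3%, below a 5% threshold.
weights = {"age": 0.55, "occupation": 0.42, "region": 0.03}
records = [{"age": 21, "occupation": 3, "region": (23.1, 113.3)}]
reduced = drop_low_weight_dimensions(records, weights, threshold=0.05)
```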
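It can be understood that the quantization and the dimensionality reduction of the merged stream data described above can be performed individually or together; the embodiments of this application do not limit this.
In an embodiment, before the stream data generated by the same user group accessing the same application is filtered out, known data can first be obtained from a preset signature database, where the known data is derived from at least one of the uniform resource locator (URL), the destination port, the protocol type, and the application type. Once the known data is obtained, the stream data corresponding to the known data is removed from the merged stream data. Several common ways of judging known data are listed below:
Records whose URL field is a specific well-known web address: these records are web services, and the URL can be used to judge whether the corresponding stream data was generated by accessing a known application;
Records whose destination port is a well-known HTTP web service port such as 80 or 443: the destination port can be used to judge whether the corresponding stream data was generated by accessing a known application;
Records whose protocol is a well-known protocol such as DNS, SSH, or FTP: the protocol can be used to judge whether the corresponding stream data was generated by accessing a known application;
Records whose application type has already been identified: in the UPF's session data a known application type carries a corresponding identification field, which can be used to judge whether the corresponding stream data was generated by accessing a known application;
Because known application types do not need popular application identification, removing the stream data corresponding to the known data from the merged stream data reduces the number of samples handled when subsequently filtering out the stream data generated by the same user group accessing the same application, which improves the efficiency and accuracy of the analysis.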
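The four checks listed above can be combined into one predicate. The sketch below is an assumption-laden illustration: the field names, the constant sets, and the example URL are hypothetical, and a real signature database would be far richer than a set of strings.

```python
KNOWN_PORTS = {80, 443}                  # well-known web service ports
KNOWN_PROTOCOLS = {"DNS", "SSH", "FTP"}  # well-known protocols


def is_known(flow, known_urls):
    """Return True if any of the four checks marks the flow as a known application."""
    return (
        flow.get("url") in known_urls
        or flow.get("dst_port") in KNOWN_PORTS
        or flow.get("protocol") in KNOWN_PROTOCOLS
        or flow.get("app_type") is not None
    )


known_urls = {"www.example.com"}
flows = [
    {"id": 1, "url": "www.example.com", "dst_port": 8080, "protocol": "TCP", "app_type": None},
    {"id": 2, "url": None, "dst_port": 53, "protocol": "DNS", "app_type": None},
    {"id": 3, "url": None, "dst_port": 30001, "protocol": "UDP", "app_type": None},
    {"id": 4, "url": None, "dst_port": 30002, "protocol": "UDP", "app_type": "video"},
]
unknown_flows = [flow for flow in flows if not is_known(flow, known_urls)]
```

Only flow 3 survives the filter, so only that flow would go on to user-group filtering and trend analysis.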
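Referring to FIG. 8, in an embodiment, acquiring the traffic change trend of the target application and, when the traffic change trend is an increasing trend, identifying the target application as a popular application in step 503 specifically includes the following steps 801 to 802:
Step 801: Acquire the traffic of the target application over a first preset duration, and predict the traffic change trend of the target application from that traffic;
In an embodiment, to acquire the traffic of the target application over the first preset duration, the traffic of the target application can be monitored at a certain frequency; for example, with a first preset duration of 3 days, the daily traffic of the target application is obtained. A trend prediction algorithm is then used to obtain the traffic change trend of the target application within a second preset duration; for example, the second preset duration may be 7 days. The specific prediction algorithm can be the autoregressive integrated moving average (ARIMA) model, among others. Predicting the traffic change trend of the target application shortens the time needed to identify popular applications and improves the efficiency of identification. It can be understood that the first preset duration and the second preset duration can be set freely according to the actual situation; for example, the first preset duration can also be measured in hours, such as 3 hours, with a second preset duration of 24 hours.
Step 802: When the traffic change trend is an increasing trend within the second preset duration and the growth rate exceeds the second preset threshold, identify the target application as a popular application.
In an embodiment, judging the traffic change trend of the target application against a second preset threshold improves accuracy. For example, the second preset threshold may be 30%, 40%, 50%, and so on. It can be understood that the second preset threshold can also be set freely according to the actual situation.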
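Steps 801 and 802 can be sketched as a forecast followed by a threshold test. Note the substitution: the description names ARIMA as one possible predictor, whereas this sketch uses a plain least-squares line so that it stays dependency-free. The function names, the sample traffic values, and the definition of growth rate (forecast end value relative to the last observed value) are assumptions made for the example.

```python
def linear_forecast(history, horizon):
    """Fit a least-squares line to history and extrapolate horizon steps ahead.

    Requires at least two observations.
    """
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [intercept + slope * (n + step) for step in range(horizon)]


def is_popular(history, horizon, growth_threshold):
    """Step 802: popular if the forecast growth rate exceeds the threshold.

    Assumes the last observed value is non-zero.
    """
    forecast = linear_forecast(history, horizon)
    growth_rate = (forecast[-1] - history[-1]) / history[-1]
    return growth_rate > growth_threshold


daily_traffic = [100, 130, 160]   # first preset duration: 3 days of traffic
popular = is_popular(daily_traffic, horizon=7, growth_threshold=0.30)
```

With a 7-step horizon the fitted line reaches 370 from a last observation of 160, so the growth rate is well above the 30% threshold; a flat series would not qualify.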
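In an embodiment, when the target application is identified as a popular application, the target stream data is sent to the user plane function (UPF). After receiving the target stream data of the popular application from the NWDAF, the UPF can automatically capture stream data according to the destination IP address, destination port, protocol, and so on, and automatically export and push it to the OAM (Operation Administration and Maintenance). The OAM can perform detailed application analysis based on the stream data sent by the UPF, import the result into the application signature database, and deliver it to the UPF. When the IPDR subsequently receives stream data of this application again, the application is already known and no longer needs popular application identification, thus forming a closed loop.
The embodiments of this application are described below with a practical example. Referring to FIG. 9, the example specifically includes the following steps 901 to 914: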
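Step 901: The NWDAF subscribes to the stream data of unknown applications from the IPDR;
Step 902: The UPF subscribes to the stream data of popular unknown applications from the NWDAF;
Step 903: The IPDR subscribes to user subscription data from the UDM;
Step 904: The user equipment (UE) goes online, and the SMF creates a PDU session;
Step 905: The IPDR collects control-plane creation data from the SMF;
Step 906: The UE uses the network service, and after the service session ends, the stream data is released;
Step 907: The IPDR collects the user session data of this segment of stream data from the UPF;
Step 908: The IPDR merges the data of the SMF, UPF, and UDM according to fields such as SEID, N4IP, and IMSI to obtain the corresponding merged stream data;
Step 909: The IPDR pushes the stream data of unknown applications in the merged stream data to the NWDAF;
Step 910: The NWDAF continuously receives data from the IPDR, picks out the merged stream data corresponding to the same unknown application, and obtains the traffic change trend of that unknown application; when the traffic change trend exceeds the second preset threshold, it identifies the unknown application as a popular unknown application;
Step 911: The NWDAF sends the corresponding merged stream data to the UPF;
Step 912: The UPF receives the merged stream data corresponding to this popular unknown application and starts to automatically capture the stream data subsequently generated by it;
Step 913: The UPF pushes the stream data subsequently generated by the popular unknown application to the OAM;
Step 914: The OAM analyzes the unknown application using the stream data of the popular unknown application, imports the result into the application signature database, and delivers it to the UPF.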
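In step 901, the NWDAF can also subscribe to the stream data of all applications from the IPDR and then process the data with the known-data elimination method of the preceding embodiment to obtain the stream data of unknown applications.
The IPDR sends the merged stream data of unknown applications to the NWDAF. The NWDAF filters out from the merged stream data the target stream data corresponding to the same unknown application, obtains the traffic change trend of that unknown application, and performs popular application identification on it according to the traffic change trend. Popular application identification therefore does not need to rely on the signature database, which avoids the lag caused by manual maintenance, improves the timeliness of popular application identification, and allows popular unknown applications to be discovered in time. The merged stream data corresponding to the popular unknown application is then sent to the UPF; the UPF starts to automatically capture the stream data of the popular unknown application and push it to the OAM; and the OAM analyzes the popular unknown application, imports the result into the application signature database, and delivers it to the UPF, so that the popular unknown application becomes a known application, forming a closed loop.
It should also be understood that the various implementations provided by the embodiments of this application can be combined arbitrarily to achieve different technical effects.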
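In addition, an embodiment of this application further provides a network system including an NWDAF and an IPDR, where the IPDR is configured to acquire user subscription data and user session data and to merge them to obtain merged stream data, and the NWDAF is connected to the IPDR so as to acquire the merged stream data from the IPDR, filter out from it the target stream data corresponding to the same application, determine the target application according to the target stream data, acquire the traffic change trend of the target application, and perform popular application identification on the target application according to the traffic change trend. The network system and the popular application identification method described above are based on the same inventive concept, so new popular applications can be discovered in time.
In addition, an embodiment of this application further provides a network device including the network system described above. The network device and the popular application identification method described above are based on the same inventive concept, so new popular applications can be discovered in time. The network device may be provided with only one of the NWDAF and the IPDR, or with both.
FIG. 10 shows a network device 1000 provided by an embodiment of this application. The network device 1000 includes a memory 1001, a processor 1002, and a computer program that is stored in the memory 1001 and can run on the processor 1002; when run, the computer program is configured to execute the popular application identification method described above.
The processor 1002 and the memory 1001 may be connected by a bus or by other means.
As a non-transitory computer-readable storage medium, the memory 1001 can be configured to store non-transitory software programs and non-transitory computer-executable programs, such as the popular application identification method described in the embodiments of this application. The processor 1002 implements the popular application identification method described above by running the non-transitory software programs and instructions stored in the memory 1001.
The memory 1001 may include a program storage area and a data storage area, where the program storage area can store an operating system and an application program required by at least one function, and the data storage area can store data used in executing the popular application identification method described above. In addition, the memory 1001 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory 1001 may include memory located remotely from the processor 1002, and such remote memory may be connected to the network device 1000 through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The non-transitory software programs and instructions required to implement the popular application identification method described above are stored in the memory 1001 and, when executed by one or more processors 1002, carry out that method, for example, method steps 301 to 303 in FIG. 3, method steps 401 to 402 in FIG. 4, method steps 501 to 503 in FIG. 5, method steps 601 to 602 in FIG. 6, method steps 701 to 702 in FIG. 7, method steps 801 to 802 in FIG. 8, and method steps 901 to 914 in FIG. 9.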
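An embodiment of this application further provides a computer-readable storage medium storing computer-executable instructions, where the computer-executable instructions are configured to execute the popular application identification method described above.
In an embodiment, the computer-readable storage medium stores computer-executable instructions that are executed by one or more control processors 1002, for example, by one of the processors 1002 in the network device 1000 described above, causing the one or more processors 1002 to execute the popular application identification method described above, for example, method steps 301 to 303 in FIG. 3, method steps 401 to 402 in FIG. 4, method steps 501 to 503 in FIG. 5, method steps 601 to 602 in FIG. 6, method steps 701 to 702 in FIG. 7, method steps 801 to 802 in FIG. 8, and method steps 901 to 914 in FIG. 9.
The embodiments of this application include: acquiring merged stream data from an Internet Protocol Detail Record (IPDR), where the merged stream data includes user subscription data and user session data; filtering out from the merged stream data the target stream data corresponding to the same application, and determining a target application according to the target stream data; and acquiring the traffic change trend of the target application and, when the traffic change trend is an increasing trend, identifying the target application as a popular application. By acquiring the merged stream data collected by the IPDR, filtering out the target stream data corresponding to the same application to determine the target application, acquiring the traffic change trend of that target application, and performing popular application identification according to that trend, the identification of popular applications does not need to rely on the signature database. This avoids the lag caused by manual maintenance, improves the timeliness of popular application identification, and allows new popular applications to be discovered in time.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above can be implemented as software, firmware, hardware, and appropriate combinations thereof. Some or all of the physical components can be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software can be distributed on computer-readable media, which can include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology configured to store information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be configured to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically include computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and can include any information delivery media.
The above is a specific description of some implementations of this application, but this application is not limited to these implementations. Those skilled in the art can make various equivalent modifications or substitutions without departing from the scope of this application, and these equivalent modifications or substitutions are all included within the scope defined by the claims of this application.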

Claims (13)

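  1. A popular application identification method, comprising:
    acquiring merged stream data from an Internet Protocol Detail Record (IPDR), wherein the merged stream data comprises user subscription data and user session data;
    filtering out, from the merged stream data, target stream data corresponding to the same application, and determining a target application according to the target stream data;
    acquiring a traffic change trend of the target application and, when the traffic change trend is an increasing trend, identifying the target application as a popular application.
  2. The popular application identification method according to claim 1, wherein the filtering out, from the merged stream data, target stream data corresponding to the same application comprises:
    dividing the merged stream data into user groups;
    filtering out the stream data generated by the same user group accessing the same application to obtain the target stream data.
  3. The popular application identification method according to claim 1, wherein, before the filtering out, from the merged stream data, target stream data corresponding to the same application, the method further comprises at least one of the following:
    quantizing the merged stream data;
    performing feature analysis on the merged stream data to obtain the weight of each dimension in the merged stream data, and removing the data corresponding to dimensions whose weight is below a first preset threshold.
  4. The popular application identification method according to claim 2, wherein, before the filtering out the stream data generated by the same user group accessing the same application, the method further comprises:
    obtaining known data from a preset signature database, wherein the known data is derived from at least one of a uniform resource locator (URL), a destination port, a protocol type, and an application type;
    removing the stream data corresponding to the known data from the merged stream data.
  5. The popular application identification method according to claim 2, wherein the filtering out the stream data generated by the same user group accessing the same application comprises:
    identifying the application type of the application according to the access behavior data of the same user group for the same application;
    filtering out the stream data corresponding to the application according to the application type.
  6. The popular application identification method according to claim 1, wherein the acquiring a traffic change trend of the target application and, when the traffic change trend is an increasing trend, identifying the target application as a popular application comprises:
    acquiring the traffic of the target application over a first preset duration, and predicting the traffic change trend of the target application from the traffic of the target application over the first preset duration;
    when the traffic change trend is an increasing trend within a second preset duration and the growth rate exceeds a second preset threshold, identifying the target application as a popular application.
  7. The popular application identification method according to claim 1, further comprising:
    when the target application is identified as a popular application, sending the target stream data to a user plane function (UPF).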
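  8. A popular application identification method, comprising:
    acquiring user subscription data and user session data;
    merging the user subscription data and the user session data to obtain merged stream data;
    sending the merged stream data to an NWDAF, so that the NWDAF filters out, from the merged stream data, target stream data corresponding to the same application, determines a target application according to the target stream data, acquires a traffic change trend of the target application, and, when the traffic change trend is an increasing trend, identifies the target application as a popular application.
  9. The popular application identification method according to claim 8, wherein the acquiring, by the IPDR, of user subscription data and user session data comprises:
    acquiring user subscription data from a unified data management (UDM);
    acquiring user session data from a session management function (SMF) and a UPF.
  10. The popular application identification method according to claim 9, wherein the merging the user subscription data and the user session data to obtain merged stream data comprises:
    merging the user session data acquired from the SMF and the UPF by a session endpoint identifier (SEID) or an N4 Internet Protocol (N4IP) address;
    merging the merged user session data with the user subscription data by an International Mobile Subscriber Identity (IMSI) to obtain the merged stream data.
  11. A network system, comprising an NWDAF and an IPDR, wherein:
    the IPDR is configured to acquire user subscription data and user session data and to merge the user subscription data and the user session data to obtain merged stream data;
    the NWDAF is connected to the IPDR so as to acquire the merged stream data from the IPDR, filter out from the merged stream data target stream data corresponding to the same application, determine a target application according to the target stream data, acquire a traffic change trend of the target application, and, when the traffic change trend is an increasing trend, identify the target application as a popular application.
  12. A network device:
    comprising the network system according to claim 11;
    or,
    comprising at least one processor and a memory configured to be communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the popular application identification method according to any one of claims 1 to 10.
  13. A computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions are configured to cause a computer to perform the popular application identification method according to any one of claims 1 to 10.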
PCT/CN2021/095422 2020-06-28 2021-05-24 Popular application identification method, network system, network device, and storage medium WO2022001480A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010599481.8 2020-06-28
CN202010599481.8A CN113852565A (zh) 2020-06-28 2020-06-28 Popular application identification method, network system, network device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022001480A1 true WO2022001480A1 (zh) 2022-01-06

Family

ID=78972549

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/095422 WO2022001480A1 (zh) 2020-06-28 2021-05-24 Popular application identification method, network system, network device, and storage medium

Country Status (2)

Country Link
CN (1) CN113852565A (zh)
WO (1) WO2022001480A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114461369A (zh) * 2022-04-12 2022-05-10 山东省计算中心(国家超级计算济南中心) Adaptive data scheduling system and method for complex application scenarios

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103607407A (zh) * 2013-12-02 2014-02-26 中国联合网络通信集团有限公司 Method and system for discovering hot services on the mobile Internet
CA3033921A1 (en) * 2016-08-15 2018-02-22 Incognito Software Systems Inc. System and method for bandwidth activity reporting
CN109711865A (zh) * 2018-12-07 2019-05-03 恒安嘉新(北京)科技股份公司 Method for fine-grained traffic prediction in mobile communication networks based on user behavior mining

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
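CN114461369A (zh) * 2022-04-12 2022-05-10 山东省计算中心(国家超级计算济南中心) Adaptive data scheduling system and method for complex application scenarios
CN114461369B (zh) * 2022-04-12 2022-08-19 山东省计算中心(国家超级计算济南中心) Adaptive data scheduling system and method for complex application scenarios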

Also Published As

Publication number Publication date
CN113852565A (zh) 2021-12-28


Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 22/05/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21832733

Country of ref document: EP

Kind code of ref document: A1