CN110378739B - Data traffic matching method and device - Google Patents


Info

Publication number
CN110378739B
Authority
CN
China
Prior art keywords
matching result
user
training
data traffic
cloud server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910668490.5A
Other languages
Chinese (zh)
Other versions
CN110378739A (en
Inventor
崔羽飞
张第
刘颖慧
张溶芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd filed Critical China United Network Communications Group Co Ltd
Priority to CN201910668490.5A priority Critical patent/CN110378739B/en
Publication of CN110378739A publication Critical patent/CN110378739A/en
Application granted granted Critical
Publication of CN110378739B publication Critical patent/CN110378739B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0255 Targeted advertisements based on user history
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0269 Targeted advertisements based on user profile or attribute
    • G06Q30/0271 Personalized advertisement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/60 Business processes related to postal services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a data traffic matching method and device. The method comprises the following steps: inputting a test sample into a first training model library to obtain a first matching result, wherein the test sample comprises user data; sending the first matching result to a core cloud server; receiving a second matching result returned by the core cloud server, wherein the second matching result is obtained by inputting the test sample into a second training model library, and the second training model library is a training model library obtained by training according to the first matching result; and fusing the first matching result and the second matching result to obtain a final matching result, wherein the final matching result comprises the data traffic type pre-ordered by the user and the data traffic value pre-ordered by the user. Fusing the first matching result and the second matching result yields a final matching result that is more accurate and truly reflects the individual requirements of the user, so that an operator can use the final matching result to recommend a data traffic package better suited to the user, improving the user experience.

Description

Data traffic matching method and device
Technical Field
The invention relates to the field of computers, and in particular to a data traffic matching method and device.
Background
With the rapid development of mobile internet, people widely use social software such as WeChat and QQ for communication, and meanwhile, users access the internet through mobile terminals to obtain required information, so that the demands of the users on data services are remarkably increased.
At present, the traffic service used by a mobile terminal user takes the form of a traffic package that the user applies for and orders from the operator; that is, the operator provides a plurality of data traffic packages, and the user selects and orders one of them according to the user's own data service requirements. However, in real life, different users have different data traffic requirements, and when faced with a plurality of data traffic packages, users may have difficulty choosing among them. If the data traffic in the selected package is too large, traffic is wasted; if the data traffic in the selected package is too small, the user may have no data traffic left by the end of the month, resulting in an interruption of data communication. The user cannot accurately summarize his or her own data traffic usage, and the operator cannot recommend a more reasonable data traffic package to the user, so the user experience is poor.
Disclosure of Invention
Therefore, the invention provides a data traffic matching method and device, to solve the problem in the prior art of poor user experience caused by the inability to recommend a suitable data traffic service according to the personalized requirements of a user. The problem the invention aims to solve is how to recommend personalized data traffic services to users more accurately.
In order to achieve the above object, a first aspect of the present invention provides a data traffic matching method, where the method includes: inputting a test sample into a first training model library to obtain a first matching result, wherein the test sample comprises user data; sending the first matching result to a core cloud server; receiving a second matching result returned by the core cloud server, wherein the second matching result is obtained by inputting the test sample into a second training model library, and the second training model library is a training model library obtained by training according to the first matching result; and fusing the first matching result and the second matching result to obtain a final matching result, wherein the final matching result comprises the data traffic type pre-ordered by the user and the data traffic value pre-ordered by the user.
Wherein, the user data comprises: at least two of the data traffic package ordered by the user, the type of data traffic actually used by the user, the value of data traffic actually used by the user, and the user's telephone charge.
Wherein, the step of inputting a test sample into a first training model library to obtain a first matching result comprises: for each classification model of the first training model library, performing the following operations: performing cross-validation on the classification model by using the training samples to obtain a prediction training sample of the classification model; testing the classification model by using the test sample to obtain a prediction test sample corresponding to the classification model; and determining a first matching result according to the first training model library and the prediction training sample and prediction test sample corresponding to each classification model in the first training model library.
Wherein, the step of determining a first matching result according to the first training model library and the prediction training sample and prediction test sample corresponding to each classification model of the first training model library comprises: stacking at least two prediction training samples to obtain a new training sample; determining a new test sample according to each prediction test sample; and determining a first matching result according to the new training sample, the first training model library and the new test sample.
Wherein, the step of determining the first matching result according to the new training sample, the first training model library and the new test sample comprises: training any model in the first training model library with the new training sample by using an exhaustive search method to obtain an optimal training model; and inputting the new test sample into the optimal training model to obtain a first matching result.
Wherein, the first training model library comprises: at least two classification models of a random forest classification model, a decision tree classification model, an extreme gradient boosting (XGBoost) data model and a logistic regression classification model.
In order to achieve the above object, a second aspect of the present invention provides a data traffic matching method, including: receiving a first matching result sent by an edge cloud server, putting the first matching result into a training sample, and updating the training sample, wherein the test sample comprises user data; training the updated training sample to obtain a second training model library; inputting a test sample into a second training model library to obtain a second matching result, wherein the test sample comprises user data; and sending the second matching result to the edge cloud server so that the edge cloud server can fuse the second matching result and the first matching result to obtain a final matching result.
Wherein, the user data comprises: at least two of the data traffic package ordered by the user, the type of data traffic actually used by the user, the value of data traffic actually used by the user, and the user's telephone charge.
In order to achieve the above object, a third aspect of the present invention provides a data traffic matching apparatus, including: the first acquisition module is used for inputting a test sample into a first training model library to obtain a first matching result, wherein the test sample comprises user data; the first sending module is used for sending the first matching result to the core cloud server; the first receiving module is used for receiving a second matching result returned by the core cloud server, the second matching result is obtained by inputting the test sample into a second training model library, and the second training model library is a training model library obtained by training according to the first matching result; and the fusion module is used for fusing the first matching result and the second matching result to obtain a final matching result, wherein the final matching result comprises a data flow type pre-ordered by the user and a data flow value pre-ordered by the user.
In order to achieve the above object, a fourth aspect of the present invention provides a data traffic matching apparatus, including: the second receiving module is used for receiving the first matching result sent by the edge cloud server, putting the first matching result into the training sample, and updating the training sample, wherein the testing sample comprises user data; the training module is used for training the user data in the updated training sample to obtain a second training model base; the second acquisition module is used for inputting the test sample into a second training model library to obtain a second matching result, wherein the test sample comprises user data; and the second sending module is used for sending the second matching result to the edge cloud server so that the edge cloud server can fuse the second matching result and the first matching result to obtain a final matching result.
The invention has the following advantages: the test sample is input into the first training model, a first matching result capable of meeting the user requirement is obtained preliminarily, and the accuracy of a second matching result returned by the core cloud server is higher due to the fact that the core cloud server can gather more data; and then, a first matching result which reflects the characteristics of the data flow used by the user in an edge cloud server and a second matching result which can reflect the characteristics of the data flow used by all the users are fused to obtain a final matching result which has higher accuracy and can truly reflect the personalized requirements of the user, so that when an operator recommends a data flow package for the user, the operator can recommend the data flow package which is more suitable for the user by combining the final matching result, better service is brought for the user, and the user experience is improved.
The user data comprises the data traffic type actually used by the user, and the first matching result is obtained by inputting the test sample comprising the data traffic type of the user into the first training model library, so that the first matching result can reflect the data traffic requirements of the user when the user performs different services, and is further fused with the second matching result to obtain the final matching result with higher accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
Fig. 1 is a flow chart of a data traffic matching method according to a first embodiment of the present invention;
fig. 2 is a flow chart of a data traffic matching method according to a second embodiment of the present invention;
fig. 3 is a flow chart of a data traffic matching method according to a second embodiment of the present invention;
fig. 4 is a flow chart of a data traffic matching method according to a third embodiment of the present invention;
fig. 5 is a block diagram of a data traffic matching apparatus according to a fourth embodiment of the present invention;
fig. 6 is a block diagram of a data traffic matching apparatus according to a fifth embodiment of the present invention;
fig. 7 is a block diagram of a data traffic matching apparatus according to a fifth embodiment of the present invention.
In the drawings:
501: first obtaining module
502: first sending module
503: first receiving module
504: fusion module
601: second receiving module
602: training module
603: second obtaining module
604: second sending module
701: edge cloud server
702: core cloud server
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
A first embodiment of the present invention relates to a data traffic matching method, which is used to recommend personalized data traffic services to the user more accurately according to the final matching result.
The implementation details of the data traffic matching method in this embodiment are described below. They are provided only to facilitate understanding and are not required for implementing the present solution.
Fig. 1 is a flowchart of a data traffic matching method in this embodiment, and the method may be used in an edge cloud server.
It should be noted that the edge cloud server refers to a network node with few intermediate links between it and the end user's point of access, and it offers better responsiveness and connection speed to the end users it serves. The edge cloud server may be a node server of a Content Delivery Network (CDN), or may be a node server in the internet. In addition, the edge cloud server can store heavily accessed users and their user data on a dedicated cache device, further improving the data processing speed.
The method may include the following steps.
In step 101, a test sample is input into a first training model library to obtain a first matching result.
Wherein the test sample includes user data, the user data including: at least two of the data traffic package ordered by the user, the type of data traffic actually used by the user, the value of data traffic actually used by the user, and the user's telephone charge.
It should be noted that the data traffic types actually used by the user may include local traffic, inter-provincial roaming traffic, international roaming traffic, and Hong Kong, Macao and Taiwan roaming traffic, and the data traffic types actually used differ according to the requirements of each user.
In one particular implementation, the user data may also include voice usage information for the user, such as the call duration when the user is the calling party or the called party, and the corresponding telephone charge. It should be noted that the user data may be set according to the actual situation and is not limited to the above examples; other information not listed here also falls within the protection scope of the present invention and is not described again here.
Wherein, the first training model library comprises: at least two classification models of a random forest classification model, a decision tree classification model, an extreme gradient boosting (XGBoost) data model and a logistic regression classification model.
It should be noted that the random forest classification model refers to a classifier that trains on and predicts samples by using a plurality of trees. Logistic regression addresses the binary classification problem, in which the predicted value y takes only two values (0 or 1), and it can be extended to multi-class problems, so the logistic regression classification model can be used to train on the user data and obtain a classification result quickly.
The decision tree classification model derives from a decision analysis method in which, given the known occurrence probabilities of various conditions, a decision tree is constructed to obtain the probability that the expected net present value is greater than or equal to zero, so as to evaluate the risk of a project and judge its feasibility; it is a graphical method that applies probability analysis intuitively. It is called a decision tree because the decision branches, when drawn, resemble the branches of a tree, where each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class. In machine learning, a decision tree is a predictive model that represents a mapping between object attributes and object values.
The extreme gradient boosting data model is a data model built with the Extreme Gradient Boosting (XGBoost) algorithm. Boosting is a method for improving the accuracy of weak classification algorithms. XGBoost is a boosting algorithm based on tree models: it integrates a plurality of tree models to form a strong classifier.
It should be noted that at least two kinds of information among the data traffic package ordered by the user, the data traffic type actually used by the user, the data traffic value actually used by the user, and the telephone charge of the user are preprocessed, and features are extracted from the original user data as fully as possible to obtain a test sample. The test sample is then input into the first training model library for testing, so that a data traffic package type suitable for the user, namely the first matching result, can be preliminarily obtained.
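To illustrate the preprocessing described above, the following minimal Python sketch turns one user record into a fixed-length feature vector. The field names (`usage_gb_by_type`, `package_quota_gb`, `monthly_charge`) and the traffic type labels are hypothetical, chosen only for this example; the description does not specify a data layout.

```python
# Hypothetical labels for the traffic categories mentioned in the description.
TRAFFIC_TYPES = (
    "local",
    "inter_provincial_roaming",
    "international_roaming",
    "hk_macao_taiwan_roaming",
)


def build_sample(record):
    """Convert one user record (a dict) into a flat list of floats."""
    usage = record.get("usage_gb_by_type", {})
    # One feature per traffic type; unused types default to 0.
    features = [float(usage.get(t, 0.0)) for t in TRAFFIC_TYPES]
    # Quota of the package currently ordered, and the telephone charge.
    features.append(float(record.get("package_quota_gb", 0.0)))
    features.append(float(record.get("monthly_charge", 0.0)))
    return features


example_record = {
    "usage_gb_by_type": {"local": 3.5, "hk_macao_taiwan_roaming": 1.0},
    "package_quota_gb": 10,
    "monthly_charge": 58,
}
example_features = build_sample(example_record)
```

A vector of this kind could then serve as the test sample passed to each classification model in the first training model library.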
In step 102, a first matching result is sent to the core cloud server.
It should be noted that the first matching result is obtained by modeling and testing on the user data collected at the edge cloud server, and can therefore only reflect how the users of that one edge cloud server use data traffic. Therefore, the first matching result needs to be sent to the core cloud server to update the training samples of the core cloud server, so as to obtain a more accurate data traffic package type for the user.
In step 103, a second matching result returned by the core cloud server is received.
The second matching result is obtained by inputting the test sample into a second training model library, and the second training model library is a training model library obtained by training according to the first matching result.
It should be noted that, after receiving the first matching result, the core cloud server may update its own training samples and then train on the updated training samples to obtain a second training model library, where the second training model library includes: at least two classification models of a random forest classification model, a decision tree classification model, an extreme gradient boosting (XGBoost) data model and a logistic regression classification model. A second matching result is obtained by inputting the test sample into the second training model library for testing. Because the training samples of the core cloud server cover the features of the user data of all the edge cloud servers, the second matching result can reflect the data traffic requirements of all users.
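The core-side sequence (update the training samples, retrain, test) can be sketched as follows. This is only an illustration under stated assumptions: the class `CoreCloudServer`, its method name, and the 1-nearest-neighbour stand-in for the second training model library are all hypothetical and are not taken from the patent.

```python
def fit_nearest_neighbour(samples):
    """Toy stand-in for training: 1-nearest-neighbour on (features, label) pairs."""
    stored = list(samples)

    def predict(features):
        def distance(pair):
            return sum((a - b) ** 2 for a, b in zip(features, pair[0]))
        return min(stored, key=distance)[1]

    return predict


class CoreCloudServer:
    """Hypothetical core-side handler: update samples, retrain, return a result."""

    def __init__(self, fit, training_samples):
        self.fit = fit
        self.training_samples = list(training_samples)
        self.model = None

    def handle_first_result(self, test_features, first_label):
        # Put the edge server's first matching result into the training samples.
        self.training_samples.append((test_features, first_label))
        # Retrain on the updated samples (stands in for the second model library).
        self.model = self.fit(self.training_samples)
        # Run the test sample through the retrained model.
        return self.model(test_features)


core = CoreCloudServer(
    fit_nearest_neighbour,
    [([1.0], "local"), ([2.0], "local"), ([30.0], "roaming")],
)
second_result = core.handle_first_result([28.0], "roaming")
```

Passing the training routine in as `fit` keeps the update-and-retrain sequence separate from whichever classification models are actually used.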
In step 104, the first matching result and the second matching result are fused to obtain a final matching result.
And the final matching result comprises the data flow type pre-ordered by the user and the data flow value pre-ordered by the user.
It should be noted that the first matching result, which reflects how the users of a certain edge cloud server use data traffic, and the second matching result, which reflects the data traffic requirements of all users, are fused, for example by voting or by weighted averaging of the two results, to obtain a final matching result with higher accuracy.
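One possible reading of fusion by voting or weighted averaging is sketched below: the traffic type is chosen by weighted vote and the traffic value by weighted average. The `(traffic_type, traffic_value)` pair layout and the weights are assumptions made for this sketch; the description does not fix either.

```python
from collections import defaultdict


def fuse(results, weights):
    """Fuse matching results: weighted vote on type, weighted average on value."""
    if not results or len(results) != len(weights):
        raise ValueError("need one weight per result")
    votes = defaultdict(float)
    for (traffic_type, _), weight in zip(results, weights):
        votes[traffic_type] += weight
    # On a tie, the type that received a vote first is kept.
    best_type = max(votes, key=votes.get)
    total = sum(weights)
    value = sum(v * w for (_, v), w in zip(results, weights)) / total
    return best_type, value


first = ("local", 10.0)      # hypothetical result from the edge cloud server
second = ("roaming", 20.0)   # hypothetical result from the core cloud server
final_type, final_value = fuse([first, second], weights=[0.4, 0.6])
```

With only two results the weighted vote reduces to picking the more heavily weighted one; the same function accepts more than two results if further model outputs are available.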
At present, the existing data traffic packages of each operator usually include data traffic types and data traffic values corresponding to the data traffic types, and by matching the data traffic types in the existing packages with the data traffic types pre-ordered by the user in the final matching result and matching the data traffic values corresponding to the data traffic types in the existing packages with the data traffic values pre-ordered by the user in the final matching result, a data traffic package more suitable for the user can be recommended to the user, better service is brought to the user, and user experience is improved.
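The matching of the final result against existing packages could look like the following sketch. The selection rule used here (the smallest package of the matched type whose quota covers the value, otherwise the largest of that type) is an assumption for illustration, as is the `(name, traffic_type, quota)` tuple layout; the description only states that types and values are matched.

```python
def recommend_package(packages, final_type, final_value):
    """Pick a package of the matched type whose quota best fits the value."""
    same_type = [p for p in packages if p[1] == final_type]
    if not same_type:
        return None  # the operator offers no package of this type
    big_enough = [p for p in same_type if p[2] >= final_value]
    if big_enough:
        # Smallest quota that still covers the value limits wasted traffic.
        return min(big_enough, key=lambda p: p[2])
    # Nothing covers the value: fall back to the largest package of that type.
    return max(same_type, key=lambda p: p[2])


# Hypothetical catalogue of existing packages.
catalogue = [
    ("Local 5", "local", 5.0),
    ("Local 10", "local", 10.0),
    ("Local 20", "local", 20.0),
    ("Roam 3", "roaming", 3.0),
]
choice = recommend_package(catalogue, "local", 8.5)
```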
In the embodiment, the test sample is input into the first training model, a first matching result capable of meeting the user requirement is obtained preliminarily, and the accuracy of a second matching result returned by the core cloud server is higher as the core cloud server can gather more data; and then, a first matching result which reflects the characteristics of the data flow used by the user in an edge cloud server and a second matching result which can reflect the characteristics of the data flow used by all the users are fused to obtain a final matching result which has higher accuracy and can truly reflect the personalized requirements of the user, so that when an operator recommends a data flow package for the user, the operator can recommend the data flow package which is more suitable for the user by combining the final matching result, better service is brought for the user, and the user experience is improved.
A second embodiment of the present invention relates to a data traffic matching method. The second embodiment is substantially the same as the first embodiment, and mainly differs in that a prediction training sample and a prediction test sample of each classification model are obtained by cross-validation, a new training sample is obtained by stacking, and the new training sample is input into the first training model library for training to obtain the first matching result.
Fig. 2 is a flowchart of a data traffic matching method in this embodiment, and the method may be used in an edge cloud server. The method may include the following steps.
In step 201, cross-validation is performed on each classification model of the first training model library using the training samples, and a prediction training sample of each classification model is obtained.
It should be noted that the training samples are divided into five equal parts; any four parts are used as the training sub-samples and the remaining part is used as the testing sub-sample. The training sub-samples are used to train any classification model in the first training model library, so as to obtain the prediction training samples of that classification model.
For example, there are 5 sub-samples in the training samples, 4 sub-samples are randomly taken as the training sub-samples, and the training sub-samples are used to train the decision tree classification model, so that the prediction training samples corresponding to the decision tree classification model can be obtained.
In step 202, each classification model of the first training model library is tested by using the test sample, and a prediction test sample corresponding to each classification model is obtained.
It should be noted that the one part of the training samples left over after four parts were taken as training sub-samples in step 201 is used as the testing sub-sample, and the testing sub-sample is used to test any one of the classification models in the first training model library, so as to obtain the prediction test sample corresponding to that classification model.
For example, 5 sub-samples are selected from the training samples, 4 sub-samples are randomly selected as the training sub-samples, the remaining sub-samples are used as the testing sub-samples, and the testing sub-samples are used for testing the decision tree classification model, so that the prediction testing samples corresponding to the decision tree classification model can be obtained.
In step 203, a first matching result is determined according to the first training model library, the prediction training sample and the prediction test sample corresponding to each classification model in the first training model library.
In one particular implementation, at least two predicted training samples are stacked (stacking) to obtain a new training sample; determining a new test sample according to each predicted test sample; and determining a first matching result according to the new training sample, the first training model base and the new testing sample.
Specifically, stacking at least two prediction training samples to obtain a new training sample may be performed as follows: the training samples are divided into five equal parts, any four parts are used as training sub-samples, and the remaining part is used as the testing sub-sample; any classification model in the first training model library is trained with the training sub-samples, repeating this so that each part serves once as the testing sub-sample, to obtain 5 pre-training sets; and the 5 pre-training sets are stacked vertically to obtain a new training sample corresponding to the classification model.
Specifically, determining a new test sample according to each prediction test sample may be performed as follows: the part of the training samples left over after four parts were taken in step 201 is used as the testing sub-sample; any classification model in the first training model library is tested with the testing sub-samples to obtain 5 different pre-test results; and the 5 pre-test results are summed and averaged to obtain a new test sample corresponding to the classification model.
For example, there are 5 sub-samples in the training sample, 4 sub-samples are randomly taken as training sub-samples, and the remaining one sub-sample is taken as a testing sub-sample; the training subsample is used for training the random forest classification model to obtain 5 pre-training sets, and the 5 pre-training sets are longitudinally superposed to obtain a new training sample of the random forest classification model.
And testing the random forest classification model by using the test subsample to obtain 5 pretest results, and then adding and averaging the 5 pretest results to obtain a new test sample of the random forest classification model.
According to the above operations, at least two new test samples and at least two new training samples can be obtained by respectively training at least two classification models in the first training model library. And finally, determining a first matching result according to at least two new training samples obtained by the training, the first training model library and at least two new testing samples.
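The operations above resemble the common out-of-fold stacking procedure, and the sketch below follows that interpretation: for each model, predictions on each held-out part are combined into one column of the new training sample, and the five fold models' predictions on the test sample are averaged into one column of the new test sample. This interpretation, the fold assignment by index, and the two toy classifiers (nearest centroid and 1-nearest-neighbour, standing in for the models of the first training model library) are assumptions for illustration.

```python
from statistics import mean


def fit_centroid(xs, ys):
    """Toy classifier: predict the label whose mean feature vector is closest."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    centroids = {y: [mean(col) for col in zip(*rows)] for y, rows in groups.items()}

    def predict(x):
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))
    return predict


def fit_1nn(xs, ys):
    """Toy classifier: predict the label of the closest training point."""
    def predict(x):
        i = min(range(len(xs)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(x, xs[j])))
        return ys[i]
    return predict


def out_of_fold(fit, xs, ys, test_xs, k=5):
    """Return (out-of-fold train predictions, fold-averaged test predictions)."""
    n = len(xs)
    oof = [None] * n
    test_preds = []
    for fold in range(k):
        held = [i for i in range(n) if i % k == fold]
        kept = [i for i in range(n) if i % k != fold]
        model = fit([xs[i] for i in kept], [ys[i] for i in kept])
        for i in held:              # filling by index equals stacking the
            oof[i] = model(xs[i])   # k held-out prediction sets vertically
        test_preds.append([model(x) for x in test_xs])
    return oof, [mean(col) for col in zip(*test_preds)]


def stack(fits, xs, ys, test_xs, k=5):
    """Build the new training and test samples: one column per base model."""
    cols = [out_of_fold(fit, xs, ys, test_xs, k) for fit in fits]
    new_train = [list(row) for row in zip(*(c[0] for c in cols))]
    new_test = [list(row) for row in zip(*(c[1] for c in cols))]
    return new_train, new_test


xs = [[1.0], [9.0], [2.0], [8.0], [1.5], [9.5], [2.5], [8.5], [1.2], [9.2]]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
new_train, new_test = stack([fit_centroid, fit_1nn], xs, ys, [[1.1], [9.1]])
```

Labels are numeric (0 or 1) here so that the test predictions of the five fold models can be averaged.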
In one embodiment, the step of determining the first matching result according to the new training sample, the first training model library and the new test sample includes: training any classification model in the first training model library with the new training sample by using an exhaustive search (grid search) method to obtain an optimal training model; and inputting the new test sample into the optimal training model to obtain the first matching result.
It should be noted that exhaustive search means enumerating and checking candidates one by one in a certain order and taking the candidate that meets the requirements as the final result. In this embodiment, training any classification model in the first training model library proceeds as follows: each classification model in the first training model library is cross-validated, and the cross-validation results are stacked to obtain at least two new training samples; each classification model is trained with the at least two new training samples; the final training results are compared, and the training model corresponding to the best training result is taken as the optimal training model; the at least two new test samples are then input into the optimal training model to obtain the first matching result.
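A minimal sketch of the exhaustive (grid) search idea is given below. It rests on assumptions that are not in the source: the second-level model is a hypothetical weighted blend of two stacked columns followed by a threshold, the parameter names `weight` and `threshold` are illustrative, and candidates are scored by accuracy on the new training sample itself for brevity (a real setup would more likely score on held-out data).

```python
from itertools import product


def accuracy(params, rows, labels):
    """Score a toy meta-model: blend two stacked columns, then apply a threshold."""
    w, t = params["weight"], params["threshold"]
    predictions = [1 if w * a + (1 - w) * b >= t else 0 for a, b in rows]
    return sum(p == label for p, label in zip(predictions, labels)) / len(labels)


def grid_search(param_grid, score):
    """Enumerate every parameter combination in order and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:  # strict comparison: the first best candidate wins ties
            best_params, best_score = params, current
    return best_params, best_score


# Synthetic stacked training sample: one column per first-level model.
new_train = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.0, 0.0)]
labels = [0, 0, 1, 1, 1, 0]
grid = {"weight": [0.0, 0.5, 1.0], "threshold": [0.25, 0.5, 0.75]}

best_params, best_score = grid_search(grid, lambda p: accuracy(p, new_train, labels))
```

In this synthetic data the label always equals the first column, so the search settles on the candidate that relies entirely on that column.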
In step 204, the first matching result is sent to the core cloud server.
In step 205, a second matching result returned by the core cloud server is received.
In step 206, the first matching result and the second matching result are fused to obtain a final matching result.
The final matching result includes the data traffic type pre-ordered by the user and the data traffic value pre-ordered by the user.
It should be noted that steps 204 to 206 in this embodiment are the same as steps 102 to 103 in the first embodiment, and are not repeated herein.
In a specific implementation, fig. 3 is a flowchart of a data traffic matching method, in which after an initial test sample and a training sample are processed, a first matching result and a second matching result are obtained through two model training processes, and then the first matching result and the second matching result are fused to obtain a final matching result.
In step 301, the raw data is preprocessed to obtain an initial test sample and a training sample, and a first training model library is obtained at the same time.
It should be noted that the raw data is obtained by collecting user data from different types of databases (e.g., a data warehouse based on the Hadoop Distributed File System (HDFS)). The user data may include: the data traffic package subscribed to by the user, the type of data traffic actually used by the user (including local traffic, inter-provincial roaming traffic, international roaming traffic, Hong Kong, Macao and Taiwan roaming traffic, etc.), the value of the data traffic actually used by the user, the telephone charges of the user, and the like, and may further include voice usage information of the user (for example, the call duration when the user is the calling party or the called party, the corresponding telephone charges, etc.). The user data may be set according to the actual situation and is not limited to the above examples; other information not listed here also falls within the protection scope of the present invention and is not described again.
The user data is processed quickly by a database-based computing engine: data conversion, data exploration, attribute reduction and other processing are applied to the different types of data in the user data so as to standardize it; the fields related to the terminal used by the user are then collated from the standardized user data to obtain an initial sample; the initial sample is further split (for example, at a ratio of 7:3, with 70% of the data used as the training sample and 30% as the test sample); and the resulting test sample and training sample are stored on HDFS.
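The 7:3 split mentioned above might look like the following sketch. The function name `split_samples`, the record fields `user_id` and `used_mb`, and the fixed shuffle seed are all illustrative assumptions; the source does not say whether the split is random.

```python
import random


def split_samples(records, train_parts=7, test_parts=3, seed=0):
    """Shuffle the records and cut them at the train_parts : test_parts ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]


# Synthetic initial sample: one standardized record per user.
initial_sample = [{"user_id": i, "used_mb": 100 * i} for i in range(10)]
training_sample, test_sample = split_samples(initial_sample)
```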
It should be noted that, if the given user data has fewer features, the existing data can be subjected to feature engineering, that is, features can be extracted from the original data to the maximum extent for use by algorithms and models. Firstly, feature selection is carried out on user data, then feature dimensions are constructed, and finally data with features of multiple dimensions are generated.
The first training model library is a set of training models obtained by optimizing at least two classification models selected from a random forest classification model, a decision tree classification model, an extreme gradient boosting (XGBoost) model and a logistic regression classification model.
In step 302, each classification model of the first training model library is tested using the test sample, and a prediction test sample corresponding to each classification model is obtained.
In step 303, cross-validation is performed on each classification model of the first training model library using the training samples to obtain a predicted training sample for each classification model.
It should be noted that the steps 302 to 303 are the same as the steps 201 to 202 in the second embodiment, and are not described herein again.
In step 304, a new test sample is determined from each predicted test sample.
In step 305, at least two predicted training samples are stacked to obtain a new training sample.
In step 306, any model in the first training model library is trained by using a new training sample by using an exhaustive search method, so as to obtain an optimal training model.
In step 307, a new test sample is input into the optimal training model to obtain a first matching result.
It should be noted that the contents of steps 304 to 307 are the same as those of step 203 in the second embodiment, and are not described herein again.
In step 308, the first matching result is sent to the core cloud server.
In step 309, a second matching result returned by the core cloud server is received.
In step 310, the first matching result and the second matching result are fused to obtain a final matching result.
It should be noted that the steps 308 to 310 are the same as the steps 102 to 103 in the first embodiment, and are not described herein again.
In this embodiment, the predicted training samples and predicted test samples of each classification model in the first training model library are obtained by cross-validation, new training samples are obtained by stacking, and the new training samples and new test samples are input into the first training model library for training to obtain the first matching result, so that the first matching result is more accurate and truly reflects the individual requirements of the user.
A third embodiment of the present invention relates to a data traffic matching method. Fig. 4 is a flowchart of a data traffic matching method in this embodiment, which may be used in a core cloud server. The method may include the following steps.
In step 401, a first matching result sent by the edge cloud server is received, and the first matching result is put into a training sample, and the training sample is updated.
It should be noted that the first matching result can only reflect a data traffic package type that a user in one edge cloud server desires to obtain, and multiple edge cloud servers need to be aggregated to obtain multiple first matching results, and the multiple first matching results are put into a training sample, so that the core cloud server can obtain more representative samples.
For example, the first edge cloud server collects user data of a first province, and the resulting first matching result 1 represents the data traffic package type that users of that province desire; the second edge cloud server collects user data of a second province, and the resulting first matching result 2 represents the data traffic package type that users of that province desire; ...; the nth edge cloud server collects user data of the nth province, and the resulting first matching result n represents the data traffic package type that users of that province desire. The training sample of the core cloud server is updated by putting first matching result 1, first matching result 2, ..., and first matching result n into it, so that the core cloud server can take into account the data traffic usage of users nationwide and thereby obtain a more accurate data traffic package type for users.
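The update of the core cloud server's training sample can be sketched roughly as below. The record layout (`user_id`, `package_type`, `edge`) and the helper name `update_training_sample` are hypothetical; the source does not specify how a first matching result is represented.

```python
def update_training_sample(training_sample, first_results_by_edge):
    """Return a new training sample extended with every edge's first matching results."""
    updated = list(training_sample)
    for edge_id, results in sorted(first_results_by_edge.items()):
        for result in results:
            # Tag each result with the edge cloud server (province) it came from.
            updated.append({"edge": edge_id, **result})
    return updated


core_training_sample = [{"edge": "core", "user_id": 0, "package_type": "local"}]
first_results = {
    "province_1": [{"user_id": 1, "package_type": "local"}],
    "province_2": [{"user_id": 2, "package_type": "inter_provincial"},
                   {"user_id": 3, "package_type": "international"}],
}
updated_sample = update_training_sample(core_training_sample, first_results)
```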
Wherein the test sample includes user data, the user data including: at least two of the data traffic package ordered by the user, the type of data traffic actually used by the user, the value of data traffic actually used by the user, and the user's telephone charge.
It should be noted that the data traffic types actually used by the user may include local traffic, inter-provincial roaming traffic, international roaming traffic, and Hong Kong, Macao and Taiwan roaming traffic; the data traffic type actually used differs according to the requirements of each user.
In one particular implementation, the user data may also include voice usage information of the user, such as the call duration when the user is the calling party or the called party, the corresponding telephone charges, etc. The user data may be set according to the actual situation and is not limited to the above examples; other information not listed here also falls within the protection scope of the present invention and is not described again.
In step 402, the updated training samples are trained to obtain a second training model library.
Wherein the second training model library comprises: at least two classification models of a random forest classification model, a decision tree classification model, an extreme gradient boosting (XGBoost) model and a logistic regression classification model.
It should be noted that the updated training sample may include the first matching result sent by the edge cloud server, and may also include user data acquired by the core cloud server itself, and the updated training sample may be set according to the actual situation, and is not limited to the above description, and other information that is not illustrated is also within the protection scope of the present invention, and is not described herein again.
In step 403, the test sample is input into the second training model library to obtain a second matching result.
It should be noted that at least two kinds of information among the data traffic package ordered by the user, the data traffic type actually used by the user, the data traffic value actually used by the user, and the telephone charge of the user are preprocessed, and features are extracted from the original user data to the maximum extent to obtain the test sample; the test sample is then input into the second training model library for testing, so that a data traffic package type suitable for the user, namely the second matching result, can be preliminarily obtained.
In step 404, the second matching result is sent to the edge cloud server, so that the edge cloud server can fuse the second matching result and the first matching result to obtain a final matching result.
It should be noted that a final matching result with higher accuracy can be obtained by fusing the first matching result, which represents the data traffic usage characteristics of the users in a certain edge cloud server, with the second matching result, which represents the data traffic usage characteristics of all users (for example, by stacking fusion of the first matching result and the second matching result).
In this embodiment, the updated training sample is trained with the same training method as in the second embodiment to obtain the second matching result. Because the core cloud server can gather more data, the second matching result it returns is more accurate; the second matching result is then sent to the edge cloud server so that the edge cloud server can fuse it with the first matching result to obtain a final matching result with higher accuracy. When an operator recommends a data traffic package to a user, it can use the final matching result to recommend a package better suited to the user, which brings better service and improves the user experience.
The steps of the above methods are divided only for clarity of description. In implementation, steps may be combined into one step, or a step may be split into multiple steps; as long as the same logical relationship is included, such variants are within the protection scope of this patent. Adding insignificant modifications to the algorithms or processes, or introducing insignificant design changes, without changing the core design of the algorithms and processes is also within the protection scope of this patent.
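As a rough illustration of fusing the two matching results, the sketch below uses a simple weighted average in place of the stacking fusion mentioned as an example; this substitution is an assumption made to keep the example short. The result layout (`type_scores`, `value_gb`), the function name `fuse`, and the parameter `edge_weight` are likewise hypothetical.

```python
def fuse(first, second, edge_weight=0.5):
    """Blend edge (first) and core (second) matching results into a final result."""
    core_weight = 1 - edge_weight
    types = sorted(set(first["type_scores"]) | set(second["type_scores"]))
    fused_scores = {
        t: edge_weight * first["type_scores"].get(t, 0.0)
        + core_weight * second["type_scores"].get(t, 0.0)
        for t in types
    }
    return {
        # Data traffic type with the highest blended score.
        "traffic_type": max(types, key=fused_scores.get),
        # Blended data traffic value.
        "traffic_value_gb": edge_weight * first["value_gb"]
        + core_weight * second["value_gb"],
    }


first_result = {"type_scores": {"local": 0.75, "inter_provincial": 0.25}, "value_gb": 8.0}
second_result = {"type_scores": {"local": 0.5, "inter_provincial": 0.5}, "value_gb": 12.0}
final_result = fuse(first_result, second_result)
```

The output carries both parts of the final matching result described in the text: a data traffic type and a data traffic value.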
The fourth embodiment of the present invention relates to a data traffic matching device, and the specific implementation of the device can refer to the related description of the first embodiment, and repeated details are not repeated. It should be noted that, the specific implementation of the apparatus in this embodiment may also refer to the related description of the second embodiment, but is not limited to the above two examples, and other unexplained examples are also within the protection scope of the apparatus.
As shown in fig. 5, the apparatus mainly includes: a first obtaining module 501, configured to input a test sample into a first training model library to obtain a first matching result, where the test sample includes user data; a first sending module 502, configured to send the first matching result to the core cloud server; the first receiving module 503 is configured to receive a second matching result returned by the core cloud server, where the second matching result is a matching result obtained by inputting the test sample into a second training model library, and the second training model library is a training model library obtained by training according to the first matching result; and the fusion module 504 is configured to fuse the first matching result and the second matching result to obtain a final matching result, where the final matching result includes a data traffic type pre-ordered by the user and a data traffic value pre-ordered by the user.
In one example, the user data in the first obtaining module 501 includes: at least two of the data traffic package ordered by the user, the type of data traffic actually used by the user, the value of data traffic actually used by the user, and the user's telephone charge.
The fifth embodiment of the present invention relates to a data traffic matching device, and for specific implementation of the device, reference may be made to the related description of the third embodiment, and repeated descriptions are omitted.
As shown in fig. 6, the apparatus mainly includes: the second receiving module 601 is configured to receive the first matching result sent by the edge cloud server, put the first matching result into a training sample, and update the training sample, where the test sample includes user data; a training module 602, configured to train user data in the updated training sample to obtain a second training model library; a second obtaining module 603, configured to input a test sample into a second training model library to obtain a second matching result, where the test sample includes user data; the second sending module 604 is configured to send the second matching result to the edge cloud server, so that the edge cloud server can fuse the second matching result and the first matching result to obtain a final matching result.
In one particular implementation, as shown in fig. 7, the edge cloud server 701 includes the following modules: a first obtaining module 501, a first sending module 502, a first receiving module 503 and a fusing module 504; the core cloud server 702 includes the following modules: a second receiving module 601, a training module 602, a second obtaining module 603, and a second sending module 604.
A first matching result is obtained through calculation of a first obtaining module 501 of the edge cloud server, the first matching result is sent to a second receiving module 601 of the core cloud server through a first sending module 502, so that the core cloud server can update the training sample, and then the training module 602 is used for training user data in the updated training sample to obtain a second training model base; inputting the test sample into a second training model library through a second obtaining module 603 to obtain a second matching result; the second matching result is sent to the first receiving module 503 of the edge cloud server through the second sending module 604, so that the fusion module 504 in the edge cloud server can fuse the second matching result and the first matching result to obtain a final matching result.
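The message flow between the modules can be sketched as two cooperating classes. Everything model-related here is a deliberately trivial stand-in and an assumption of this sketch: the "models" are a peak value and a mean, the fusion is a plain average, and the method names are illustrative; only the order of the calls follows the description above.

```python
class CoreCloudServer:
    """Hypothetical stand-in for core cloud server 702."""

    def __init__(self):
        self.training_sample = []
        self.model = None

    def receive_first_result(self, first_result):   # second receiving module 601
        self.training_sample.append(first_result)

    def train(self):                                 # training module 602
        values = [r["value_gb"] for r in self.training_sample]
        self.model = sum(values) / len(values)       # toy "model": the mean value

    def second_result(self, test_sample):            # second obtaining module 603
        return {"value_gb": self.model}


class EdgeCloudServer:
    """Hypothetical stand-in for edge cloud server 701."""

    def __init__(self, core):
        self.core = core

    def first_result(self, test_sample):             # first obtaining module 501
        return {"value_gb": max(test_sample["used_gb"])}  # toy: peak usage

    def match(self, test_sample):
        first = self.first_result(test_sample)
        self.core.receive_first_result(first)        # sending module 502 -> 601
        self.core.train()
        second = self.core.second_result(test_sample)  # sending module 604 -> 503
        # Fusion module 504, reduced to a plain average for illustration.
        return {"value_gb": (first["value_gb"] + second["value_gb"]) / 2}


core = CoreCloudServer()
core.receive_first_result({"value_gb": 4.0})  # result from some other edge server
edge = EdgeCloudServer(core)
final = edge.match({"used_gb": [6.0, 8.0]})
```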
Through the two rounds of training, one on the edge cloud server and one on the core cloud server, the second matching result is fused with the first matching result, which improves the accuracy of the final matching result; the final matching result can thus accurately and truly reflect the personalized requirements of the user, and recommending it to the user improves the user experience.
It should be noted that each module referred to in this embodiment is a logical module; in practical applications, one logical unit may be one physical unit, may be a part of one physical unit, or may be implemented by a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, elements that are not closely related to solving the technical problems addressed by the present invention are not introduced in this embodiment, but this does not mean that no other elements are present in this embodiment.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims (8)

1. The data traffic matching method is characterized by being applied to an edge cloud server; the method comprises the following steps:
inputting a test sample into a first training model library to obtain a first matching result, wherein the test sample comprises user data;
sending the first matching result to a core cloud server;
receiving a second matching result returned by the core cloud server, wherein the second matching result is obtained by inputting the test sample into a second training model library, and the second training model library is a training model library obtained by training according to the first matching result;
fusing the first matching result and the second matching result to obtain a final matching result, wherein the final matching result comprises a data traffic type pre-ordered by a user and a data traffic value pre-ordered by the user;
wherein the user data comprises:
at least two of a data traffic package ordered by a user, a data traffic type actually used by the user, a data traffic value actually used by the user and a telephone charge of the user;
the first matching result is obtained by modeling and testing according to the user data collected on the edge cloud server and can only reflect the characteristics of the user data traffic in the edge cloud server; the second matching result is used for representing the use requirements of all users for the data traffic.
2. The data traffic matching method according to claim 1, wherein the step of inputting the test sample into a first training model library to obtain a first matching result comprises:
performing the following operations on each classification model of the first training model library:
performing cross validation on the classification model by using the training samples to obtain a prediction training sample of the classification model;
testing the classification model by using the test sample to obtain a prediction test sample corresponding to the classification model;
and determining the first matching result according to the first training model base, the prediction training sample corresponding to each classification model in the first training model base and the prediction test sample.
3. The data traffic matching method according to claim 2, wherein the step of determining the first matching result according to the first training model library, the predictive training sample corresponding to each classification model of the first training model library, and the predictive test sample comprises:
stacking at least two prediction training samples to obtain a new training sample;
determining a new test sample according to each predicted test sample;
and determining the first matching result according to the new training sample, the first training model base and the new testing sample.
4. The data traffic matching method of claim 3, wherein the step of determining the first matching result according to the new training sample, the first training model library and the new testing sample comprises:
training any model in the first training model library by using the new training sample by using an exhaustive search method to obtain an optimal training model;
and inputting the new test sample into the optimal training model to obtain the first matching result.
5. The data traffic matching method according to any one of claims 1 to 4, wherein each of the first training model library and the second training model library comprises:
at least two classification models of a random forest classification model, a decision tree classification model, an extreme gradient boosting (XGBoost) model and a logistic regression classification model.
6. A data traffic matching method is characterized in that the method is applied to a core cloud server; the method comprises the following steps:
receiving a first matching result sent by an edge cloud server, putting the first matching result into a training sample, and updating the training sample, wherein the testing sample comprises user data;
training the updated training sample to obtain a second training model base;
inputting a test sample into the second training model library to obtain a second matching result, wherein the test sample comprises the user data;
sending the second matching result to the edge cloud server so that the edge cloud server can fuse the second matching result and the first matching result to obtain a final matching result;
wherein the user data comprises:
at least two of a data traffic package ordered by a user, a data traffic type actually used by the user, a data traffic value actually used by the user and a telephone charge of the user;
the first matching result is obtained by modeling and testing according to the user data collected on the edge cloud server and can only reflect the characteristics of the user data traffic in the edge cloud server; the second matching result is used for representing the use requirements of all users for the data traffic.
7. A data traffic matching apparatus, comprising:
the first acquisition module is used for inputting a test sample into a first training model library to obtain a first matching result, wherein the test sample comprises user data;
the first sending module is used for sending the first matching result to a core cloud server;
the first receiving module is used for receiving a second matching result returned by the core cloud server, wherein the second matching result is obtained by inputting the test sample into a second training model library, and the second training model library is a training model library obtained by training according to the first matching result;
the fusion module is used for fusing the first matching result and the second matching result to obtain a final matching result, wherein the final matching result comprises a data traffic type pre-ordered by a user and a data traffic value pre-ordered by the user;
wherein the user data comprises:
at least two of a data traffic package ordered by a user, a data traffic type actually used by the user, a data traffic value actually used by the user and a telephone charge of the user;
the first matching result is obtained by modeling and testing according to the user data collected on the edge cloud server and can only reflect the characteristics of the user data traffic in the edge cloud server; the second matching result is used for representing the use requirements of all users for the data traffic.
8. A data traffic matching apparatus, comprising:
the second receiving module is used for receiving a first matching result sent by the edge cloud server, putting the first matching result into a training sample, and updating the training sample, wherein the testing sample comprises user data;
the training module is used for training the updated user data in the training sample to obtain a second training model base;
the second acquisition module is used for inputting a test sample into the second training model library to obtain a second matching result, wherein the test sample comprises the user data;
the second sending module is used for sending the second matching result to the edge cloud server so that the edge cloud server can fuse the second matching result and the first matching result to obtain a final matching result;
wherein the user data comprises:
at least two of a data traffic package ordered by a user, a data traffic type actually used by the user, a data traffic value actually used by the user and a telephone charge of the user;
the first matching result is obtained by modeling and testing according to the user data collected on the edge cloud server and can only reflect the characteristics of the user data traffic in the edge cloud server; the second matching result is used for representing the use requirements of all users for the data traffic.
CN201910668490.5A 2019-07-23 2019-07-23 Data traffic matching method and device Active CN110378739B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910668490.5A CN110378739B (en) 2019-07-23 2019-07-23 Data traffic matching method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910668490.5A CN110378739B (en) 2019-07-23 2019-07-23 Data traffic matching method and device

Publications (2)

Publication Number Publication Date
CN110378739A CN110378739A (en) 2019-10-25
CN110378739B true CN110378739B (en) 2022-03-29

Family

ID=68255320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910668490.5A Active CN110378739B (en) 2019-07-23 2019-07-23 Data traffic matching method and device

Country Status (1)

Country Link
CN (1) CN110378739B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110942180B (en) * 2019-11-12 2023-07-04 广州泽沐信息科技有限责任公司 Industrial design matching service side prediction method based on xgboost algorithm
CN112202888B (en) * 2020-09-30 2021-12-14 中国联合网络通信集团有限公司 Message forwarding method for edge user and SDN
CN112487295A (en) * 2020-12-04 2021-03-12 中国移动通信集团江苏有限公司 5G package pushing method and device, electronic equipment and computer storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530321A (en) * 2013-09-18 2014-01-22 上海交通大学 Sequencing system based on machine learning
CN104866626A (en) * 2015-06-15 2015-08-26 中国移动通信集团黑龙江有限公司 Method and device for recommending telecommunication service
CN105069476A (en) * 2015-08-10 2015-11-18 国网宁夏电力公司 Method for identifying abnormal wind power data based on two-stage integration learning
CN105930934A (en) * 2016-04-27 2016-09-07 北京物思创想科技有限公司 Prediction model demonstration method and device and prediction model adjustment method and device
CN107766418A (en) * 2017-09-08 2018-03-06 广州汪汪信息技术有限公司 A kind of credit estimation method based on Fusion Model, electronic equipment and storage medium
CN108280462A (en) * 2017-12-11 2018-07-13 北京三快在线科技有限公司 A kind of model training method and device, electronic equipment
CN109741175A (en) * 2018-12-28 2019-05-10 上海点融信息科技有限责任公司 Based on artificial intelligence to the appraisal procedure of credit again and equipment for purchasing automobile-used family by stages
CN109886349A (en) * 2019-02-28 2019-06-14 成都新希望金融信息有限公司 A kind of user classification method based on multi-model fusion
CN109902753A (en) * 2019-03-06 2019-06-18 深圳市珍爱捷云信息技术有限公司 User's recommended models training method, device, computer equipment and storage medium
CN110009017A (en) * 2019-03-25 2019-07-12 安徽工业大学 A kind of multi-angle of view multiple labeling classification method based on the study of visual angle generic character

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150235143A1 (en) * 2003-12-30 2015-08-20 Kantrack Llc Transfer Learning For Predictive Model Development
US9949714B2 (en) * 2015-07-29 2018-04-24 Htc Corporation Method, electronic apparatus, and computer readable medium of constructing classifier for disease detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant