CN113052629B - Network user image drawing method based on CECU system intelligent algorithm model - Google Patents
- Publication number
- CN113052629B (application CN202110260517.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- user
- network
- transmission
- throughput
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Abstract
The invention discloses a network user portrait method based on a CECU-system intelligent algorithm model, comprising the following steps: collecting user information across sites via local links and updating it iteratively; replicating each data item in the database at multiple consistency levels with a hierarchical transaction model, and dividing the data items into different categories; building user portraits over mutually associated data items placed on the same site, and establishing a visual model; and selecting a throughput parameter U and starting a dynamic tuning module for offline optimization of the user model and data-throughput upgrading. By collecting and iteratively updating cross-site user information over local links, performing data-mining classification with consistency-index (CI) evaluation and the Apriori support algorithm, reducing communication delay, and invoking the dynamic tuning module for data optimization, the invention effectively addresses the low degree of personalization and severely limited data throughput of traditional network user portraits.
Description
Technical Field
The invention belongs to the technical field of network information, and particularly relates to a network user image drawing method based on a CECU system intelligent algorithm model.
Background
Driven by the rapid development of internet information technology and the informatization of society, the big-data era has become an inevitable trend of modern society, and diversified network platforms aim to mine and analyze user information from every aspect. A user portrait is built from data mining and analysis of each network user's behavior in specific usage contexts, yielding a tag set that describes the user's attributes and behaviors. The significance of the user portrait method lies not only in building a personal data information base for network users, but also in enabling network service providers to offer personalized services.
Disclosure of Invention
The invention provides a network user portrait method based on a CECU system intelligent algorithm model, which adopts the following technical scheme:
a network user image drawing method based on a CECU system intelligent algorithm model comprises the following steps:
step 1: collecting user information across sites using local links and updating the information iteratively;
step 2: utilizing a hierarchical transaction model to replicate each data item in the database at multiple consistency levels, and dividing the data items into different categories according to their consistency requirements through a data mining algorithm;
step 3: based on the collection, classified storage and data mining analysis of behavior information, user portraits are carried out on data items which are placed on the same site and are related to each other, and a visual model is built;
step 4: and selecting a throughput parameter U, and starting a dynamic tuning module to perform user model offline optimization and data throughput upgrading.
Further, in step 2, the data mining algorithm is an Apriori algorithm.
Further, the step 2 specifically comprises:
step 21: according to the consistency requirement, calculating a frequent data item set according to the support degree of the frequent data item by using an Apriori algorithm, and classifying the data item and the access transaction by using a consistency index CI;
step 22: in the hierarchical transaction model, a consistency level is allocated for each transaction and its associated data item;
step 23: strongly related data items are identified and associations between frequent item data sets are established.
Further, step 21 specifically includes:
step 211: using 50% relative support, database transactions are divided into a read transaction set D_r and a write transaction set D_w;
Step 212: each mapper accepts as input database transactions and a set of data items, generating an intermediate list of support counts for each data item;
step 213: the combiner collects and merges the results from the plurality of mappers and creates a set of key-value pairs for each data item; the aggregate value of each data item is used here as its CI;
step 214: the reducer compares the aggregate value serving as the CI with a given minimum support threshold and outputs the final frequent data item list;
step 215: according to the algorithm, the final classification result is: when the input is D_r, the return value is freq_r, i.e. the set of frequently read data items; when the input is D_w, the return value is freq_w, i.e. the set of frequently updated data items; for any data item x_i ∈ I with x_i ∈ freq_r and x_i ∈ freq_w, the return value is freq_rw, i.e. the set of data items that are both frequently read and frequently updated; for the remaining data items that do not meet the minimum support threshold, the return value is infreq, i.e. the set of infrequent data items;
step 216: based on the above data item classifications, read-intensive transactions are classified as T_ri, write-intensive transactions as T_wi, transactions that are both read and write intensive as T_rwi, and the remaining transactions as T_rem.
Further, the step 3 specifically comprises:
step 31: performing factor extraction on the classified data in the step 2 by adopting principal component analysis to obtain characteristic factors;
step 32: aiming at the extracted characteristic factors, a classical K-means clustering algorithm is selected to cluster all data samples, and the class number of the user group portraits is established;
step 33: and drawing a visual user mark cloud by using a word group package in python, and intuitively displaying the user portrait clustering result obtained by the analysis.
Further, the step 4 specifically comprises:
step 41: the achievable throughput U is defined as a function of the end-system characteristics, network characteristics, data set, and external traffic load:
U = f(p_o, p_d, b, τ, f_a, n, cc, p, pp, l_c),
where p_o is the source endpoint, p_d the target endpoint, b the link bandwidth, τ the round-trip time, f_a the average file size, n the number of files, and l_c the contention load transfer; the tunable parameter set is u = {cc, p, pp};
step 42: data transmission throughput is maximized by using near-optimal parameter values for the network conditions and the data set; the optimization problem is to maximize the achieved throughput T over the transfer interval, subject to the constraints cc × p ≤ N_str, pp ≤ P, and T ≤ b, where t_s and t_e are the transmission start and end times, and N_str and P are respectively the maximum number of streams and the maximum pipelining value allowed in the network;
step 43: calling a dynamic tuning module on the history transmission log, and storing the result in a key value storage;
step 44: when a user starts a transmission process, a main process with two threads is started: a transmission thread and a dynamic tuning thread; during transmission, current network information is collected and sent to the offline analysis module, which returns initial algorithm parameter settings to start the transfer, while the dynamic tuning thread periodically checks the network environment;
step 45: when the dynamic optimization module detects low throughput, the dynamic optimization module sends the current network state to the offline analysis module and obtains new parameters as the current state;
step 46: the dynamic tuning thread notifies the transmission thread of parameter updates, and the transmission thread continues transmission with the new parameters; external traffic-load change parameters are optimized during transmission, and when throughput drops, the current network conditions are sent to the offline analysis module again; these steps are repeated cyclically to realize offline optimization of the user model and data-throughput upgrading.
Further, step 45 specifically includes:
step 451: building a three-level hierarchical clustering of the transmission logs;
step 452: modeling the achievable throughput of different parameters as a piecewise cubic spline function on the premise that each cluster contains data transmission logs for similar transmission tasks;
step 453: given an upper bound on the parameters, the parameter search space is a bounded integer domain.
Further, step 451 is specifically:
step 4511: layer-1 clusters are established from network characteristics and end-system characteristics; layer-2 is established by subdividing layer-1 based on data-set information, and layer-3 by subdividing layer-2 based on external load; clustering uses the group-average method with normalized log attributes and Euclidean distance;
step 4512: the neighbor matrix of the initial clusters is calculated using the unweighted pair-group arithmetic-mean (UPGMA) algorithm, and the two clusters at minimum distance are merged;
step 4513: the rows and columns of the neighbor matrix are updated with new clusters, the matrix is populated with new distance values, and the cycle repeats until all clusters merge into one cluster.
Further, step 452 specifically includes:
step 4521: constructing a cubic spline interpolation g(pp) = T: given a set of discrete points {(pp_i, T_i)}, i = 0, 1, …, N, in two-dimensional space, piecewise cubic polynomials g_i(pp) connect consecutive point pairs (pp_i, T_i) and (pp_{i+1}, T_{i+1});
step 4522: constructing the interpolation function g(pp) = T with the second derivative constrained to be zero at the end points;
step 4523: each cubic polynomial piece is defined as g_i(pp) = a_{i,0} + a_{i,1}·pp + a_{i,2}·pp² + a_{i,3}·pp³;
step 4524: the coefficients a_{i,j}, j = 0, 1, 2, 3, of the piecewise polynomials g_i(pp) comprise 4(N−1) unknowns; the interpolation conditions g_i(pp_i) = T_i, i = 1, …, N, yield N constraints, and the continuity conditions g_{i−1}(pp_i) = T_i = g_i(pp_i) at the interior points yield (N−2) constraints;
step 4525: additional continuity constraints are imposed on the first and second derivatives, g'_{i−1}(pp_i) = g'_i(pp_i) and g''_{i−1}(pp_i) = g''_i(pp_i), yielding 2(N−2) constraints;
step 4526: the boundary conditions of the relaxed (natural) spline are g''(pp_1) = g''(pp_N) = 0; the total number of constraints obtained from the above steps is thus N + (N−2) + 2(N−2) + 2 = 4(N−1).
Further, step 453 is specifically:
step 4531: with β as the upper bound of the parameters, the cubic spline surface functions are expressed as f_i : Ψ × Ψ × Ψ → ℝ, where Ψ = {1, 2, …, β};
step 4532: for each f_k, the second-partial-derivative test is performed, i.e. the Hessian matrix H_k of f_k is computed;
step 4533: the points {p, pp, cc} for which H_k(p, pp, cc) is negative definite are computed, yielding the set of local maxima of f_k;
step 4535: all local maxima over the set F = {f_1, …, f_p} are taken to generate the surface maximum.
The network user portrait method based on the CECU-system intelligent algorithm model has the following advantages: user information is collected across sites over local links and comprehensively updated by iteration; data-mining classification is performed by establishing a hierarchical transaction model with consistency-index (CI) evaluation and the Apriori support algorithm, reducing communication delay; and finally a dynamic tuning module is invoked for data optimization. The problems of low personalization and severely limited data throughput in traditional network user portraits are thereby effectively solved, and the user portraits become more accurate.
Drawings
FIG. 1 is a schematic diagram of a network user image method based on a CECU architecture intelligent algorithm model of the present invention;
FIG. 2 is a flowchart of the user portrayal model offline optimization workflow of the present invention;
FIG. 3 is an illustration of a hierarchical model of cluster logs of the present invention.
Detailed Description
The invention is described in detail below with reference to the drawings and the specific embodiments.
As shown in fig. 1, the network user portrait method based on the CECU-system (user model offline optimization system) intelligent algorithm model of the present invention includes the following steps. Step 1: user information is collected across sites using local links and iteratively updated. Step 2: a hierarchical transaction model is used to replicate each data item in the database at multiple consistency levels, and the data items are divided into different categories according to their consistency requirements through a data mining algorithm. Step 3: based on the collection, classified storage, and data-mining analysis of behavior information, user portraits are built over mutually associated data items placed on the same site, and a visual model is established. Step 4: a throughput parameter U is selected, and the dynamic tuning module is started for offline optimization of the user model and data-throughput upgrading. Collecting user information over local links and updating it iteratively makes the user information comprehensive; data-mining classification through a hierarchical transaction model with consistency index CI (Consistency Index) evaluation and the Apriori support algorithm reduces communication delay; and finally the dynamic tuning module is invoked for data optimization, effectively solving the problems that traditional network user portraits are poorly personalized and data throughput is severely limited, making the user portraits more accurate. The above steps are specifically described below.
For step 1: user information is collected across sites using a local link and iteratively updated.
Specifically, step 1 is:
step 11: data gathering is performed on the basis of social media data, considering that nodes sharing the most common friends across sites are most likely to be the same user. The common-friend heuristic used within one site is applied across sites, where F(i, S) denotes the friends of user i on site S.
step 12: users already mapped in S_1 (respectively S_2) are denoted M_1 (M_2), while unmapped users are denoted V_{S1}\M_1 (V_{S2}\M_2).
step 13: a user is mapped to a user on the other network according to the number of already-mapped friends the two users share: the pair of users, one on each network, with the most friends in the mapping is assumed to represent the same person and is added to the mapping. The process continues until no further users can be identified on either network.
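The iterative cross-site mapping of steps 11-13 can be sketched as follows; the function name, dictionary representation of F(i, S), and greedy tie-handling are illustrative assumptions, not part of the invention:

```python
def grow_mapping(friends_s1, friends_s2, seed_map):
    """Iteratively map users across two sites by counting mapped common friends.

    friends_s1 / friends_s2: adjacency dicts {user: friend list} for S1 and S2.
    seed_map: users already known to be the same person (S1 user -> S2 user).
    """
    mapping = dict(seed_map)
    while True:
        best = None  # (score, u1, u2): candidate pair with most mapped common friends
        for u1, f1 in friends_s1.items():
            if u1 in mapping:
                continue
            # S2 images of u1's already-mapped friends
            images = {mapping[f] for f in f1 if f in mapping}
            for u2, f2 in friends_s2.items():
                if u2 in mapping.values():
                    continue
                score = len(images & set(f2))
                if best is None or score > best[0]:
                    best = (score, u1, u2)
        if best is None or best[0] == 0:  # no further users identified on either network
            break
        mapping[best[1]] = best[2]
    return mapping
```

With a seed of two known pairs on two mirrored three-user networks, the remaining user is mapped by its two common friends.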
For step 2: and (3) utilizing a hierarchical transaction model to copy a plurality of consistency levels of each data item in the database on a large scale, and dividing the data items into different categories according to consistency requirements through a data mining algorithm.
In step 2, the data mining algorithm is an Apriori algorithm.
The step 2 is specifically as follows:
step 21: according to the consistency requirement, the Apriori algorithm is used for calculating a frequent data item set according to the support degree of the frequent data item, and the data item and the access transaction are classified by using the consistency index CI. The step 21 specifically comprises the following steps:
step 211: using 50% relative support, database transactions are divided into a read transaction set D_r and a write transaction set D_w.
step 212: each mapper accepts database transactions and a set of data items as input, and generates an intermediate list of support counts for each data item.
step 213: the combiner collects and merges the results from the plurality of mappers and creates a set of key-value pairs for each data item; the aggregate value of each data item is used here as its CI.
step 214: the reducer compares the aggregate value serving as the CI with a given minimum support threshold and outputs the final frequent data item list.
step 215: according to the algorithm, the final classification result is: when the input is D_r, the return value is freq_r, i.e. the set of frequently read data items; when the input is D_w, the return value is freq_w, i.e. the set of frequently updated data items; for any data item x_i ∈ I with x_i ∈ freq_r and x_i ∈ freq_w, the return value is freq_rw, i.e. the set of data items that are both frequently read and frequently updated; the remaining data items that do not meet the minimum support threshold return infreq, i.e. the set of infrequent data items.
step 216: based on the above data item classifications, read-intensive transactions are classified as T_ri, write-intensive transactions as T_wi, transactions that are both read and write intensive as T_rwi, and the remaining transactions as T_rem.
Step 22: in the hierarchical transaction model, a level of consistency is assigned to each transaction and its associated data items. Step 22 is specifically:
step 221: in a database transaction containing both frequently read and frequently updated data items, the frequently updated items require strict consistency while the frequently read items require high availability. Accordingly, transactions T_rwi that are both read and write intensive, together with the data items freq_rw they access, are assigned to the strongest consistency level SR; read-intensive transactions T_ri and their accessed data items freq_r are assigned to the SI level; write-intensive transactions T_wi and their accessed data items freq_w are assigned to the NMSI level; and the remaining transactions T_rem and their accessed data items infreq are assigned to the ASYNC level.
step 222: based on the data items accessed by a transaction, the transaction manager selects the appropriate consistency level. It invokes one of four resolvers following different consistency policies, the SR resolver (SRR), SI resolver (SIR), NMSI resolver (NMSIR), and ASYNC resolver (ASYNCR), to execute transactions belonging to the different consistency levels.
step 223: transactions are assigned to their matching consistency levels by the following procedure:
input: a database transaction T;
output: invocation of the appropriate resolver.
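The dispatch in steps 221-223 can be sketched as below; the resolver callables are stubs and the classification-by-accessed-items logic is an assumption consistent with the text, not the patented implementation:

```python
# Consistency level per transaction class (step 221)
LEVELS = {"T_rwi": "SR", "T_ri": "SI", "T_wi": "NMSI", "T_rem": "ASYNC"}

def classify_txn(txn_items, freq_r, freq_w):
    """Derive the transaction class from the data items it accesses."""
    reads = bool(txn_items & freq_r)
    writes = bool(txn_items & freq_w)
    if reads and writes:
        return "T_rwi"
    if reads:
        return "T_ri"
    if writes:
        return "T_wi"
    return "T_rem"

def resolve(txn_items, freq_r, freq_w, resolvers):
    """Steps 222-223: invoke the resolver matching the transaction's level."""
    level = LEVELS[classify_txn(txn_items, freq_r, freq_w)]
    return resolvers[level](txn_items)
```

Here `resolvers` would map "SR", "SI", "NMSI", and "ASYNC" to the SRR, SIR, NMSIR, and ASYNCR implementations respectively.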
Step 23: strongly related data items are identified and associations between frequent item data sets are established. In particular, the method comprises the steps of,
step 231: a minimum confidence minconf=50% is set,
step 232: for each frequent itemset f and every non-empty subset s ⊂ f, when support(f)/support(s) ≥ minconf, the strong association rule s ⇒ (f − s) is established.
Step 233: when the strong association rule is satisfied, the data items associated with each other are placed on the same site.
For step 3: based on the collection, classification and storage of behavior information and data mining analysis, user portraits are carried out on data items which are mutually related and placed on the same site, and a visual model is built.
The step 3 is specifically as follows:
step 31: and (3) reducing the dimension of the classified data in the step (2), and extracting factors by adopting principal component analysis to obtain characteristic factors.
Step 32: and selecting a classical K-means clustering algorithm to cluster all data samples according to the extracted characteristic factors, and establishing the class number of the user group portraits. The number of clusters is recommended to be set in the range of 3-6, and the final cluster number is determined by combining discriminant analysis and Lambda value of Wilks.
Step 33: and drawing a visual user mark cloud by using a word group package in python, and intuitively displaying the user portrait clustering result obtained by the analysis. The size of each feature label is determined by the corresponding average of these user representations, the larger the font the more prominent the feature.
For step 4: and selecting a throughput parameter U, and starting a dynamic tuning module to perform user model offline optimization and data throughput upgrading.
The step 4 is specifically as follows:
step 41: the achievable throughput U is defined as a function of the end-system characteristics, network characteristics, data set, and external traffic load:
U = f(p_o, p_d, b, τ, f_a, n, cc, p, pp, l_c),
where p_o is the source endpoint, p_d the target endpoint, b the link bandwidth, τ the round-trip time, f_a the average file size, n the number of files, and l_c the contention load transfer; the tunable parameter set is u = {cc, p, pp}.
Step 42: Data transmission throughput is maximized by using near-optimal parameter values for the network conditions and the data set. The optimization problem is to maximize the achieved throughput T over the transfer interval [t_s, t_e], subject to the constraints cc × p ≤ N_str, pp ≤ P, and T ≤ b, where t_s and t_e are respectively the transmission start and end times, and N_str and P are respectively the maximum number of streams and the maximum pipelining value allowed in the network.
Step 43: and calling a dynamic tuning module on the history transmission log, and storing the result in a key value storage.
Step 44: when a user starts a transmission process, a main process with two threads, namely a transmission thread and a dynamic tuning thread, is started, the transmission process collects current network information and sends the current network information to an offline analysis module, the offline analysis module returns algorithm initial parameter setting to start transferring, and the dynamic tuning thread periodically checks the network environment.
Step 45: when the dynamic optimization module detects low throughput, it sends the current network state offline analysis module and obtains new parameters as the current state. Step 45 is specifically:
step 451: a three-level hierarchical clustering of the transmission logs is built, as shown in fig. 3.
Step 4511: based on network and data, establishing layer-1 cluster by using network characteristics and terminal system characteristics, establishing layer-2 by conducting subdivision of layer-1 based on data set information, conducting layer-2 subdivision layer-3 based on external load, clustering group methods, normalizing log attributes, and using Euclidean distance.
Step 4512: the neighbor matrix of the initial cluster is calculated using a weighted-free arithmetic average algorithm and the two clusters are combined at a minimum distance.
Step 4513: the rows and columns of the neighbor matrix are updated with new clusters, the matrix is populated with new distance values, and the cycle repeats until all clusters merge into one cluster. Where the clustering accuracy depends on k of the appropriate number of clusters. In this work, we used the Calinski-Harabaz index (i.e., CH index) to identify the appropriate number of clusters. The CH index can be calculated as: wherein phi is inter For inter-cluster variation, phi intra The intra-cluster variation can be defined as the sum of Euclidean distances, i.e. +.>Wherein M is k Cluster center of cluster k +.>Is the mean of the points in cluster k, +.>Is the overall average value.
Step 452: on the premise that data transmission logs for similar transmission tasks are contained in each cluster, the achievable throughput of different parameters is modeled as a piecewise cubic spline function.
Step 4521: constructing a two-dimensional cubic spline interpolation of g (pp) =t, given two dimensionsA set of discrete points in space { (pp) i T), i=0, 1 … N, using a piecewise cubic polynomial g i (pp) connection continuous point-to-point (pp) i ,T i ) And (pp) i+1 ,T i+1 )。
Step 4522: an interpolation function g (pp) =th is constructed, controlling the second derivative to be zero at the end points.
Step 4523: all the cubic polynomial blocks are defined as g i (pp)=a i,0 +a i,1 pp+a i,2 pp+a i,3 pp,
Step 4524: let the period boundary be g (pp) i+1 )=g(pp i ) Piecewise polynomial g i Coefficient a of (pp) i,j Where j=1, 2,3, contains 4 (N-1) unknowns, i.e. g i (pp)=T i I= … N, N continuity constraints resulting in g (pp) are g i-1 (pp i )=T i =g i (pp i ) I= … N, resulting in (N-2) constraints.
Step 4525: additional continuity constraints are imposed on the second derivative:2 (N-2) constraints are obtained.
Step 4526: the boundary conditions for the relaxed spline are:the total number of constraints thus obtained according to the above steps is n+ (N-2) +2 (N-2) +2=4 (N-1).
Step 453: for parameter set upper bound, the parameter search space has a bounded integer field.
Step 4531: assuming beta as the upper bound of the parameter, the cubic spline surface function is expressed as f i :Where ψ= {1,2 … β }.
Step 4532: for each f k Performing a second partial derivative test, i.e. calculating f k Is a Hessian matrix of (c): j is the matrix of Lax.
Step 4533: calculate the corresponding { p, pp, cc } so that H k (p, pp, cc) is a negative-set matrix, yielding f k Is included.
Step 4535: take f= { F 1 ,…,f p All local maxima in the set to generate a surface maxima.
Step 46: the dynamic adjustment thread informs the transmission thread of parameter updating, the transmission thread uses new parameters to continue transmission, the external flow load change parameters are optimized during transmission, when the throughput is reduced, the current network condition is sent to the offline analysis module again, and the steps are circularly carried out to realize user model offline optimization and data throughput upgrading. An offline optimization model workflow diagram is shown in fig. 2.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be appreciated by persons skilled in the art that the above embodiments are not intended to limit the invention in any way, and that all technical solutions obtained by means of equivalent substitutions or equivalent transformations fall within the scope of the invention.
Claims (5)
1. A network user image drawing method based on a CECU system intelligent algorithm model is characterized by comprising the following steps:
step 1: collecting user information across sites using local links and updating the information iteratively;
step 2: utilizing a hierarchical transaction model to replicate each data item in the database at multiple consistency levels, and dividing the data items into different categories according to their consistency requirements through a data mining algorithm;
step 3: based on the collection, classified storage and data mining analysis of behavior information, user portraits are carried out on data items which are placed on the same site and are related to each other, and a visual model is built;
step 4: selecting a throughput parameter U, and starting a dynamic tuning module to perform user model offline optimization and data throughput upgrading;
the step 1 specifically comprises the following steps:
step 11: collecting data based on social media data, using common friends within one site and the same heuristic across sites, where F(i, S) denotes the friends of user i on site S;
step 12: will already be at S 1 (S 2 ) The user of the medium map is denoted as M 1 (M 2 ) While the unmapped users are denoted as V S1 \M 1 (V S2 \M 2 );
Step 13: mapping a certain user to another user on the network according to the number of friends of the two users in the mapping;
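The common-friends heuristic of steps 11 to 13 can be sketched as follows; this is a minimal illustration under assumptions, and all names (`friends_s1`, `seed_map`, `map_users`) are hypothetical, not taken from the patent:

```python
# Hypothetical sketch of the cross-site common-friends mapping heuristic.
# friends_s1 / friends_s2: user -> set of friends on sites S1 / S2.
# seed_map: users of S1 already known to correspond to users of S2.

def common_friend_score(u, v, friends_s1, friends_s2, seed_map):
    """Count friends of u on S1 whose already-mapped partner is a friend of v on S2."""
    mapped_friends = {seed_map[f] for f in friends_s1.get(u, set()) if f in seed_map}
    return len(mapped_friends & friends_s2.get(v, set()))

def map_users(friends_s1, friends_s2, seed_map):
    """Greedily extend the seed mapping: each unmapped S1 user is matched to the
    unmapped S2 user sharing the most already-mapped common friends."""
    mapping = dict(seed_map)
    unmapped_s1 = set(friends_s1) - set(mapping)
    unmapped_s2 = set(friends_s2) - set(mapping.values())
    for u in sorted(unmapped_s1):
        best, best_score = None, 0
        for v in sorted(unmapped_s2):
            s = common_friend_score(u, v, friends_s1, friends_s2, mapping)
            if s > best_score:
                best, best_score = v, s
        if best is not None:          # map only when some evidence exists
            mapping[u] = best
            unmapped_s2.discard(best)
    return mapping
```

A user with no mapped common friends is deliberately left unmapped rather than guessed.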
the step 2 is specifically as follows:
step 21: according to the consistency requirement, computing the set of frequent data items from their support using the Apriori algorithm, and classifying the data items and access transactions with a consistency index CI;
step 22: in the hierarchical transaction model, a consistency level is allocated for each transaction and its associated data item;
step 23: identifying strongly related data items, and establishing association between frequent item data sets;
the step 21 specifically comprises the following steps:
step 211: using 50% relative support, database transactions are divided into the read transaction set D_r and the write transaction set D_w;
Step 212: each mapper accepts as input database transactions and a set of data items, generating an intermediate list of support counts for each data item;
step 213: the combiner collects and merges the results from the multiple mappers and creates a set of key-value pairs for each data item; the aggregate value of each data item is used here as the CI;
step 214: the reducer compares the aggregate value serving as the CI with a given minimum support threshold and outputs the final list of frequent data items;
step 215: according to the algorithm, the final classification result is: when the input is D_r, the return value is freq_r, i.e. the set of frequently read data items; when the input is D_w, the return value is freq_w, i.e. the set of frequently updated data items; for any data item x_i ∈ I that is frequent in both, the return value is freq_rw, i.e. the set of data items that are both frequently read and frequently updated; the remaining data items, which do not meet the minimum support threshold, return infreq, i.e. the set of infrequent data items;
step 216: based on the above data-item classification, read-intensive transactions are classified as T_ri, write-intensive transactions as T_wi, read-and-write-intensive transactions as T_rwi, and the remaining transactions as T_rem;
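The mapper/combiner/reducer classification of steps 211 to 216 can be sketched with plain Python standing in for a real MapReduce framework; the function names, the 50% threshold handling and the disjoint return sets are illustrative assumptions:

```python
from collections import Counter

MIN_SUP = 0.5  # 50% relative support, as in step 211 (assumption on its meaning)

def support_counts(transactions):
    """Mapper/combiner stage: aggregate one support count per data item (the CI)."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))  # count each item at most once per transaction
    return counts

def frequent_items(transactions):
    """Reducer stage: keep items whose relative support meets MIN_SUP."""
    counts = support_counts(transactions)
    n = len(transactions)
    return {x for x, c in counts.items() if c / n >= MIN_SUP}

def classify_items(read_txns, write_txns):
    """Step 215: split items into freq_r, freq_w, freq_rw and infreq
    (returned here as disjoint sets for clarity)."""
    freq_r = frequent_items(read_txns)
    freq_w = frequent_items(write_txns)
    freq_rw = freq_r & freq_w
    all_items = {x for t in read_txns + write_txns for x in t}
    infreq = all_items - freq_r - freq_w
    return freq_r - freq_rw, freq_w - freq_rw, freq_rw, infreq
```

Transaction classification (step 216) then follows by checking which of these item sets a transaction's items fall into.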
The step 4 is specifically as follows:
step 41: the achievable throughput U is defined as a function whose attributes are the end-system characteristics, network characteristics, data set and external traffic load:
U = f(p_o, p_d, b, τ, f_a, n, cc, p, pp, l_c),
where p_o is the source endpoint, p_d the target endpoint, b the link bandwidth, τ the round-trip time, f_a the average file size, n the number of files and l_c the contention load transfer;
step 42: the data transmission throughput is maximized using near-optimal parameter values for the network conditions and data set; the optimization problem is to choose the parameters (p, pp, cc) that maximize U over the transmission interval from t_s to t_e, subject to: cc × p ≤ N_str; pp ≤ P; U ≤ b, where t_s and t_e are the transmission start and end times, N_str is the maximum number of streams and P the maximum pipelining value allowed in the network;
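Because the parameter search space has a bounded integer domain (step 453), the constrained maximization of step 42 can be sketched as an exhaustive search; `throughput_model` stands in for the spline throughput model and is a hypothetical argument, not the patent's implementation:

```python
import itertools

def best_parameters(throughput_model, N_str, P, beta, b):
    """Exhaustive search over the bounded integer domain:
    maximize modelled throughput subject to cc*p <= N_str, pp <= P, U <= b."""
    best_u, best_cfg = 0.0, None
    for p, pp, cc in itertools.product(range(1, beta + 1), repeat=3):
        if cc * p > N_str or pp > P:
            continue  # infeasible under the stream / pipeline limits
        u = min(throughput_model(p, pp, cc), b)  # throughput cannot exceed bandwidth b
        if u > best_u:
            best_u, best_cfg = u, (p, pp, cc)
    return best_cfg, best_u
```

With β bounded, the search is β³ evaluations at most, which is cheap next to a transfer; the surface-maxima procedure of step 453 narrows this further.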
step 43: calling the dynamic tuning module on the historical transmission logs and storing the results in a key-value store;
step 44: when a user starts a transmission process, a main process with two threads is started, namely a transmission thread and a dynamic tuning thread; current network information is collected during transmission and sent to the offline analysis module, which returns initial algorithm parameter settings to start the transfer; the dynamic tuning thread periodically checks the network environment;
step 45: when the dynamic optimization module detects low throughput, it sends the current network state to the offline analysis module and obtains new parameters for the current state;
step 46: the dynamic tuning thread notifies the transmission thread of the parameter update; the transmission thread continues the transmission with the new parameters, and the parameters are optimized during transmission as the external traffic load changes; when throughput drops, the current network condition is sent to the offline analysis module again, and these steps are performed cyclically to realize offline optimization of the user model and data-throughput upgrading.
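The two-thread structure of steps 44 to 46 can be sketched as below; the offline analysis module is replaced by a toy stand-in, the throughput samples are simulated, and all names are illustrative assumptions rather than the patent's components:

```python
import queue
import threading

class OfflineAnalysis:
    """Stand-in for the offline analysis module (step 43 would back this with
    clustered historical transfer logs; this toy rule is an assumption)."""
    def parameters_for(self, throughput):
        # Halve the concurrency when the observed throughput is low.
        return {"cc": 2} if throughput < 1.0 else {"cc": 4}

def transfer(samples, analysis):
    """Main process with a transmission thread and a dynamic-tuning thread
    (steps 44-46). `samples` simulates per-interval throughput readings."""
    state = queue.Queue()    # transmission -> tuner: observed network state
    updates = queue.Queue()  # tuner -> transmission: new parameter settings
    applied = []             # concurrency actually used for each interval

    def transmission():
        params = {"cc": 4}   # initial setting returned by offline analysis
        for s in samples:
            try:
                params = updates.get_nowait()  # step 46: apply tuner's update
            except queue.Empty:
                pass
            applied.append(params["cc"])
            state.put(s)     # report current network info to the tuner
        state.put(None)      # transmission finished

    def tuner():
        while True:
            s = state.get()
            if s is None:
                break
            if s < 1.0:      # step 45: low throughput detected
                updates.put(analysis.parameters_for(s))

    t_tx = threading.Thread(target=transmission)
    t_tune = threading.Thread(target=tuner)
    t_tx.start(); t_tune.start()
    t_tx.join(); t_tune.join()
    return applied
```

Queues decouple the two threads, so a slow analysis call never blocks the transfer; updates simply take effect at the next interval.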
2. The network user image method based on the CECU architecture intelligent algorithm model as set forth in claim 1, wherein,
the step 3 is specifically as follows:
step 31: performing factor extraction on the classified data in the step 2 by adopting principal component analysis to obtain characteristic factors;
step 32: aiming at the extracted characteristic factors, a classical K-means clustering algorithm is selected to cluster all data samples, and the class number of the user group portraits is established;
step 33: drawing a visual user tag cloud using the wordcloud package in python, intuitively displaying the user-portrait clustering results obtained by the analysis.
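Steps 31 and 32 can be sketched with a numpy-only principal component analysis (via SVD) and a classical k-means; in practice a library such as scikit-learn would typically be used, and the function names here are illustrative:

```python
import numpy as np

def pca(X, k):
    """Step 31: project the classified feature data onto its first k
    principal components, computed from the SVD of the centred data."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # characteristic factors, shape (n_samples, k)

def kmeans(X, k, iters=50, seed=0):
    """Step 32: classical k-means clustering into k user-portrait groups."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each sample to its nearest center (squared Euclidean distance).
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its members; keep empty clusters fixed.
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels
```

The cluster labels would then feed the wordcloud visualization of step 33.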
3. The network user image method based on the CECU architecture intelligent algorithm model as set forth in claim 1, wherein,
step 45 is specifically:
step 451: constructing a three-level hierarchical clustering of the transfer logs;
step 452: modeling the achievable throughput of different parameters as a piecewise cubic spline function on the premise that each cluster contains data transmission logs for similar transmission tasks;
step 453: setting an upper bound for the parameter, wherein the parameter search space has a bounded integer domain;
step 451 is specifically:
step 4511: establishing layer-1 clusters using network characteristics and end-system characteristics; establishing layer-2 by subdividing layer-1 based on data-set information; establishing layer-3 by subdividing layer-2 based on external load; clustering by a group method, normalizing the log attributes and using the Euclidean distance;
step 4512: calculating the proximity matrix of the initial clusters using an unweighted arithmetic-average (UPGMA-style) algorithm, and merging the two clusters at minimum distance;
step 4513: the rows and columns of the proximity matrix are updated with the new cluster, the matrix is filled with the new distance values, and the cycle repeats until all clusters merge into one cluster.
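The merge loop of steps 4511 to 4513 can be sketched as an average-linkage agglomerative clustering; this is a generic textbook construction under the stated reading ("unweighted arithmetic average" taken as UPGMA-style linkage), and the names are illustrative:

```python
import numpy as np

def upgma_merge_order(points):
    """Average-linkage agglomerative clustering: normalize the attributes,
    then repeatedly merge the two closest clusters (mean pairwise Euclidean
    distance) until one cluster remains. Returns the merge history."""
    X = np.asarray(points, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    X = (X - X.min(axis=0)) / np.where(span == 0, 1, span)  # normalize attributes
    clusters = {i: [i] for i in range(len(X))}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for ai in range(len(keys)):
            for bi in range(ai + 1, len(keys)):
                a, b = keys[ai], keys[bi]
                # Average pairwise distance between the two clusters (UPGMA linkage).
                d = np.mean([np.linalg.norm(X[p] - X[q])
                             for p in clusters[a] for q in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)  # merge b into a
        merges.append((a, b))
    return merges
```

A production version would update the proximity matrix incrementally rather than recomputing all pairwise linkages each round.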
4. The network user image method based on CECU system intelligent algorithm model according to claim 3, wherein,
step 452 specifically comprises:
step 4521: constructing a two-dimensional cubic-spline interpolation g(pp) = T: given a set of discrete points {(pp_i, T_i), i = 0, 1, …, N} in two-dimensional space, piecewise cubic polynomials g_i(pp) connect the consecutive point pairs (pp_i, T_i) and (pp_{i+1}, T_{i+1});
Step 4522: constructing an interpolation function g (pp) =th, and controlling the second derivative to be zero at the end point;
step 4523: each cubic polynomial piece is defined as g_i(pp) = a_{i,0} + a_{i,1}(pp − pp_i) + a_{i,2}(pp − pp_i)^2 + a_{i,3}(pp − pp_i)^3;
step 4524: letting the period boundary be g(pp_{i+1}) = g(pp_i), the coefficients a_{i,j}, j = 0, …, 3, of the piecewise polynomials g_i(pp) contain 4(N−1) unknowns; the interpolation conditions g_i(pp_i) = T_i, i = 1, …, N, give N constraints, and the continuity constraints of g(pp), g_{i−1}(pp_i) = T_i = g_i(pp_i) at the interior knots, give (N−2) constraints;
step 4525: additional continuity constraints are imposed on the first and second derivatives, g′_{i−1}(pp_i) = g′_i(pp_i) and g″_{i−1}(pp_i) = g″_i(pp_i), yielding 2(N−2) constraint conditions;
step 4526: the boundary conditions of the relaxed (natural) spline are g″(pp_1) = 0 and g″(pp_N) = 0; the total number of constraints obtained from the above steps is therefore N + (N−2) + 2(N−2) + 2 = 4(N−1).
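The natural-spline construction of steps 4521 to 4526 (value and derivative continuity at the knots, second derivative zero at the endpoints) can be sketched with the standard tridiagonal system for the knot second derivatives; this is the generic textbook construction, not the patent's exact formulation:

```python
import numpy as np

def natural_cubic_spline(x, y):
    """Natural cubic spline through the knots (x_i, y_i): C2-continuous,
    with second derivative zero at both endpoints. Returns a callable g(pp)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    h = np.diff(x)
    # Tridiagonal system for the knot second derivatives M (M[0] = M[-1] = 0).
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0  # natural boundary conditions
    for i in range(1, n - 1):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)

    def g(t):
        # Locate the piece containing t, then evaluate its cubic.
        i = int(np.clip(np.searchsorted(x, t) - 1, 0, n - 2))
        dx, d = x[i + 1] - x[i], t - x[i]
        return (M[i] * (x[i + 1] - t) ** 3 / (6 * dx)
                + M[i + 1] * d ** 3 / (6 * dx)
                + (y[i] / dx - M[i] * dx / 6) * (x[i + 1] - t)
                + (y[i + 1] / dx - M[i + 1] * dx / 6) * d)
    return g
```

A dense solve is used here for brevity; the system is tridiagonal, so a Thomas-algorithm solve (or `scipy.interpolate.CubicSpline` with `bc_type='natural'`) would be the practical choice.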
5. The network user image method based on the CECU system intelligent algorithm model as set forth in claim 4, wherein,
step 453 is specifically:
step 4531: assuming β is the upper bound of the parameters, the cubic-spline surface functions are represented as f_k(p, pp, cc) on ψ × ψ × ψ, where ψ = {1, 2, …, β};
step 4532: for each f_k, performing the second-partial-derivative test, i.e. calculating the Hessian matrix H_k of f_k;
step 4533: calculating the corresponding {p, pp, cc} for which H_k(p, pp, cc) is a negative definite matrix, yielding the set of local maxima of f_k;
step 4535: taking all local maxima in the set F = {f_1, …, f_p} to generate the surface maximum.
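The second-partial-derivative test of steps 4532 and 4533 can be sketched with a finite-difference Hessian and an eigenvalue check for negative definiteness; this is a minimal numerical illustration, and the function names are assumptions:

```python
import numpy as np

def hessian(f, x, eps=1e-4):
    """Numerical Hessian of a scalar function of a 3-vector (p, pp, cc),
    via central second differences."""
    x = np.asarray(x, float)
    H = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            ei = np.eye(3)[i] * eps
            ej = np.eye(3)[j] * eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps ** 2)
    return H

def is_negative_definite(H):
    """Step 4533: H is negative definite iff all of its eigenvalues are < 0,
    which marks the point as a local maximum of the spline surface."""
    return bool(np.all(np.linalg.eigvalsh(H) < 0))
```

Applied to the surface f_k at each candidate point, points passing the check are collected as local maxima and the best over F = {f_1, …, f_p} gives the surface maximum.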
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110260517.4A CN113052629B (en) | 2021-03-10 | 2021-03-10 | Network user image drawing method based on CECU system intelligent algorithm model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113052629A CN113052629A (en) | 2021-06-29 |
CN113052629B true CN113052629B (en) | 2024-02-13 |
Family
ID=76510985
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113052629B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115567227B * | 2022-12-02 | 2023-04-07 | South China Normal University | A method and system for identity authentication based on big data security |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6301575B1 (en) * | 1997-11-13 | 2001-10-09 | International Business Machines Corporation | Using object relational extensions for mining association rules |
WO2007048008A2 (en) * | 2005-10-21 | 2007-04-26 | Fair Isaac Corporation | Method and apparatus for retail data mining using pair-wise co-occurrence consistency |
CN102098175A (en) * | 2011-01-26 | 2011-06-15 | 浪潮通信信息系统有限公司 | Alarm association rule obtaining method of mobile internet |
CN106294715A (en) * | 2016-08-09 | 2017-01-04 | 中国地质大学(武汉) | A kind of association rule mining method based on attribute reduction and device |
Non-Patent Citations (1)
Title |
---|
Association rule mining method with semantic minimum support; Zhang Lei; Xia Shixiong; Zhou Yong; Niu Qiang; Microelectronics & Computer (No. 09); full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yang et al. | Personalized federated learning on non-IID data via group-based meta-learning | |
Park et al. | Distributed data mining | |
CN103336790B (en) | Hadoop-based fast neighborhood rough set attribute reduction method | |
CN109255586B (en) | Online personalized recommendation method for e-government affairs handling | |
CN103336791B (en) | Hadoop-based fast rough set attribute reduction method | |
US20120072408A1 (en) | Method and system of prioritising operations | |
CN114399251B (en) | Cold-chain logistics recommendation method and device based on semantic web and cluster preference | |
CN103782309A (en) | Automatic data cleaning for machine learning classifiers | |
CN111881358B (en) | Object recommendation system, method and device, electronic equipment and storage medium | |
Pan et al. | Clustering of designers based on building information modeling event logs | |
CN105069122A (en) | Personalized recommendation method and recommendation apparatus based on user behaviors | |
CN107194672B (en) | A review assignment method that integrates academic expertise and social network | |
Ravikumar et al. | A new adaptive hybrid mutation black widow clustering based data partitioning for big data analysis | |
Feng et al. | Reinforcement routing on proximity graph for efficient recommendation | |
CN117194771B (en) | Dynamic knowledge graph service recommendation method for graph model characterization learning | |
CN117150416B (en) | A detection method, system, media and equipment for abnormal nodes in the industrial Internet | |
CN113228059A (en) | Cross-network-oriented representation learning algorithm | |
Sundarakumar et al. | A heuristic approach to improve the data processing in big data using enhanced Salp Swarm algorithm (ESSA) and MK-means algorithm | |
CN113052629B (en) | Network user image drawing method based on CECU system intelligent algorithm model | |
CN104063555B (en) | The user model modeling method intelligently distributed towards remote sensing information | |
CN112668633A (en) | Adaptive graph migration learning method based on fine granularity field | |
CN110119268B (en) | Workflow optimization method based on artificial intelligence | |
CN116882845A (en) | Scientific and technological achievements evaluation information system | |
CN115658979A (en) | Context-aware method, system and data access control method based on weighted GraphSAGE | |
CN110135747B (en) | Flow customization method based on neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||