CN113052629B - Network user image drawing method based on CECU system intelligent algorithm model - Google Patents
- Publication number
- CN113052629B (application CN202110260517.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- user
- network
- transmission
- throughput
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Abstract
The invention discloses a network user portrait method based on a CECU-system intelligent algorithm model, comprising the following steps: collecting user information across sites via local links and updating it iteratively; replicating each data item in the database at multiple consistency levels with a hierarchical transaction model, and dividing the data items into different categories; building user portraits over mutually associated data items placed on the same site, and establishing a visual model; and selecting a throughput parameter U and starting a dynamic tuning module for offline optimization of the user model and data-throughput upgrading. By collecting and iteratively updating cross-site user information over local links, performing data-mining classification with consistency-index (CI) evaluation and the Apriori support algorithm, reducing communication delay, and invoking the dynamic tuning module for data optimization, the invention effectively addresses the low degree of personalization and severely limited data throughput of traditional network user portraits.
Description
Technical Field
The invention belongs to the technical field of network information, and particularly relates to a network user image drawing method based on a CECU system intelligent algorithm model.
Background
Driven by the rapid development of internet information technology and the informatization of society, the big-data era has become an inevitable trend of modern society, and diversified network platforms aim to mine and analyze user information from every aspect. A user portrait is built from data mining and analysis of each network user's behavior in specific usage contexts, yielding a tag set that describes the user's attributes and behaviors. The significance of the user portrait method lies not only in building a personal data information base for network users, but also in enabling network service providers to offer personalized services.
Disclosure of Invention
The invention provides a network user portrait method based on a CECU system intelligent algorithm model, which adopts the following technical scheme:
a network user image drawing method based on a CECU system intelligent algorithm model comprises the following steps:
step 1: collecting user information across sites using local links and updating the information iteratively;
step 2: utilizing a hierarchical transaction model to replicate each data item in the database at multiple consistency levels, and dividing the data items into different categories according to their consistency requirements through a data mining algorithm;
step 3: based on the collection, classified storage and data mining analysis of behavior information, user portraits are carried out on data items which are placed on the same site and are related to each other, and a visual model is built;
step 4: and selecting a throughput parameter U, and starting a dynamic tuning module to perform user model offline optimization and data throughput upgrading.
Further, in step 2, the data mining algorithm is an Apriori algorithm.
Further, the step 2 specifically comprises:
step 21: according to the consistency requirement, calculating a frequent data item set according to the support degree of the frequent data item by using an Apriori algorithm, and classifying the data item and the access transaction by using a consistency index CI;
step 22: in the hierarchical transaction model, a consistency level is allocated for each transaction and its associated data item;
step 23: strongly related data items are identified and associations between frequent item data sets are established.
Further, step 21 specifically includes:
step 211: using 50% relative support, database transactions are divided into a read transaction set D_r and a write transaction set D_w;
Step 212: each mapper accepts as input database transactions and a set of data items, generating an intermediate list of support counts for each data item;
step 213: the combiner collects and merges the results from the plurality of mappers and creates a set of key-value pairs for each data item; the aggregate value of each data item is used here as its CI;
step 214: the reducer compares the aggregate value serving as the CI with a given minimum support threshold and outputs the final frequent data item list;
step 215: according to the algorithm, the final classification result is: when the input is D_r, the return value is freq_r, i.e. the set of frequently read data items; when the input is D_w, the return value is freq_w, i.e. the set of frequently updated data items; for any data item x_i ∈ I with x_i ∈ freq_r and x_i ∈ freq_w, the return value is freq_rw, i.e. the set of data items that are both frequently read and frequently updated; for the remaining data items that do not meet the minimum support threshold, the return value is infreq, i.e. the set of infrequent data items;
step 216: based on the above data item classifications, read-intensive transactions are classified as T_ri, write-intensive transactions as T_wi, transactions that are both read and write intensive as T_rwi, and the remaining transactions as T_rem.
Further, the step 3 specifically comprises:
step 31: performing factor extraction on the classified data in the step 2 by adopting principal component analysis to obtain characteristic factors;
step 32: aiming at the extracted characteristic factors, a classical K-means clustering algorithm is selected to cluster all data samples, and the class number of the user group portraits is established;
step 33: and drawing a visual user mark cloud by using a word group package in python, and intuitively displaying the user portrait clustering result obtained by the analysis.
Further, the step 4 specifically comprises:
step 41: the achievable throughput U is defined as a function of the end-system characteristics, network characteristics, data set, and external traffic load:
U = f(p_o, p_d, b, τ, f_a, n, cc, p, pp, l_c),
where p_o is the source endpoint, p_d the target endpoint, b the link bandwidth, τ the round-trip time, f_a the average file size, n the number of files, and l_c the contention load transfer; the tunable parameter set is u = {cc, p, pp};
step 42: data transmission throughput is maximized by using near-optimal parameter values for the network conditions and the data set; the optimization problem is to maximize the achieved throughput T over the transfer interval, subject to the constraints cc × p ≤ N_str, pp ≤ P, and T ≤ b, where t_s and t_e are the transmission start and end times, and N_str and P are respectively the maximum number of streams and the maximum pipelining value allowed in the network;
step 43: calling a dynamic tuning module on the history transmission log, and storing the result in a key value storage;
step 44: when a user starts a transmission process, a main process with two threads is started: a transmission thread and a dynamic tuning thread; during transmission, current network information is collected and sent to the offline analysis module, which returns initial algorithm parameter settings to start the transfer, while the dynamic tuning thread periodically checks the network environment;
step 45: when the dynamic optimization module detects low throughput, the dynamic optimization module sends the current network state to the offline analysis module and obtains new parameters as the current state;
step 46: the dynamic tuning thread notifies the transmission thread of parameter updates, and the transmission thread continues transmission with the new parameters; external traffic-load change parameters are optimized during transmission, and when throughput drops, the current network conditions are sent to the offline analysis module again; these steps are repeated cyclically to realize offline optimization of the user model and data-throughput upgrading.
Further, step 45 specifically includes:
step 451: building a three-level hierarchical clustering of the transmission logs;
step 452: modeling the achievable throughput of different parameters as a piecewise cubic spline function on the premise that each cluster contains data transmission logs for similar transmission tasks;
step 453: given an upper bound on the parameters, the parameter search space is a bounded integer domain.
Further, step 451 is specifically:
step 4511: layer-1 clusters are established from network characteristics and end-system characteristics; layer-2 is established by subdividing layer-1 based on data-set information, and layer-3 by subdividing layer-2 based on external load; clustering uses the group-average method with normalized log attributes and Euclidean distance;
step 4512: the neighbor matrix of the initial clusters is calculated using the unweighted pair-group arithmetic-mean (UPGMA) algorithm, and the two clusters at minimum distance are merged;
step 4513: the rows and columns of the neighbor matrix are updated with new clusters, the matrix is populated with new distance values, and the cycle repeats until all clusters merge into one cluster.
Further, step 452 specifically includes:
step 4521: constructing a cubic spline interpolation g(pp) = T: given a set of discrete points {(pp_i, T_i)}, i = 0, 1, …, N, in two-dimensional space, piecewise cubic polynomials g_i(pp) connect consecutive point pairs (pp_i, T_i) and (pp_{i+1}, T_{i+1});
step 4522: constructing the interpolation function g(pp) = T with the second derivative constrained to be zero at the end points;
step 4523: each cubic polynomial piece is defined as g_i(pp) = a_{i,0} + a_{i,1}·pp + a_{i,2}·pp² + a_{i,3}·pp³;
step 4524: the coefficients a_{i,j}, j = 0, 1, 2, 3, of the piecewise polynomials g_i(pp) comprise 4(N−1) unknowns; the interpolation conditions g_i(pp_i) = T_i, i = 1, …, N, yield N constraints, and the continuity conditions g_{i−1}(pp_i) = T_i = g_i(pp_i) at the interior points yield (N−2) constraints;
step 4525: additional continuity constraints are imposed on the first and second derivatives, g'_{i−1}(pp_i) = g'_i(pp_i) and g''_{i−1}(pp_i) = g''_i(pp_i), yielding 2(N−2) constraints;
step 4526: the boundary conditions of the relaxed (natural) spline are g''(pp_1) = g''(pp_N) = 0; the total number of constraints obtained from the above steps is thus N + (N−2) + 2(N−2) + 2 = 4(N−1).
Further, step 453 is specifically:
step 4531: with β as the upper bound of the parameters, the cubic spline surface functions are expressed as f_i : Ψ × Ψ × Ψ → ℝ, where Ψ = {1, 2, …, β};
step 4532: for each f_k, the second-partial-derivative test is performed, i.e. the Hessian matrix H_k of f_k is computed;
step 4533: the points {p, pp, cc} for which H_k(p, pp, cc) is negative definite are computed, yielding the set of local maxima of f_k;
step 4535: all local maxima over the set F = {f_1, …, f_p} are taken to generate the surface maximum.
The network user portrait method based on the CECU-system intelligent algorithm model has the following advantages: user information is collected across sites over local links and comprehensively updated by iteration; data-mining classification is performed by establishing a hierarchical transaction model with consistency-index (CI) evaluation and the Apriori support algorithm, reducing communication delay; and finally a dynamic tuning module is invoked for data optimization. The problems of low personalization and severely limited data throughput in traditional network user portraits are thereby effectively solved, and the user portraits become more accurate.
Drawings
FIG. 1 is a schematic diagram of a network user image method based on a CECU architecture intelligent algorithm model of the present invention;
FIG. 2 is a flowchart of the user portrayal model offline optimization workflow of the present invention;
FIG. 3 is an illustration of a hierarchical model of cluster logs of the present invention.
Detailed Description
The invention is described in detail below with reference to the drawings and the specific embodiments.
As shown in fig. 1, the network user portrait method based on the CECU-system (user model offline optimization system) intelligent algorithm model of the present invention includes the following steps. Step 1: user information is collected across sites using local links and iteratively updated. Step 2: a hierarchical transaction model is used to replicate each data item in the database at multiple consistency levels, and the data items are divided into different categories according to their consistency requirements through a data mining algorithm. Step 3: based on the collection, classified storage, and data-mining analysis of behavior information, user portraits are built over mutually associated data items placed on the same site, and a visual model is established. Step 4: a throughput parameter U is selected, and the dynamic tuning module is started for offline optimization of the user model and data-throughput upgrading. Collecting user information over local links and updating it iteratively makes the user information comprehensive; data-mining classification through a hierarchical transaction model with consistency index CI (Consistency Index) evaluation and the Apriori support algorithm reduces communication delay; and finally the dynamic tuning module is invoked for data optimization, effectively solving the problems that traditional network user portraits are poorly personalized and data throughput is severely limited, making the user portraits more accurate. The above steps are specifically described below.
For step 1: user information is collected across sites using a local link and iteratively updated.
Specifically, step 1 is:
step 11: data gathering is performed on the basis of social media data, considering that nodes sharing the most common friends across sites are most likely to be the same user. The common-friend heuristic used within one site is applied across sites, where F(i, S) denotes the friends of user i on site S.
step 12: users already mapped in S_1 (respectively S_2) are denoted M_1 (M_2), while unmapped users are denoted V_{S1}\M_1 (V_{S2}\M_2).
step 13: a user is mapped to a user on the other network according to the number of already-mapped friends the two users share: the pair of users, one on each network, with the most friends in the mapping is assumed to represent the same person and is added to the mapping. The process continues until no further users can be identified on either network.
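The iterative cross-site mapping of steps 11-13 can be sketched as follows; the function name, dictionary representation of F(i, S), and greedy tie-handling are illustrative assumptions, not part of the invention:

```python
def grow_mapping(friends_s1, friends_s2, seed_map):
    """Iteratively map users across two sites by counting mapped common friends.

    friends_s1 / friends_s2: adjacency dicts {user: friend list} for S1 and S2.
    seed_map: users already known to be the same person (S1 user -> S2 user).
    """
    mapping = dict(seed_map)
    while True:
        best = None  # (score, u1, u2): candidate pair with most mapped common friends
        for u1, f1 in friends_s1.items():
            if u1 in mapping:
                continue
            # S2 images of u1's already-mapped friends
            images = {mapping[f] for f in f1 if f in mapping}
            for u2, f2 in friends_s2.items():
                if u2 in mapping.values():
                    continue
                score = len(images & set(f2))
                if best is None or score > best[0]:
                    best = (score, u1, u2)
        if best is None or best[0] == 0:  # no further users identified on either network
            break
        mapping[best[1]] = best[2]
    return mapping
```

With a seed of two known pairs on two mirrored three-user networks, the remaining user is mapped by its two common friends.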
For step 2: and (3) utilizing a hierarchical transaction model to copy a plurality of consistency levels of each data item in the database on a large scale, and dividing the data items into different categories according to consistency requirements through a data mining algorithm.
In step 2, the data mining algorithm is an Apriori algorithm.
The step 2 is specifically as follows:
step 21: according to the consistency requirement, the Apriori algorithm is used for calculating a frequent data item set according to the support degree of the frequent data item, and the data item and the access transaction are classified by using the consistency index CI. The step 21 specifically comprises the following steps:
step 211: using 50% relative support, database transactions are divided into a read transaction set D_r and a write transaction set D_w.
step 212: each mapper accepts database transactions and a set of data items as input, and generates an intermediate list of support counts for each data item.
step 213: the combiner collects and merges the results from the plurality of mappers and creates a set of key-value pairs for each data item; the aggregate value of each data item is used here as its CI.
step 214: the reducer compares the aggregate value serving as the CI with a given minimum support threshold and outputs the final frequent data item list.
step 215: according to the algorithm, the final classification result is: when the input is D_r, the return value is freq_r, i.e. the set of frequently read data items; when the input is D_w, the return value is freq_w, i.e. the set of frequently updated data items; for any data item x_i ∈ I with x_i ∈ freq_r and x_i ∈ freq_w, the return value is freq_rw, i.e. the set of data items that are both frequently read and frequently updated; the remaining data items that do not meet the minimum support threshold return infreq, i.e. the set of infrequent data items.
step 216: based on the above data item classifications, read-intensive transactions are classified as T_ri, write-intensive transactions as T_wi, transactions that are both read and write intensive as T_rwi, and the remaining transactions as T_rem.
Step 22: in the hierarchical transaction model, a level of consistency is assigned to each transaction and its associated data items. Step 22 is specifically:
step 221: in a database transaction containing both frequently read and frequently updated data items, the frequently updated items require strict consistency while the frequently read items require high availability. Accordingly, transactions T_rwi that are both read and write intensive, together with the data items freq_rw they access, are assigned to the strongest consistency level SR; read-intensive transactions T_ri and their accessed data items freq_r are assigned to the SI level; write-intensive transactions T_wi and their accessed data items freq_w are assigned to the NMSI level; and the remaining transactions T_rem and their accessed data items infreq are assigned to the ASYNC level.
step 222: based on the data items accessed by a transaction, the transaction manager selects the appropriate consistency level. It invokes one of four resolvers following different consistency policies, the SR resolver (SRR), SI resolver (SIR), NMSI resolver (NMSIR), and ASYNC resolver (ASYNCR), to execute transactions belonging to the different consistency levels.
step 223: transactions are assigned to their matching consistency levels by the following procedure:
input: a database transaction T;
output: invocation of the appropriate resolver.
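The dispatch in steps 221-223 can be sketched as below; the resolver callables are stubs and the classification-by-accessed-items logic is an assumption consistent with the text, not the patented implementation:

```python
# Consistency level per transaction class (step 221)
LEVELS = {"T_rwi": "SR", "T_ri": "SI", "T_wi": "NMSI", "T_rem": "ASYNC"}

def classify_txn(txn_items, freq_r, freq_w):
    """Derive the transaction class from the data items it accesses."""
    reads = bool(txn_items & freq_r)
    writes = bool(txn_items & freq_w)
    if reads and writes:
        return "T_rwi"
    if reads:
        return "T_ri"
    if writes:
        return "T_wi"
    return "T_rem"

def resolve(txn_items, freq_r, freq_w, resolvers):
    """Steps 222-223: invoke the resolver matching the transaction's level."""
    level = LEVELS[classify_txn(txn_items, freq_r, freq_w)]
    return resolvers[level](txn_items)
```

Here `resolvers` would map "SR", "SI", "NMSI", and "ASYNC" to the SRR, SIR, NMSIR, and ASYNCR implementations respectively.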
Step 23: strongly related data items are identified and associations between frequent item data sets are established. In particular, the method comprises the steps of,
step 231: a minimum confidence minconf=50% is set,
step 232: for each frequent itemset f and every non-empty subset s ⊂ f, when support(f)/support(s) ≥ minconf, the strong association rule s ⇒ (f − s) is established.
Step 233: when the strong association rule is satisfied, the data items associated with each other are placed on the same site.
For step 3: based on the collection, classification and storage of behavior information and data mining analysis, user portraits are carried out on data items which are mutually related and placed on the same site, and a visual model is built.
The step 3 is specifically as follows:
step 31: and (3) reducing the dimension of the classified data in the step (2), and extracting factors by adopting principal component analysis to obtain characteristic factors.
Step 32: and selecting a classical K-means clustering algorithm to cluster all data samples according to the extracted characteristic factors, and establishing the class number of the user group portraits. The number of clusters is recommended to be set in the range of 3-6, and the final cluster number is determined by combining discriminant analysis and Lambda value of Wilks.
Step 33: and drawing a visual user mark cloud by using a word group package in python, and intuitively displaying the user portrait clustering result obtained by the analysis. The size of each feature label is determined by the corresponding average of these user representations, the larger the font the more prominent the feature.
For step 4: and selecting a throughput parameter U, and starting a dynamic tuning module to perform user model offline optimization and data throughput upgrading.
The step 4 is specifically as follows:
step 41: the achievable throughput U is defined as a function of the end-system characteristics, network characteristics, data set, and external traffic load:
U = f(p_o, p_d, b, τ, f_a, n, cc, p, pp, l_c),
where p_o is the source endpoint, p_d the target endpoint, b the link bandwidth, τ the round-trip time, f_a the average file size, n the number of files, and l_c the contention load transfer; the tunable parameter set is u = {cc, p, pp}.
Step 42: Data transmission throughput is maximized by using near-optimal parameter values for the network conditions and the data set. The optimization problem is to maximize the achieved throughput T over the transfer interval [t_s, t_e], subject to the constraints cc × p ≤ N_str, pp ≤ P, and T ≤ b, where t_s and t_e are respectively the transmission start and end times, and N_str and P are respectively the maximum number of streams and the maximum pipelining value allowed in the network.
Step 43: and calling a dynamic tuning module on the history transmission log, and storing the result in a key value storage.
Step 44: when a user starts a transmission process, a main process with two threads, namely a transmission thread and a dynamic tuning thread, is started, the transmission process collects current network information and sends the current network information to an offline analysis module, the offline analysis module returns algorithm initial parameter setting to start transferring, and the dynamic tuning thread periodically checks the network environment.
Step 45: when the dynamic optimization module detects low throughput, it sends the current network state offline analysis module and obtains new parameters as the current state. Step 45 is specifically:
step 451: a three-level hierarchical clustering of the transmission logs is built, as shown in fig. 3.
Step 4511: based on network and data, establishing layer-1 cluster by using network characteristics and terminal system characteristics, establishing layer-2 by conducting subdivision of layer-1 based on data set information, conducting layer-2 subdivision layer-3 based on external load, clustering group methods, normalizing log attributes, and using Euclidean distance.
Step 4512: the neighbor matrix of the initial cluster is calculated using a weighted-free arithmetic average algorithm and the two clusters are combined at a minimum distance.
Step 4513: the rows and columns of the neighbor matrix are updated with new clusters, the matrix is populated with new distance values, and the cycle repeats until all clusters merge into one cluster. Where the clustering accuracy depends on k of the appropriate number of clusters. In this work, we used the Calinski-Harabaz index (i.e., CH index) to identify the appropriate number of clusters. The CH index can be calculated as: wherein phi is inter For inter-cluster variation, phi intra The intra-cluster variation can be defined as the sum of Euclidean distances, i.e. +.>Wherein M is k Cluster center of cluster k +.>Is the mean of the points in cluster k, +.>Is the overall average value.
Step 452: on the premise that data transmission logs for similar transmission tasks are contained in each cluster, the achievable throughput of different parameters is modeled as a piecewise cubic spline function.
Step 4521: constructing a two-dimensional cubic spline interpolation of g (pp) =t, given two dimensionsA set of discrete points in space { (pp) i T), i=0, 1 … N, using a piecewise cubic polynomial g i (pp) connection continuous point-to-point (pp) i ,T i ) And (pp) i+1 ,T i+1 )。
Step 4522: an interpolation function g (pp) =th is constructed, controlling the second derivative to be zero at the end points.
Step 4523: all the cubic polynomial blocks are defined as g i (pp)=a i,0 +a i,1 pp+a i,2 pp+a i,3 pp,
Step 4524: let the period boundary be g (pp) i+1 )=g(pp i ) Piecewise polynomial g i Coefficient a of (pp) i,j Where j=1, 2,3, contains 4 (N-1) unknowns, i.e. g i (pp)=T i I= … N, N continuity constraints resulting in g (pp) are g i-1 (pp i )=T i =g i (pp i ) I= … N, resulting in (N-2) constraints.
Step 4525: additional continuity constraints are imposed on the second derivative:2 (N-2) constraints are obtained.
Step 4526: the boundary conditions for the relaxed spline are:the total number of constraints thus obtained according to the above steps is n+ (N-2) +2 (N-2) +2=4 (N-1).
Step 453: for parameter set upper bound, the parameter search space has a bounded integer field.
Step 4531: assuming beta as the upper bound of the parameter, the cubic spline surface function is expressed as f i :Where ψ= {1,2 … β }.
Step 4532: for each f k Performing a second partial derivative test, i.e. calculating f k Is a Hessian matrix of (c): j is the matrix of Lax.
Step 4533: calculate the corresponding { p, pp, cc } so that H k (p, pp, cc) is a negative-set matrix, yielding f k Is included.
Step 4535: take f= { F 1 ,…,f p All local maxima in the set to generate a surface maxima.
Step 46: the dynamic adjustment thread informs the transmission thread of parameter updating, the transmission thread uses new parameters to continue transmission, the external flow load change parameters are optimized during transmission, when the throughput is reduced, the current network condition is sent to the offline analysis module again, and the steps are circularly carried out to realize user model offline optimization and data throughput upgrading. An offline optimization model workflow diagram is shown in fig. 2.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be appreciated by persons skilled in the art that the above embodiments are not intended to limit the invention in any way, and that all technical solutions obtained by means of equivalent substitutions or equivalent transformations fall within the scope of the invention.
Claims (5)
1. A network user image drawing method based on a CECU system intelligent algorithm model is characterized by comprising the following steps:
step 1: collecting user information across sites using local links and updating the information iteratively;
step 2: utilizing a hierarchical transaction model to replicate each data item in the database at multiple consistency levels, and dividing the data items into different categories according to their consistency requirements through a data mining algorithm;
step 3: based on the collection, classified storage and data mining analysis of behavior information, user portraits are carried out on data items which are placed on the same site and are related to each other, and a visual model is built;
step 4: selecting a throughput parameter U, and starting a dynamic tuning module to perform user model offline optimization and data throughput upgrading;
the step 1 specifically comprises the following steps:
step 11: collecting data based on social media data, using common friends within one site and the same heuristic across sites, where F(i, S) denotes the friends of user i on site S;
step 12: will already be at S 1 (S 2 ) The user of the medium map is denoted as M 1 (M 2 ) While the unmapped users are denoted as V S1 \M 1 (V S2 \M 2 );
Step 13: mapping a certain user to another user on the network according to the number of friends of the two users in the mapping;
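The common-friends heuristic of steps 11 to 13 can be sketched as follows; this is a minimal illustration under assumptions, and all names (`friends_s1`, `seed_map`, `map_users`) are hypothetical, not taken from the patent:

```python
# Hypothetical sketch of the cross-site common-friends mapping heuristic.
# friends_s1 / friends_s2: user -> set of friends on sites S1 / S2.
# seed_map: users of S1 already known to correspond to users of S2.

def common_friend_score(u, v, friends_s1, friends_s2, seed_map):
    """Count friends of u on S1 whose already-mapped partner is a friend of v on S2."""
    mapped_friends = {seed_map[f] for f in friends_s1.get(u, set()) if f in seed_map}
    return len(mapped_friends & friends_s2.get(v, set()))

def map_users(friends_s1, friends_s2, seed_map):
    """Greedily extend the seed mapping: each unmapped S1 user is matched to the
    unmapped S2 user sharing the most already-mapped common friends."""
    mapping = dict(seed_map)
    unmapped_s1 = set(friends_s1) - set(mapping)
    unmapped_s2 = set(friends_s2) - set(mapping.values())
    for u in sorted(unmapped_s1):
        best, best_score = None, 0
        for v in sorted(unmapped_s2):
            s = common_friend_score(u, v, friends_s1, friends_s2, mapping)
            if s > best_score:
                best, best_score = v, s
        if best is not None:          # map only when some evidence exists
            mapping[u] = best
            unmapped_s2.discard(best)
    return mapping
```

A user with no mapped common friends is deliberately left unmapped rather than guessed.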
the step 2 is specifically as follows:
step 21: according to the consistency requirement, computing the set of frequent data items from their support using the Apriori algorithm, and classifying the data items and access transactions with a consistency index CI;
step 22: in the hierarchical transaction model, a consistency level is allocated for each transaction and its associated data item;
step 23: identifying strongly related data items, and establishing association between frequent item data sets;
the step 21 specifically comprises the following steps:
step 211: using 50% relative support, database transactions are divided into the read transaction set D_r and the write transaction set D_w;
Step 212: each mapper accepts as input database transactions and a set of data items, generating an intermediate list of support counts for each data item;
step 213: the combiner collects and merges the results from the multiple mappers and creates a set of key-value pairs for each data item; the aggregate value of each data item is used here as the CI;
step 214: the reducer compares the aggregate value serving as the CI with a given minimum support threshold and outputs the final list of frequent data items;
step 215: according to the algorithm, the final classification result is: when the input is D_r, the return value is freq_r, i.e. the set of frequently read data items; when the input is D_w, the return value is freq_w, i.e. the set of frequently updated data items; for any data item x_i ∈ I that is frequent in both, the return value is freq_rw, i.e. the set of data items that are both frequently read and frequently updated; the remaining data items, which do not meet the minimum support threshold, return infreq, i.e. the set of infrequent data items;
step 216: based on the above data-item classification, read-intensive transactions are classified as T_ri, write-intensive transactions as T_wi, read-and-write-intensive transactions as T_rwi, and the remaining transactions as T_rem;
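The mapper/combiner/reducer classification of steps 211 to 216 can be sketched with plain Python standing in for a real MapReduce framework; the function names, the 50% threshold handling and the disjoint return sets are illustrative assumptions:

```python
from collections import Counter

MIN_SUP = 0.5  # 50% relative support, as in step 211 (assumption on its meaning)

def support_counts(transactions):
    """Mapper/combiner stage: aggregate one support count per data item (the CI)."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))  # count each item at most once per transaction
    return counts

def frequent_items(transactions):
    """Reducer stage: keep items whose relative support meets MIN_SUP."""
    counts = support_counts(transactions)
    n = len(transactions)
    return {x for x, c in counts.items() if c / n >= MIN_SUP}

def classify_items(read_txns, write_txns):
    """Step 215: split items into freq_r, freq_w, freq_rw and infreq
    (returned here as disjoint sets for clarity)."""
    freq_r = frequent_items(read_txns)
    freq_w = frequent_items(write_txns)
    freq_rw = freq_r & freq_w
    all_items = {x for t in read_txns + write_txns for x in t}
    infreq = all_items - freq_r - freq_w
    return freq_r - freq_rw, freq_w - freq_rw, freq_rw, infreq
```

Transaction classification (step 216) then follows by checking which of these item sets a transaction's items fall into.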
The step 4 is specifically as follows:
step 41: the achievable throughput U is defined as a function whose attributes are the end-system characteristics, network characteristics, data set and external traffic load:
U = f(p_o, p_d, b, τ, f_a, n, cc, p, pp, l_c),
where p_o is the source endpoint, p_d the target endpoint, b the link bandwidth, τ the round-trip time, f_a the average file size, n the number of files and l_c the contention load transfer;
step 42: the data transmission throughput is maximized using near-optimal parameter values for the network conditions and data set; the optimization problem is to choose the parameters (p, pp, cc) that maximize U over the transmission interval from t_s to t_e, subject to: cc × p ≤ N_str; pp ≤ P; U ≤ b, where t_s and t_e are the transmission start and end times, N_str is the maximum number of streams and P the maximum pipelining value allowed in the network;
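Because the parameter search space has a bounded integer domain (step 453), the constrained maximization of step 42 can be sketched as an exhaustive search; `throughput_model` stands in for the spline throughput model and is a hypothetical argument, not the patent's implementation:

```python
import itertools

def best_parameters(throughput_model, N_str, P, beta, b):
    """Exhaustive search over the bounded integer domain:
    maximize modelled throughput subject to cc*p <= N_str, pp <= P, U <= b."""
    best_u, best_cfg = 0.0, None
    for p, pp, cc in itertools.product(range(1, beta + 1), repeat=3):
        if cc * p > N_str or pp > P:
            continue  # infeasible under the stream / pipeline limits
        u = min(throughput_model(p, pp, cc), b)  # throughput cannot exceed bandwidth b
        if u > best_u:
            best_u, best_cfg = u, (p, pp, cc)
    return best_cfg, best_u
```

With β bounded, the search is β³ evaluations at most, which is cheap next to a transfer; the surface-maxima procedure of step 453 narrows this further.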
step 43: calling the dynamic tuning module on the historical transmission logs and storing the results in a key-value store;
step 44: when a user starts a transmission process, a main process with two threads is started, namely a transmission thread and a dynamic tuning thread; current network information is collected during transmission and sent to the offline analysis module, which returns initial algorithm parameter settings to start the transfer; the dynamic tuning thread periodically checks the network environment;
step 45: when the dynamic optimization module detects low throughput, it sends the current network state to the offline analysis module and obtains new parameters for the current state;
step 46: the dynamic tuning thread notifies the transmission thread of the parameter update; the transmission thread continues the transmission with the new parameters, and the parameters are optimized during transmission as the external traffic load changes; when throughput drops, the current network condition is sent to the offline analysis module again, and these steps are performed cyclically to realize offline optimization of the user model and data-throughput upgrading.
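The two-thread structure of steps 44 to 46 can be sketched as below; the offline analysis module is replaced by a toy stand-in, the throughput samples are simulated, and all names are illustrative assumptions rather than the patent's components:

```python
import queue
import threading

class OfflineAnalysis:
    """Stand-in for the offline analysis module (step 43 would back this with
    clustered historical transfer logs; this toy rule is an assumption)."""
    def parameters_for(self, throughput):
        # Halve the concurrency when the observed throughput is low.
        return {"cc": 2} if throughput < 1.0 else {"cc": 4}

def transfer(samples, analysis):
    """Main process with a transmission thread and a dynamic-tuning thread
    (steps 44-46). `samples` simulates per-interval throughput readings."""
    state = queue.Queue()    # transmission -> tuner: observed network state
    updates = queue.Queue()  # tuner -> transmission: new parameter settings
    applied = []             # concurrency actually used for each interval

    def transmission():
        params = {"cc": 4}   # initial setting returned by offline analysis
        for s in samples:
            try:
                params = updates.get_nowait()  # step 46: apply tuner's update
            except queue.Empty:
                pass
            applied.append(params["cc"])
            state.put(s)     # report current network info to the tuner
        state.put(None)      # transmission finished

    def tuner():
        while True:
            s = state.get()
            if s is None:
                break
            if s < 1.0:      # step 45: low throughput detected
                updates.put(analysis.parameters_for(s))

    t_tx = threading.Thread(target=transmission)
    t_tune = threading.Thread(target=tuner)
    t_tx.start(); t_tune.start()
    t_tx.join(); t_tune.join()
    return applied
```

Queues decouple the two threads, so a slow analysis call never blocks the transfer; updates simply take effect at the next interval.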
2. The network user image method based on the CECU architecture intelligent algorithm model as set forth in claim 1, wherein,
the step 3 is specifically as follows:
step 31: performing factor extraction on the classified data in the step 2 by adopting principal component analysis to obtain characteristic factors;
step 32: aiming at the extracted characteristic factors, a classical K-means clustering algorithm is selected to cluster all data samples, and the class number of the user group portraits is established;
step 33: drawing a visual user tag cloud using the wordcloud package in python, intuitively displaying the user-portrait clustering results obtained by the analysis.
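Steps 31 and 32 can be sketched with a numpy-only principal component analysis (via SVD) and a classical k-means; in practice a library such as scikit-learn would typically be used, and the function names here are illustrative:

```python
import numpy as np

def pca(X, k):
    """Step 31: project the classified feature data onto its first k
    principal components, computed from the SVD of the centred data."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # characteristic factors, shape (n_samples, k)

def kmeans(X, k, iters=50, seed=0):
    """Step 32: classical k-means clustering into k user-portrait groups."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each sample to its nearest center (squared Euclidean distance).
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its members; keep empty clusters fixed.
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels
```

The cluster labels would then feed the wordcloud visualization of step 33.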
3. The network user image method based on the CECU architecture intelligent algorithm model as set forth in claim 1, wherein,
step 45 is specifically:
step 451: constructing a three-level hierarchical clustering of the transfer logs;
step 452: modeling the achievable throughput of different parameters as a piecewise cubic spline function on the premise that each cluster contains data transmission logs for similar transmission tasks;
step 453: setting an upper bound for the parameter, wherein the parameter search space has a bounded integer domain;
step 451 is specifically:
step 4511: establishing layer-1 clusters using network characteristics and end-system characteristics; establishing layer-2 by subdividing layer-1 based on data-set information; establishing layer-3 by subdividing layer-2 based on external load; clustering by a group method, normalizing the log attributes and using the Euclidean distance;
step 4512: calculating the proximity matrix of the initial clusters using an unweighted arithmetic-average (UPGMA-style) algorithm, and merging the two clusters at minimum distance;
step 4513: the rows and columns of the proximity matrix are updated with the new cluster, the matrix is filled with the new distance values, and the cycle repeats until all clusters merge into one cluster.
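The merge loop of steps 4511 to 4513 can be sketched as an average-linkage agglomerative clustering; this is a generic textbook construction under the stated reading ("unweighted arithmetic average" taken as UPGMA-style linkage), and the names are illustrative:

```python
import numpy as np

def upgma_merge_order(points):
    """Average-linkage agglomerative clustering: normalize the attributes,
    then repeatedly merge the two closest clusters (mean pairwise Euclidean
    distance) until one cluster remains. Returns the merge history."""
    X = np.asarray(points, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    X = (X - X.min(axis=0)) / np.where(span == 0, 1, span)  # normalize attributes
    clusters = {i: [i] for i in range(len(X))}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for ai in range(len(keys)):
            for bi in range(ai + 1, len(keys)):
                a, b = keys[ai], keys[bi]
                # Average pairwise distance between the two clusters (UPGMA linkage).
                d = np.mean([np.linalg.norm(X[p] - X[q])
                             for p in clusters[a] for q in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)  # merge b into a
        merges.append((a, b))
    return merges
```

A production version would update the proximity matrix incrementally rather than recomputing all pairwise linkages each round.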
4. The network user image method based on CECU system intelligent algorithm model according to claim 3, wherein,
step 452 specifically comprises:
step 4521: constructing a two-dimensional cubic-spline interpolation g(pp) = T: given a set of discrete points {(pp_i, T_i), i = 0, 1, …, N} in two-dimensional space, piecewise cubic polynomials g_i(pp) connect the consecutive point pairs (pp_i, T_i) and (pp_{i+1}, T_{i+1});
Step 4522: constructing an interpolation function g (pp) =th, and controlling the second derivative to be zero at the end point;
step 4523: each cubic polynomial piece is defined as g_i(pp) = a_{i,0} + a_{i,1}(pp − pp_i) + a_{i,2}(pp − pp_i)^2 + a_{i,3}(pp − pp_i)^3;
step 4524: letting the period boundary be g(pp_{i+1}) = g(pp_i), the coefficients a_{i,j}, j = 0, …, 3, of the piecewise polynomials g_i(pp) contain 4(N−1) unknowns; the interpolation conditions g_i(pp_i) = T_i, i = 1, …, N, give N constraints, and the continuity constraints of g(pp), g_{i−1}(pp_i) = T_i = g_i(pp_i) at the interior knots, give (N−2) constraints;
step 4525: additional continuity constraints are imposed on the first and second derivatives, g′_{i−1}(pp_i) = g′_i(pp_i) and g″_{i−1}(pp_i) = g″_i(pp_i), yielding 2(N−2) constraint conditions;
step 4526: the boundary conditions of the relaxed (natural) spline are g″(pp_1) = 0 and g″(pp_N) = 0; the total number of constraints obtained from the above steps is therefore N + (N−2) + 2(N−2) + 2 = 4(N−1).
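The natural-spline construction of steps 4521 to 4526 (value and derivative continuity at the knots, second derivative zero at the endpoints) can be sketched with the standard tridiagonal system for the knot second derivatives; this is the generic textbook construction, not the patent's exact formulation:

```python
import numpy as np

def natural_cubic_spline(x, y):
    """Natural cubic spline through the knots (x_i, y_i): C2-continuous,
    with second derivative zero at both endpoints. Returns a callable g(pp)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    h = np.diff(x)
    # Tridiagonal system for the knot second derivatives M (M[0] = M[-1] = 0).
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0  # natural boundary conditions
    for i in range(1, n - 1):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)

    def g(t):
        # Locate the piece containing t, then evaluate its cubic.
        i = int(np.clip(np.searchsorted(x, t) - 1, 0, n - 2))
        dx, d = x[i + 1] - x[i], t - x[i]
        return (M[i] * (x[i + 1] - t) ** 3 / (6 * dx)
                + M[i + 1] * d ** 3 / (6 * dx)
                + (y[i] / dx - M[i] * dx / 6) * (x[i + 1] - t)
                + (y[i + 1] / dx - M[i + 1] * dx / 6) * d)
    return g
```

A dense solve is used here for brevity; the system is tridiagonal, so a Thomas-algorithm solve (or `scipy.interpolate.CubicSpline` with `bc_type='natural'`) would be the practical choice.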
5. The network user image method based on the CECU system intelligent algorithm model as set forth in claim 4, wherein,
step 453 is specifically:
step 4531: assuming β is the upper bound of the parameters, the cubic-spline surface functions are represented as f_k(p, pp, cc) on ψ × ψ × ψ, where ψ = {1, 2, …, β};
step 4532: for each f_k, performing the second-partial-derivative test, i.e. calculating the Hessian matrix H_k of f_k;
step 4533: calculating the corresponding {p, pp, cc} for which H_k(p, pp, cc) is a negative definite matrix, yielding the set of local maxima of f_k;
step 4535: taking all local maxima in the set F = {f_1, …, f_p} to generate the surface maximum.
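The second-partial-derivative test of steps 4532 and 4533 can be sketched with a finite-difference Hessian and an eigenvalue check for negative definiteness; this is a minimal numerical illustration, and the function names are assumptions:

```python
import numpy as np

def hessian(f, x, eps=1e-4):
    """Numerical Hessian of a scalar function of a 3-vector (p, pp, cc),
    via central second differences."""
    x = np.asarray(x, float)
    H = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            ei = np.eye(3)[i] * eps
            ej = np.eye(3)[j] * eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps ** 2)
    return H

def is_negative_definite(H):
    """Step 4533: H is negative definite iff all of its eigenvalues are < 0,
    which marks the point as a local maximum of the spline surface."""
    return bool(np.all(np.linalg.eigvalsh(H) < 0))
```

Applied to the surface f_k at each candidate point, points passing the check are collected as local maxima and the best over F = {f_1, …, f_p} gives the surface maximum.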
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110260517.4A CN113052629B (en) | 2021-03-10 | 2021-03-10 | Network user image drawing method based on CECU system intelligent algorithm model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113052629A CN113052629A (en) | 2021-06-29 |
CN113052629B true CN113052629B (en) | 2024-02-13 |
Family
ID=76510985
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113052629B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115567227B * | 2022-12-02 | 2023-04-07 | South China Normal University | A method and system for identity authentication based on big data security |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6301575B1 (en) * | 1997-11-13 | 2001-10-09 | International Business Machines Corporation | Using object relational extensions for mining association rules |
WO2007048008A2 (en) * | 2005-10-21 | 2007-04-26 | Fair Isaac Corporation | Method and apparatus for retail data mining using pair-wise co-occurrence consistency |
CN102098175A (en) * | 2011-01-26 | 2011-06-15 | 浪潮通信信息系统有限公司 | Alarm association rule obtaining method of mobile internet |
CN106294715A (en) * | 2016-08-09 | 2017-01-04 | 中国地质大学(武汉) | A kind of association rule mining method based on attribute reduction and device |
Non-Patent Citations (1)
Title |
---|
Association rule mining method with semantic minimum support; Zhang Lei; Xia Shixiong; Zhou Yong; Niu Qiang; Microelectronics & Computer (No. 09); full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yang et al. | Personalized federated learning on non-IID data via group-based meta-learning | |
Park et al. | Distributed data mining | |
CN103336790B (en) | Hadoop-based fast neighborhood rough set attribute reduction method | |
CN109255586B (en) | Online personalized recommendation method for e-government affairs handling | |
CN103336791B (en) | Hadoop-based fast rough set attribute reduction method | |
US20120072408A1 (en) | Method and system of prioritising operations | |
CN114399251B (en) | Cold-chain logistics recommendation method and device based on semantic web and cluster preference | |
CN103782309A (en) | Automatic data cleaning for machine learning classifiers | |
CN111881358B (en) | Object recommendation system, method and device, electronic equipment and storage medium | |
Pan et al. | Clustering of designers based on building information modeling event logs | |
CN105069122A (en) | Personalized recommendation method and recommendation apparatus based on user behaviors | |
CN107194672B (en) | A review assignment method that integrates academic expertise and social network | |
Ravikumar et al. | A new adaptive hybrid mutation black widow clustering based data partitioning for big data analysis | |
Feng et al. | Reinforcement routing on proximity graph for efficient recommendation | |
CN117194771B (en) | Dynamic knowledge graph service recommendation method for graph model characterization learning | |
CN117150416B (en) | A detection method, system, media and equipment for abnormal nodes in the industrial Internet | |
CN113228059A (en) | Cross-network-oriented representation learning algorithm | |
Sundarakumar et al. | A heuristic approach to improve the data processing in big data using enhanced Salp Swarm algorithm (ESSA) and MK-means algorithm | |
CN113052629B (en) | Network user image drawing method based on CECU system intelligent algorithm model | |
CN104063555B (en) | The user model modeling method intelligently distributed towards remote sensing information | |
CN112668633A (en) | Adaptive graph migration learning method based on fine granularity field | |
CN110119268B (en) | Workflow optimization method based on artificial intelligence | |
CN116882845A (en) | Scientific and technological achievements evaluation information system | |
CN115658979A (en) | Context-aware method, system and data access control method based on weighted GraphSAGE | |
CN110135747B (en) | Flow customization method based on neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||