CN115600642B - Decentralized federated learning method for streaming media based on neighbor trust aggregation - Google Patents

Decentralized federated learning method for streaming media based on neighbor trust aggregation

Info

Publication number
CN115600642B
CN115600642B
Authority
CN
China
Prior art keywords
user
client
trust
network
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211234598.1A
Other languages
Chinese (zh)
Other versions
CN115600642A (en)
Inventor
Yuan Bo
Shen Yulong
Chen Senlin
Hu Kai
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Baituo Vision Technology Co ltd
Original Assignee
Nanjing Baituo Vision Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Baituo Vision Technology Co ltd filed Critical Nanjing Baituo Vision Technology Co ltd
Priority to CN202211234598.1A priority Critical patent/CN115600642B/en
Publication of CN115600642A publication Critical patent/CN115600642A/en
Application granted granted Critical
Publication of CN115600642B publication Critical patent/CN115600642B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q 30/0282 Rating or review of business operators or products

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a decentralized federated learning method for streaming media based on neighbor trust aggregation, comprising the following steps. Step 1: normalize the streaming-media data collected by each client to form the user's feature vector. Step 2: construct a local model based on a CNN and a Transformer network; select the timing-independent feature vectors and input them into the CNN network; select the timing-related feature vectors and input them into the Transformer network; perform a Concat operation on the feature information extracted by the convolution and the output of the Transformer. Step 3: train the global model with a trust-based random-walk model and decentralized federated learning. The federated learning method allows the models of the clients to be shared, and peer-to-peer network technology is introduced so that the networked computers do not depend on a centralized server.

Description

Decentralized federated learning method for streaming media based on neighbor trust aggregation
Technical Field
The invention relates to the field of federated learning in distributed computing, and in particular to a decentralized federated learning method based on neighbor trust aggregation, applied to streaming-media data.
Background
In the field of machine learning, traditional models such as support vector machines and logistic regression are applied in specific scenarios such as house-price prediction. Because the data used to train a machine learning model come from different clients or institutions, data sharing between them is restricted by data-security laws. Federated learning is designed to analyze data without touching the data. Federated learning is not just a machine learning method but also a business model. At present, in fields such as healthcare and finance, federated learning has been applied in real life, for example health monitoring on wearable devices and marketing recommendation of financial products. In fields where user privacy requires enhanced protection, the models for such application scenarios are often trained by federated learning. Peer-to-Peer (P2P) technology differs from the traditional centralized-server concept: through reasonable network optimization and coordination, the computers participating in networking no longer depend on a centralized server, and each computer, while acting as a client, is also a small but complete network server. The computers at the nodes are peers of one another, with no superior-subordinate relationships.
In some applications of federated learning, mainly on streaming-media platforms, blockchain technology is used with federated learning to train recommendation models. For example, patent No. 202111638487.2 proposes a personalized behavior recommendation method based on federated learning, making personalized recommendations according to user behavior characteristics in different regions and storing data in each local model for centralized federated-learning training. Patent No. 202110521197.3 proposes a collaborative online video edge-caching method based on federated learning, which involves multiple users, multiple edge nodes, and a central server: several edge nodes serve the mobile users within their coverage and are connected to the central server, and each edge node is configured with an edge server. Beyond these two types, federated learning is also applied in the financial and medical industries: patent No. 202110493396.8 proposes a method, system, and medium for identifying potential customers based on vertical federated learning, and patent No. 202210131792.0 proposes a medical named-entity-recognition model training method, recognition method, and federated learning system. The Magic Screen of this patent (CN 202121739511.7, a flexibly combinable LCD intelligent display system) is a display-information system with a user APP and a public terminal display screen. The Magic Screen comprises a centralized network architecture of a server side and terminals, but no data are transmitted between them; the server side only plays a supporting role in operation, so the system can essentially be regarded as a decentralized structure. The Magic Screen can be used for tasks such as recognizing user preferences and actions and learning emotions, and such tasks recommend suitable content according to each user's personalized needs. The Magic Screen terminal builds a local database from each user's preferences and makes recommendations accordingly. For example, Chinese users tend to like Chinese red, while European users tend to like pure white; eastern users prefer freshwater fish, while European users prefer ocean products. When different users use the Magic Screen, it adjusts itself to each user to some degree and then recommends content.
Existing federated-learning systems adopt a centralized model based on a server-client architecture: the training models of all clients must be uploaded to a server, which puts heavy storage and network-transmission pressure on the server and unavoidably consumes device resources, including computing resources, communication resources, and energy. Meanwhile, an abnormality at the server side directly crashes the federated-learning system and interrupts the ongoing training process. Under P2P technology, the computer nodes are peers of one another, so each node can act as a server in the traditional sense while holding a large amount of resource information, and every network node can provide the necessary resources. For the same reason, networks under P2P technology exhibit an unstructured distribution, and the network nodes are more decentralized. In short, within a given network, the more nodes there are, the more resources can be provided, and this characteristic is expressed especially clearly in network performance. Thus, existing federated learning and P2P technology have the following disadvantages:
Problem 1. The traditional federated-learning method needs a reliable centralized server and consumes a large amount of device resources; if the centralized server fails, the whole system crashes and cannot run normally.
Problem 2. The peer-to-peer network has no structured distribution, so large data volumes lead to long computation times; meanwhile, direct data exchange between devices over a peer-to-peer network cannot protect users' privacy.
Aiming at these two problems, the invention provides a decentralized federated learning method for streaming media based on neighbor aggregation.
For problem 1, we first propose a decentralization-based federated-learning model in which each aggregation proceeds at the individual clients. Each client is set as a node, each node holds a model, and a network architecture is built so that each node's model can run. Model aggregation is performed at the nodes that receive models; after repeated iterations, the models converge at every node into an optimal model over the models of the clients participating in aggregation. The decentralized federated-learning model is built on a P2P network, which guarantees equal status among the clients and smooth migration of the model;
For problem 2, a local model is built per device using federated-learning techniques. Federated aggregation jointly trains the models already trained at the respective clients and never touches the users' private data, which solves the privacy problem of problem 2 well. Meanwhile, because the method is based on a federated-learning framework, it avoids the peer-to-peer network's problem of excessive data volume.
Disclosure of Invention
The invention provides a decentralized federated learning method for streaming media based on neighbor aggregation, built on a CNN plus Transformer network, model walking, a peer-to-peer network, a trust network, and a federated-learning model, to remove federated learning's dependence on a reliable central server. The algorithm protects users' privacy, and because centralized federated learning is dispensed with, no reliable central server is needed and the cost is relatively low.
Federated learning, as a novel encrypted distributed machine learning, improves users' trust in current artificial-intelligence technology. The federated-learning model diagram is shown in FIG. 1. Federated aggregation has a precondition: each participant's data have a certain correlation, covering target tasks, user IDs, feature variables, and so on. The invention is a decentralized federated-learning method; since it is based on local client modeling and requires a great deal of training, the invention weights models by their performance in the model-walk training stage, ensuring better generalization of the final aggregated model. The federated averaging algorithm (Federated Averaging, FedAvg) used at the server side of existing federated-learning frameworks merely averages the parameters uploaded to the server and ignores the differences among models. In this invention, the user data of the Magic Screen platform are to be recommended; the user information of each crowd differs, so plain averaging handles it poorly. Therefore, aiming at this problem, the invention adopts decentralized federated learning, uses the model walk and trust values to determine the effect of the global model, and distributes client weights according to the participants' data proportions. The algorithm model diagram of the invention is shown in FIG. 2.
The invention discloses a decentralized federated learning method for streaming media based on neighbor trust aggregation, comprising the following steps:
step 1, normalizing the streaming-media data collected by each client to form the user's feature vector;
step 2, constructing a local model based on a CNN and a Transformer network;
selecting the timing-independent feature vectors and inputting them into the CNN network, retaining the feature information extracted by the CNN as independent feature information;
selecting the timing-related feature vectors and inputting them into the Transformer network; performing a Concat operation on the feature information extracted by the convolution and the output of the Transformer;
and step 3, training the global model using a trust-based random-walk model and decentralized federated learning.
Further, step 1 normalizes the streaming-media data collected by each client to form the user's feature vector, specifically comprising the following steps:
step 1.1, dividing local clients into five classes according to user age groups, wherein children f are under 16 years old 1 Years 16 to 28 are apparent years f 2 Age 28 to 45 as Zhuang year f 3 、45Age 65 to middle age f 4 Age above 65 years old f 5 . Based on the age classification of the users, further classification is performed according to the users as professional creators or users watching only video, so that the client of federal learning is classified into 10 types of clients.
Step 1.2: acquire the historical data of user behavior features from the Magic Screen APP (the system recommends by age group). The history contains 15 features, mapped into vector form: $X = \{x_1, x_2, x_3, \ldots, x_{14}, x_{15}\}$, where $x_1$ is the user's age feature; $x_2$ the gender feature; $x_3$ the preference feature; $x_4$ the comment feature; $x_5$ the follow feature; $x_6$ the fan feature; $x_7$ the video-content feature; $x_8$ the like feature; $x_9$ the region feature; $x_{10}$ the consumption feature; $x_{11}$ the education feature; $x_{12}$ the preferred-video-duration feature; $x_{13}$ the live-streaming feature; $x_{14}$ the online-time feature; and $x_{15}$ the online-duration feature. The clients are then classified according to the user features collected, as in step 1.1.
Step 1.3: summarize the data collected by all clients and split them into three parts, a training set, a test set, and a validation set, so as to train better and evaluate the global model more accurately: $D_1$ is the training set, $D_2$ the test set, and $D_3$ the validation set. The validation set $D_3$ is used at each client to verify the accuracy of the global model. The split is 80% training set $D_1$, 10% test set $D_2$, and the remaining 10% validation set $D_3$.
The raw data are not uniform in scale and differ greatly, so they must be normalized.
In general, a deep-learning algorithm learns most efficiently when the input data are centered near 0. The collected data are mapped into [0, 1] using max-min normalization:

$$\hat{X}_t = \frac{X_t - X_{min}}{X_{max} - X_{min}} \qquad (1)$$

where $X_t$ is the un-normalized user feature vector at time t, $X_{max}$ is the maximum of the user feature vector, $X_{min}$ is the minimum, and $\hat{X}_t$ is the normalized user-feature-vector matrix at time t.
Step 1.4: construct the 15 features into an input matrix; $\hat{X}_t$ is the preprocessed data-input matrix.
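As a concrete illustration of steps 1.2 to 1.4, the sketch below builds a toy feature matrix with the 15 behavior features, applies the max-min normalization of formula (1), and performs the 80/10/10 split into $D_1$, $D_2$, and $D_3$. The array shapes and random data are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data: 100 samples of the 15 user-behavior features of step 1.2.
T, F = 100, 15
X = rng.uniform(0.0, 300.0, size=(T, F))       # un-normalized feature matrix

# Max-min normalization, formula (1): map every feature into [0, 1].
X_min, X_max = X.min(axis=0), X.max(axis=0)
X_hat = (X - X_min) / (X_max - X_min)          # preprocessed input matrix

# 80% / 10% / 10% split into D1 (train), D2 (test), D3 (validation), step 1.3.
n1, n2 = int(0.8 * T), int(0.9 * T)
D1, D2, D3 = X_hat[:n1], X_hat[n1:n2], X_hat[n2:]
print(D1.shape, D2.shape, D3.shape)            # (80, 15) (10, 15) (10, 15)
```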
Further, the Transformer network comprises an Input Block, an Encoder Block, a Decoder Block, and an Output Block. The output of the Input Block is connected to the Encoder Block and the Decoder Block respectively, the output of the Encoder Block is connected to the Decoder Block, and the Decoder Block outputs to the Output Block.
The Encoder Block consists of N identical networks, specifically Multi-Head Self-Attention (MSA) and a Feed-Forward Network (FFN); each sub-network layer adds an Add & Norm layer for the residual connection and normalization operations. The Add & Norm layer sums and normalizes the input and output of the Multi-Head Attention layer, passes the result to the FFN layer, and finally performs Add & Norm once more.
The Transformer network's attention is formulated as

$$\mathrm{Attention}(query, Source) = \sum_{i=1}^{L_x} \mathrm{Similarity}(query, key_i) \cdot value_i$$

where Source is the set of timing-related features in the feature vector, Similarity(·) is a similarity function, $L_x$ is the length of the input Source, and query is an element taken from the given target.
The computation of the Attention mechanism can be summarized in two stages: the first computes the weight coefficients from the query and the keys; the second takes the weighted sum of the values according to those coefficients.
Further, step 3 trains the global model with the trust-based random-walk model and decentralized federated learning, specifically comprising the following steps:
Step 3.1: define the decentralized federated-learning trust network, given the trust network with client $U_0$ as source node.
Let $U_0$ be a client source node in federated learning; the clients connected to it, together with the clients connected to those, form the trust network with $U_0$ as source node. Define U as the set of directly associated clients of $U_0$, comprising clients $U_1, \ldots, U_n$; all clients beyond them are indirectly associated clients of $U_0$, comprising clients $v_1, \ldots, v_m$.
When every client node holds the same amount of resources, DHT technology is adopted: a unique identifier is added for each resource node, and a mapping between resources and node IPs is established. A ring P2P structure is set up; starting from any resource node and going around to the last, every resource node obtains the global model, at which point the model migration is complete.
Step 3.2: complete TopN recommendation based on the trust random-walk model, thereby obtaining n clients associated with source client $U_0$ to jointly train a common model.
Given target user $U_0$'s trust network, start from each directly associated client $U_1, \ldots, U_n$; within the trust subnet $G_i = \langle U_i, TU_i \rangle$, among the indirect clients, use the trust-based random-walk model to complete TopN recommendation according to the interest similarity with source user $U_0$, where TopN means obtaining the N clients most associated with source client $U_0$ for joint training of the common model.
Step 3.3: determine whether a potentially associated client $V_i$ of client $U_0$ exists; if $V_i$ exists, run the random-walk model for step k+1 and then train the common model.
Further, step 3.3 specifically comprises: according to the model similarity $ISim(u_0, v)$ between users v and $U_0$, predict potentially associated clients and add them to target client $u_0$'s set of predicted possible associates until the termination condition of the random walk is met, completing one round of potential-client recommendation. The random walk is repeated to ensure the accuracy and coverage of the algorithm. Finally, the obtained users are ranked by interest similarity with the target user to complete the TopN recommendation.
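The following sketch illustrates the recommendation loop of steps 3.2 and 3.3 under simplifying assumptions: the trust network is given as an adjacency list, ISim and the stop probability phi are supplied as stand-in functions, and the next hop is drawn uniformly for brevity (the trust-weighted draw of formula (11) is sketched separately further below).

```python
import random

def top_n_recommend(source, neighbors, isim, phi, n=5, max_depth=6, runs=200):
    """Repeat trust-based random walks from the source client's direct
    associates and rank the visited candidates by interest similarity."""
    candidates = {}
    for _ in range(runs):                       # repeated walks for coverage
        u = random.choice(neighbors[source])    # start at a direct associate
        for k in range(1, max_depth + 1):       # six-degree depth limit
            fol = neighbors.get(u, [])
            if not fol:
                break
            v = random.choice(fol)              # uniform hop (simplification)
            if random.random() < phi(u, v, k):  # terminate with probability phi
                break
            candidates[v] = isim(source, v)     # record potential associate
            u = v                               # continue the walk at v
    ranked = sorted(candidates, key=candidates.get, reverse=True)
    return ranked[:n]

nbrs = {"u0": ["a"], "a": ["b", "c"], "b": ["c"], "c": []}
print(top_n_recommend("u0", nbrs, isim=lambda s, v: 1.0,
                      phi=lambda u, v, k: 0.3, n=2))
```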
Further, when step k-1 of the random-walk algorithm is at user u and step k is at its associated client user v, the stop-walk probability is $\phi_{u,v,k}$.
Further, the iterative model of each client is $w_{local}$; each client's initial local model $w_{local}$ is labeled $w_{i,local}$ according to the client's index, and training stops once the global optimum is reached after T iterations.
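A minimal sketch of the iteration schedule just described: each client i holds a local model $w_{i,local}$, the model walks along the clients and is aggregated at the receiving nodes, and after T rounds every node holds the shared result. The pairwise-averaging rule and the vector representation of the model are assumptions for illustration.

```python
import numpy as np

def decentralized_training(clients, T, local_update, ring):
    """clients: dict node -> local model w_{i,local} (numpy vector);
    ring: the order in which the model walks around the nodes."""
    for t in range(T):                                 # T iterations
        w = clients[ring[0]].copy()                    # model starts its walk here
        for i in ring[1:]:
            clients[i] = local_update(i, clients[i])   # local training at node i
            w = 0.5 * (w + clients[i])                 # aggregate at receiving node
        for i in ring:
            clients[i] = w.copy()                      # every node holds the result
    return clients

clients = {i: np.zeros(4) for i in range(3)}
decentralized_training(clients, T=2, local_update=lambda i, w: w + 1.0, ring=[0, 1, 2])
print(clients[0])
```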
Furthermore, the Precision, Coverage, and F-Measure indices are adopted to evaluate recommendation accuracy and coverage:

$$Precision = \frac{N_{tp}}{L}$$

where $N_{tp}$ is the number of recommended clients actually associated with source client $U_0$, and L is the total number of recommended clients.

$$Coverage = \frac{N_{tp}}{|B_u|}$$

where $B_u$ is the set of recommended objects associated with source client $U_0$ in the test set; Coverage represents the probability that a client associated with source client $U_0$ can be recommended. The $F_1$-measure is then defined as

$$F_1 = \frac{2 \cdot Precision \cdot Coverage}{Precision + Coverage}$$
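The three indices, as reconstructed above, in executable form; the names mirror the text ($N_{tp}$, L, $B_u$), and the sample sets are invented for the demonstration.

```python
def precision(n_tp, l):
    return n_tp / l                      # fraction of recommendations that hit

def coverage(n_tp, b_u):
    return n_tp / len(b_u)               # fraction of true associates recovered

def f1(p, c):
    return 2 * p * c / (p + c) if p + c else 0.0

recommended = {"u3", "u7", "u9"}         # clients the walk recommended
b_u = {"u3", "u9", "u12", "u20"}         # associated objects in the test set
n_tp = len(recommended & b_u)
p, c = precision(n_tp, len(recommended)), coverage(n_tp, b_u)
print(round(p, 2), round(c, 2), round(f1(p, c), 2))
```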
further, define X u To select user v as the random variable for the potential fan, then:
thus, the random walk algorithm is run on user u with a probability of 1-phi u,v,k When the running is continued, the client v epsilon all of the running U Is algorithmically selected as source user U 0 The probability of potential vermicelli is
P(Y u0,u =v)=(1-φ u,v,k )P(X u =v) (13)
Further, the random-walk model: given target client $U_0$, start from each directly associated client $U_i$ of source user $U_0$ and traverse all clients of the trust subnet. When step k-1 walks to client u, whether step k walks to a directly associated client v of user u is related to v's trust value for u. The model then has two options: (1) with probability $\phi_{u,v,k}$, terminate the random walk and return the potential associated clients of target user $U_0$ selected in steps 1 through k-1, completing this random walk; (2) with probability $1-\phi_{u,v,k}$, take random-walk step k+1 along a follow edge into the next layer, the associated-client set of user v, and select potential associated clients by the model similarity with the source user. When the random-walk algorithm reaches user u and continues with probability $1-\phi_{u,v,k}$, a user v is selected from the directly associated client set of user u so the algorithm can proceed, where $\phi$ is a probability function. Let $S_\omega$ be the random variable of selecting v; the probability that user v is selected is related to its trust value for user $\omega$, i.e.

$$P(S_\omega = v) = \frac{Tr_{\omega,v}}{\sum_{v' \in fol_\omega} Tr_{\omega,v'}} \qquad (11)$$

where $Tr_{u,v}$ is the trust value of user v for user u. The above process repeats, and the algorithm continues to walk in the model trust network until it terminates or reaches the maximum walk depth.
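A sketch of the continuation step: when the walk goes on from a user, the next user v is drawn with probability proportional to its trust value, the normalized form assumed in formula (11) above.

```python
import random

def select_next(omega, fol, trust):
    """Draw the next user from fol[omega] with probability Tr / sum(Tr)."""
    cands = fol[omega]
    weights = [trust[(omega, v)] for v in cands]   # trust values toward omega
    return random.choices(cands, weights=weights, k=1)[0]

fol = {"u": ["a", "b", "c"]}
trust = {("u", "a"): 0.6, ("u", "b"): 0.3, ("u", "c"): 0.1}
print(select_next("u", fol, trust))                # "a" is the most likely pick
```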
Further, trust value definition: let $u \in U$ and $v \in fol_u$. The trust value Tr of user v for u can be defined based on comment, forwarding, and mention behaviors as follows:

(1) Let $RT_{u,v}$ be the number of times user v forwards user u; this gives the trust value based on forwarding behavior, $Tr_{rt}(u,v)$.

(2) Let $CM_{u,v}$ be the number of comments by user v on user u; this gives the trust value based on comment behavior, $Tr_{cm}(u,v)$.

(3) Let $ME_{u,v}$ be the number of times user v mentions user u when posting, forwarding, or commenting on a work; this gives the trust value based on mention behavior, $Tr_{me}(u,v)$.

(4) Integrating the trust values of the three online behaviors yields the overall trust value of user v for u, $Tr(u,v)$.
If any of $Tr_{rt}(u,v)$, $Tr_{cm}(u,v)$, or $Tr_{me}(u,v)$ equals 0, it is set to a default value. Beneficial effects: the federated-learning method allows the models of all clients to be shared and the data to be analyzed without ever touching the data; and Peer-to-Peer (P2P) technology is introduced so that, through reasonable network optimization and coordination, the computers participating in networking no longer depend on a centralized server.
Drawings
FIG. 1 is a schematic diagram of conventional federated learning;
FIG. 2 is a diagram of decentralized federated learning;
FIG. 3 is a diagram of the local CNN+Transformer model;
FIG. 4 is a diagram of the Transformer network architecture;
FIG. 5 is an Attention mechanism diagram;
FIG. 6 is a Self-Attention mechanism diagram;
FIG. 7 is a Multi-Head Self-Attention mechanism diagram;
FIG. 8 is a diagram of a ring hierarchy network;
FIG. 9 is a P2P network diagram.
Detailed Description
The preferred embodiments of the present invention are described below with reference to the accompanying drawings; it should be understood that the preferred embodiments described here are only for illustrating and explaining the present invention and are not intended to limit it.
Examples: the embodiment is introduced and analyzed based on the Magic Screen APP. The invention provides a decentralized federated learning method for streaming media based on neighbor aggregation, built on a CNN plus Transformer network, model walking, a peer-to-peer network, a trust network, and a federated-learning model, to remove federated learning's dependence on a reliable central server. The algorithm protects users' privacy, and because centralized federated learning is dispensed with, no reliable central server is needed and the cost is relatively low. The Magic Screen of the invention is a display-information system with a user APP and a public terminal display screen. It comprises a centralized network architecture of a server side and terminals, but no data are transmitted between them; the server side only plays a supporting role in operation, so the system can essentially be regarded as a decentralized structure. The Magic Screen can be used for tasks such as recognizing user preferences and actions and learning emotions, and such tasks recommend suitable content according to each user's personalized needs. The Magic Screen terminal builds a local database from each user's preferences and makes recommendations accordingly. For example, Chinese users tend to like Chinese red, while European users tend to like pure white; eastern users prefer freshwater fish, while European users prefer ocean products. When different users use the Magic Screen, it adjusts itself to each user to some degree and then recommends content.
On the basis of protecting user privacy, the algorithm model of this patent can learn characteristics of the different users of the APP, divided by user ID, and perform interaction and recommendation between users. Federated learning, as a novel distributed machine learning, guarantees users' privacy through distributed encrypted training and improves users' trust in current artificial-intelligence technology. The federated-learning model diagram is shown in FIG. 1. Furthermore, federated learning has a precondition: each participant's data have a certain correlation, covering target tasks, user IDs, feature variables, and so on. By this correlation requirement, the parameters of the patent's model are also correlated, so a model with better generalization can be trained. Under the federated-learning framework, the local clients participating in joint training protect each participant's privacy well. The federated averaging algorithm (Federated Averaging, FedAvg) used at the server side of existing frameworks merely averages the uploaded parameters and ignores the differences among models. The invention is a behavior-feature recommendation algorithm that migrates models according to trust values, improving on the original plain averaging; it greatly increases the recommendation efficiency for behavior features, needs no server to participate in the computation, and saves operating cost. The algorithm model diagram of this patent is shown in FIG. 2.
The invention is implemented in three steps:
federal learning is used as a novel encryption distributed machine learning, so that the trust degree of users on the current artificial intelligence technology is improved. The federal learning model diagram is shown in figure 1. Federal learning aggregation has the precondition: each participant's data has a certain correlation, which includes target tasks, user IDs, feature variables, etc. The invention relates to a decentralised federal learning method, which is based on local client modeling and requires a great deal of training, so that the invention weights according to the performance of a model in the model walk training stage, and ensures better generalization of the final aggregated model performance. The federal average algorithm (Federated Averaging, fedAVg) used by the existing federal learning framework at the server side only performs average processing on parameters uploaded to the server side, and does not consider the difference problem among the models. In the patent of the invention, the user data of the magic screen platform are pushed, and the user information of each crowd has certain difference and has poor processing effect only by using average. Therefore, in order to solve the problem, the original federal average algorithm is improved to a federal weight average algorithm at the server side, and weight distribution is performed according to the data duty ratio of the participants, and an algorithm model diagram of the patent is shown in fig. 2.
The specific implementation steps are as follows:
s1: preprocessing of magic screen APP user data
The data on which the invention is based are the fan recommendations of Magic Screen APP users: different users have different preferences, different APP usage times, and different product preferences. The Magic Screen is a display-information system with a user APP and a public terminal display screen. It comprises a centralized network architecture of a server side and terminals, but no data are transmitted between them; the server side only plays a supporting role in operation, so the system can essentially be regarded as a decentralized structure. The Magic Screen can be used for tasks such as recognizing user preferences and actions and learning emotions, and such tasks recommend suitable content according to each user's personalized needs. The terminal builds a local database from each user's preferences and makes recommendations accordingly. A video's creator and fans need associated users to interact so that the creator keeps creating; meanwhile, fans with the same preferences can directly recommend creators of the same type to one another, so each user's preference type is very important. User information is generated within 24 hours and cached at the local end. The user information mainly comprises the feature categories listed in step 1-2 below. The specific preprocessing steps are as follows:
1-1: The invention divides the local clients by age group, the division being made automatically from real-name registration. According to the real-name data, the ages are divided into five parts: children $f_1$ (under 16), youth $f_2$ (16 to 28), prime age $f_3$ (28 to 45), middle age $f_4$ (45 to 65), and elderly $f_5$ (over 65). On top of the age division, Magic Screen users are divided into professional creators and viewers only, so the federated-learning clients fall into 10 client classes.
1-2: Acquire the historical data of the behavior features of users of the Magic Screen APP (the system recommends by age group). The history contains 15 features, mapped into vector form: $X = \{x_1, x_2, x_3, \ldots, x_{14}, x_{15}\}$, where $x_1$ is the user's age feature; $x_2$ the gender feature; $x_3$ the preference feature; $x_4$ the comment feature; $x_5$ the follow feature; $x_6$ the fan feature; $x_7$ the video-content feature; $x_8$ the like feature; $x_9$ the region feature; $x_{10}$ the consumption feature; $x_{11}$ the education feature; $x_{12}$ the preferred-video-duration feature; $x_{13}$ the live-streaming feature; $x_{14}$ the online-time feature; and $x_{15}$ the online-duration feature. The data features are classified according to step 1-1. The raw data are not uniform in scale and differ greatly, so they must be normalized.
1-3: After collecting the data, this patent splits them into three parts, a training set, a test set, and a validation set, for better training and more accurate evaluation: $D_1$ is the training set, $D_2$ the test set, and $D_3$ the validation set. The validation set $D_3$ is used at each client to verify the accuracy of the global model. The split is 80% training set $D_1$, 10% test set $D_2$, and the last 10% validation set $D_3$. In general, a deep-learning algorithm learns most efficiently when the input data are centered near 0. The collected data are mapped into [0, 1] using max-min normalization:

$$\hat{X}_t = \frac{X_t - X_{min}}{X_{max} - X_{min}} \qquad (1)$$

where $X_t$ is the un-normalized user feature vector at time t, $X_{max}$ is the maximum of the user feature vector, $X_{min}$ is the minimum, and $\hat{X}_t$ is the normalized user-feature-vector matrix at time t.
1-4: After step 1-3, all personalized features have been preprocessed; the 15 features are constructed into an input matrix, and $\hat{X}_t$ is the preprocessed data-input matrix.
S2: Establishing the local model based on the CNN and Transformer network
Based on the preprocessing of the data in the first step, the features are fed as input into the local network model; in this second step, the invention extracts features from the data. CNN and Transformer are adopted for feature extraction: the CNN network handles the large data volume involved and extracts the data's feature information well as a front end to the Transformer's Encoder, and the Transformer, which originated in natural-language processing, handles sequence information well. Meanwhile, the Transformer network fully accounts for the influences and relations across time steps, so users' latent interests can be mined deeply, and the trained model's recommendation effect improves markedly over traditional networks. Feature extraction divides the data into two parts, timing features and non-timing features. The Transformer network extracts key features from the timing features well and focuses on sequence information, making the trained model better. The local network model diagram is shown in FIG. 3.
2-1: Extract the features of the vector from step 1-2 by convolution, separating out the timing-related features. Within X, the components $x_{11}$, $x_{12}$, $x_{13}$, $x_{14}$, and $x_{15}$ carry strongly timing-related information. To reduce the computation in training the network model, the invention uses a 1×1 convolution kernel and selects the other 10 components, so the feature vector X becomes the new feature vector $X' = \{x_1, x_2, x_3, \ldots, x_{10}\}$. The information extracted by the CNN is retained as independent feature information; the CNN feature extraction uses a 1-D convolution with stride 1. The timing-related components $x_{11}$, $x_{12}$, $x_{13}$, $x_{14}$, and $x_{15}$ are input into the Transformer network. Assuming a 1×1 convolution kernel, stride 1, and d = 1 (i.e., zero padding of width 0), the convolution takes the standard 1-D form

$$y_i = \sum_{j=1}^{k} w_j \, x_{i+j-1} + b \qquad (2)$$

The feature information extracted by the convolution is then Concat-ed with the output of the Transformer.
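A PyTorch sketch of step 2-1, assuming a batch of sequences with the 15 features as channels: the 10 timing-independent features pass through the 1×1, stride-1 1-D convolution, and the 5 timing-related features are set aside for the Transformer branch. The channel counts and shapes are assumptions.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 15, 32)          # assumed: (batch, 15 features, 32 time steps)
x_static = x[:, :10, :]             # x1..x10: timing-independent -> CNN branch
x_temporal = x[:, 10:, :]           # x11..x15: timing-related -> Transformer branch

conv = nn.Conv1d(in_channels=10, out_channels=16, kernel_size=1, stride=1)
cnn_feat = torch.relu(conv(x_static))    # independent feature information
print(cnn_feat.shape, x_temporal.shape)  # torch.Size([8, 16, 32]) torch.Size([8, 5, 32])
```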
2-2: The timing-related feature vectors are input into the Transformer network; timing-based feature information needs to take full advantage of the correlations between time and the other parameters. The Transformer solves the gradient-vanishing and gradient-explosion problems of traditional recurrent neural networks and can retain information over longer time spans. The Transformer model is divided into four blocks, the Input Block, Encoder Block, Decoder Block, and Output Block, of which the most critical are the second part, the Encoder Block, and the third part, the Decoder Block. The Transformer is characterized by the multi-head self-attention between the Encoder Block and the Decoder Block and the internal multi-head self-attention within each. The Encoder Block consists of N identical networks, each layer divided into two parts, Multi-Head Self-Attention (MSA) and a Feed-Forward Network (FFN), with residual connection and normalization operations added to each sub-layer. MSA is described below; the FFN is a fully connected composition with nonlinear transformation. The Decoder Block also consists of N identical networks with the same overall structure as the Encoder Block, but its input comes from two sources. The Add & Norm layer sums and normalizes the input and output of the Multi-Head Attention layer, passes the result to the FFN layer, performs Add & Norm once more, and outputs to the fourth module, the Output Block, which outputs all the features extracted by the model. The Transformer mechanism is shown in FIG. 4:
2-3: The computation of the Transformer consists mainly of the self-attention mechanism and the multi-head self-attention mechanism, the former being a special form of the attention mechanism, so this patent starts from the attention mechanism. The Attention mechanism can be viewed as follows: assume the elements of the input Source consist of a series of <key, value> data pairs; given the query of some element of the target, compute a weight coefficient from the query and each key, and take the weighted sum of the values, the query and keys serving to compute the weight coefficients of the corresponding values. The resulting Attention value can be expressed as formula (3):

$$\mathrm{Attention}(query, Source) = \sum_{i=1}^{L_x} \mathrm{Similarity}(query, key_i) \cdot value_i \qquad (3)$$
where Similarity(·) is a similarity function and $L_x$ is the length of the input Source. Here the input Source is the timing-related features $x_{11}$, $x_{12}$, $x_{13}$, $x_{14}$, and $x_{15}$. The computation of the Attention mechanism can be summarized in two stages: the first computes the weight coefficients from the query and the keys; the second takes the weighted sum of the values according to those coefficients. The first stage has two steps:
step one: compute the similarity or correlation between the query and each key;
step two: normalize the raw scores from step one. The computation of Attention is thus as shown in FIG. 5:
Based on FIG. 5, the three computation steps of Attention are described mathematically:
(1) Compute the similarity between the query and each key to obtain a weight. There are several ways to compute the similarity, for example the concat, dot, and general forms of formula (4):

$$\mathrm{Sim}(Q, K) = \begin{cases} W[Q; K] & \text{concat} \\ Q^T K & \text{dot} \\ Q^T W K & \text{general} \end{cases} \qquad (4)$$

where Q denotes the query, T denotes the matrix transpose, K denotes the key, and W is a weight coefficient.
(2) Normalize the weights with the softmax(·) function, as in formula (5):

$$a_i = \mathrm{softmax}(\mathrm{Sim}_i) = \frac{e^{\mathrm{Sim}_i}}{\sum_{j=1}^{L_x} e^{\mathrm{Sim}_j}} \qquad (5)$$

where softmax(·) is the normalization function and $a_i$ is the i-th normalized weight.
(3) Take the weighted sum of the weights and the corresponding values to obtain the final Attention, as in formula (6):

$$\mathrm{Attention}(Q, Source) = \sum_{i=1}^{L_x} a_i \cdot value_i \qquad (6)$$

Through this three-stage computation, the Attention value for the query is obtained.
The self-attention mechanism is a special form of the attention mechanism in which query = key = value. Because of this, three weight matrices are designed for better feature extraction, $W_q$, $W_K$, and $W_v$, giving the three new vectors Query', Key', and Value' by formula (7):

$$Query' = Q W_q, \quad Key' = K W_K, \quad Value' = V W_v \qquad (7)$$

Each Query is used against every Key to obtain the attention value, finally yielding the mathematical form of the self-attention mechanism in formula (8):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^T}{\sqrt{d_k}}\right) V \qquad (8)$$
where T denotes the transpose and the scaling factor $d_k$ is the number of columns of the K matrix, i.e., the dimension of the vectors. A schematic diagram of the self-attention mechanism is shown in FIG. 6:
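Formulas (7) and (8) in a few lines of PyTorch; the linear projections play the role of $W_q$, $W_K$, $W_v$, and the dimensions are assumed.

```python
import torch
import torch.nn.functional as F

d_model, d_k = 32, 16
x = torch.randn(8, 5, d_model)                    # assumed: batch of 5-step sequences
W_q = torch.nn.Linear(d_model, d_k, bias=False)   # the three weight matrices of (7)
W_k = torch.nn.Linear(d_model, d_k, bias=False)
W_v = torch.nn.Linear(d_model, d_k, bias=False)

Q, K, V = W_q(x), W_k(x), W_v(x)                  # Query', Key', Value'
scores = Q @ K.transpose(-2, -1) / d_k ** 0.5     # Q K^T / sqrt(d_k)
attn = F.softmax(scores, dim=-1) @ V              # formula (8)
print(attn.shape)                                 # torch.Size([8, 5, 16])
```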
the Multi-Head Self-Attention (MSA) mechanism is actually a perfection of the Attention mechanism, which mainly uses a plurality of different subspaces to perform independent calculation, finally outputs subspace calculation results, and splices the subspace calculation results together, and compared with the Attention mechanism, the final result can consider the relevance of different layers, so that the output characteristics can consider more layers of information. MSA is equivalent to being formed using a combination of multiple self-attention mechanisms. The single self-attention mechanism Q, K, V performs a single-dimensional calculation, and the multi-head self-attention mechanism performs model learning with multiple heads at a time. Under the premise of not sharing parameters, the MSA carries out linear transformation on Q, K, V, carries out scaling dot product attention through mapping of a parameter matrix, and repeats the process for h times, finally splices the results, and maps one head at a time, namely a so-called multi-head self-attention mechanism. MSA performs mapping for multiple times, and mapping transformation parameters W are different each time, wherein W can be regarded as a matrix of W= [ W ] q ,W K ,W v ]. Q, K, V obtained by each linear transformation mapping carries out attention calculation, the result of each operation becomes a head, a multi-head self-attention mechanism carries out Concat operation on the final result to obtain the final result, and the final result is expressed as formulas (9) and (10) by using a mathematical form:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(head_1, head_2, \ldots, head_h) \, W_o \qquad (9)$$

$$head_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V) \qquad (10)$$
where Concat(·) denotes the splicing operation and $W_o$ is a weight matrix. The multi-head self-attention mechanism obtains more feature information than a single attention mechanism, and it is an important component of the Transformer [16]. The multi-head attention mechanism is shown in FIG. 7.
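PyTorch's built-in nn.MultiheadAttention implements the Concat-then-project scheme of formulas (9) and (10), so a sketch only needs to call it with query = key = value; the head count and embedding size are assumptions.

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
x = torch.randn(8, 5, 32)             # query = key = value (self-attention)
out, weights = mha(x, x, x)           # Concat(head_1..head_h) W_o happens inside
print(out.shape)                      # torch.Size([8, 5, 32])
```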
2-4: Concat the features extracted by the CNN and those extracted by the Transformer to obtain the local model; the local network model is shown in FIG. 3, completing the construction of the local model.
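Putting step 2 together, a minimal local model along the lines of FIG. 3: the CNN branch processes the timing-independent features, a Transformer encoder processes the timing-related ones, and the two are Concat-ed before the output head. All layer sizes, the pooling, and the 10-class head are assumptions for illustration.

```python
import torch
import torch.nn as nn

class LocalModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(10, 16, kernel_size=1)             # CNN branch
        enc_layer = nn.TransformerEncoderLayer(
            d_model=5, nhead=1, dim_feedforward=32, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.head = nn.Linear(16 + 5, 10)         # 10 client classes (step 1.1)

    def forward(self, x):                         # x: (batch, 15 features, time)
        cnn = torch.relu(self.conv(x[:, :10, :])).mean(dim=-1)       # (batch, 16)
        tf = self.encoder(x[:, 10:, :].transpose(1, 2)).mean(dim=1)  # (batch, 5)
        return self.head(torch.cat([cnn, tf], dim=-1))               # Concat + head

model = LocalModel()
print(model(torch.randn(8, 15, 32)).shape)        # torch.Size([8, 10])
```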
S3: Decentralized federated-learning network construction
Decentralized federated learning does not require a reliable server and can save the architectural cost of the service system. Existing federated learning cannot achieve optimal performance across multiple scenarios: in real life, data in different scenarios often have different characteristics and different distributions, and no globally universal model exists. To solve this problem, the invention proposes a decentralized federated learning method for streaming media based on neighbor aggregation and adds a trust mechanism to it, so the clients participating in aggregation can jointly train a common model. The proposed method solves the overhead and attack-exposure problems of a centralized federated-learning server well and is better suited to real-life application scenarios.
3-1: Decentralized federated-learning trust-network definition. Let $U_0$ be a client source node in federated learning, i.e., the source-node user; the connected clients and the clients connected to them form the trust network with client $U_0$ as the source-node user (the target client). Let $fol_{u_0}$ denote the fan set of source-node user client $U_0$. The trust network can then be defined, in graph-neural-network terms, as a directed graph $G = \langle U, TU \rangle$, where $U = \{U_0, U_1, \ldots, U_n\}$ is the set of all users in the trust network and $TU \subseteq U \times U$ is the set of all follow edges, the set of trusted clients of the trust-network clients. A ring network has a very complex topology; for the research problem at hand, the ring trust network is defined as a hierarchical ring-mesh construction with $U_0$ as source user. As shown in FIG. 8, a Magic Screen example is incorporated for illustration: each node in the graph represents one client, each line represents a connection trust relationship between clients, and the edge weight $\omega_{u,v}$ represents the weight of user v's trust value toward u. The global model parameters are $w_{global}$ and the local model parameters are $w_{local}$. For convenience of research, the layer-1 clients $U_1, \ldots, U_n$ are the direct fans of client $U_0$, while the layer-2 clients $v_1, \ldots, v_m$ and all clients beyond are the indirect fans of client $U_0$; the interest similarity between an indirect client fan and $U_0$ is s.
3-2: Client trust recommendation problem description. Given target user $U_0$ and the Magic Screen APP trust network, start from each direct fan $U_1, \ldots, U_n$; within the trust subnet $G_i = \langle U_i, TU_i \rangle$, among the indirect fans, use the trust-based random-walk model to complete TopN recommendation according to the interest similarity with source user $U_0$, thereby obtaining the n fan users associated with source client $U_0$ for joint training of the common model.
3-3: The random-walk model. Given a target client user $U_0$, construct its trust network $G = \langle U, TU \rangle$ according to definition 3-1; starting from the fans of $U_0$, execute the random-walk algorithm in the trust subnet and recommend relevant clients for model training. The random-walk algorithm starts from a directly associated client $u_i$ of source user $U_0$ and traverses all clients of the trust subnet. Without loss of generality, when step k-1 walks to user u, whether step k walks to a directly associated client of user u is related to that client's trust value for user u. The model then has two options: (1) with probability $\phi_{u,v,k}$, terminate the random walk and return the potential fans of target user $U_0$ selected in steps 1 through k-1, completing this random walk; (2) with probability $1-\phi_{u,v,k}$, take random-walk step k+1 along a follow edge into the next layer, the fan set of user v, and select potential associated clients by the model similarity with the source user. When the random-walk algorithm reaches user u and continues with probability $1-\phi_{u,v,k}$, a user v is selected from the direct fan set of user u so the algorithm can proceed, where $\phi$ is a probability function. Let $S_\omega$ be the random variable of selecting v; the probability that user v is selected is related to its trust value for user $\omega$, as in formula (11):

$$P(S_\omega = v) = \frac{Tr_{\omega,v}}{\sum_{v' \in fol_\omega} Tr_{\omega,v'}} \qquad (11)$$
where $Tr_{u,v}$ is the trust value of user v for user u. The above process repeats, and the algorithm continues to walk in the model trust network until it terminates or reaches the maximum walk depth.
3-4: The topology of the peer-to-peer (P2P) network directly influences how resources flow to users, and different topologies yield different network performance. Network topologies divide into four types, centralized P2P, fully distributed unstructured P2P, semi-distributed P2P, and fully distributed structured P2P, as shown in FIG. 9. The fully distributed structured P2P network has the same topology as the fully distributed unstructured one but adopts DHT technology, adding a unique identifier to each resource node and establishing a mapping between resources and node IPs. Nodes can thus be located precisely, solving the communication congestion caused by resource searches. On this basis, the invention sets up a ring P2P structure; starting from one resource node and going around to the last, every resource node obtains the global model, at which point the model walk is complete.
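A sketch of the ring structure of 3-4: each node obtains a DHT-style unique identifier by hashing its IP, the ring order is the sorted identifier order, and one full pass of the model around the ring corresponds to a completed model walk. The hash choice and the model payload are assumptions.

```python
import hashlib

def node_id(ip):
    """DHT-style unique identifier: hash the node's IP into a ring position."""
    return int(hashlib.sha1(ip.encode()).hexdigest(), 16) % 2**16

ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
ring = sorted(ips, key=node_id)          # ring order derived from identifiers

model = {"round": 0}                     # stand-in for the walking global model
for ip in ring:                          # one full pass around the ring
    model["round"] += 1                  # stand-in for local aggregation at ip
print(ring, model)                       # every node has now seen the model
```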
3-5: Associated-client fan prediction. In the network of associated clients of source user $U_0$, when the random-walk algorithm at user u continues with probability $1-\phi_{u,v,k}$, potential relevant fans of source user $U_0$ are predicted within the fan set of user u. As stated above, each user $v \in fol_u$ in the fan set of user u is selected as a potential client with a probability determined by its interest similarity $ISim(u_0, v)$ with source user $U_0$. Define $X_u$ as the random variable of selecting user v as a potential fan; then, as in formula (12):

$$P(X_u = v) = \frac{ISim(u_0, v)}{\sum_{v' \in fol_u} ISim(u_0, v')} \qquad (12)$$
thus, the random walk algorithm is run on user u with a probability of 1-phi u,v,k When the running is continued, the client v epsilon all of the running U Is algorithmically selected as source user U 0 The probability of potential vermicelli is
P(Y u0,u =v)=(1-φ u,v, k)P(X u =v) (13)
3-6: Trust value definition. Let $u \in U$ and $v \in fol_u$. The trust value Tr of user v for u can be defined based on comment, forwarding, and mention behaviors as follows:

(1) Let $RT_{u,v}$ be the number of times user v forwards user u; this gives the trust value based on forwarding behavior, $Tr_{rt}(u,v)$, formula (14).

(2) Let $CM_{u,v}$ be the number of comments by user v on user u; this gives the trust value based on comment behavior, $Tr_{cm}(u,v)$, formula (15).

(3) Let $ME_{u,v}$ be the number of times user v mentions user u when posting, forwarding, or commenting on a work; this gives the trust value based on mention behavior, $Tr_{me}(u,v)$, formula (16).

(4) Integrating the trust values of the three online behaviors, the trust value of user v for u is defined as $Tr(u,v)$, formula (17). If any of $Tr_{rt}(u,v)$, $Tr_{cm}(u,v)$, or $Tr_{me}(u,v)$ equals 0, it is set to a default value.
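The bodies of formulas (14) to (17) are not reproduced in this text, so the sketch below makes its own assumptions explicit: each behavior count of v toward u is normalized by v's total count of that behavior, and the three components are averaged. Treat this as one plausible reading, not the patented definition.

```python
def trust(u, v, rt, cm, me, fol_v):
    """Assumed reading of (14)-(17): normalize each behavior count of v toward
    u by v's total count of that behavior, then average the three components."""
    def norm(counts):
        total = sum(counts.get((w, v), 0) for w in fol_v)
        return counts.get((u, v), 0) / total if total else 0.0
    tr_rt, tr_cm, tr_me = norm(rt), norm(cm), norm(me)
    return (tr_rt + tr_cm + tr_me) / 3.0

rt = {("u0", "v1"): 4, ("u2", "v1"): 1}    # forwards by v1 of each user's posts
cm = {("u0", "v1"): 2}                     # comments by v1 on each user
me = {}                                    # mentions by v1
print(trust("u0", "v1", rt, cm, me, fol_v=["u0", "u2"]))   # 0.6
```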
3-7: Client model similarity. The similarity between the data types of an associated client fan and of source client $U_0$ is defined as the client similarity model. Moreover, the attention user v pays to u is related to the similarity of their data types: the higher the similarity, the greater the attention. For user u, its data type is of the same kind as source client $U_0$'s. Taking a Magic Screen user as an example, keywords are extracted with a natural-language-processing method, so the data content is represented here by keyword vectors, formula (18):

$$k_u = \{(e_1, \omega_1), (e_2, \omega_2), \ldots, (e_m, \omega_m)\} \qquad (18)$$

where $e_i$ is the i-th keyword, such as "the action is good" or "the harmony is good"; $\omega_i$ is the weight given to the i-th keyword, specifically the number of times the keyword is mentioned; and m is the number of keywords, specifically the number of keywords mentioned. The keyword weights can be computed with the classical TF-IDF formula, formula (19):
After obtaining the interest-keyword vectors of client users u and v, the interest similarity can be computed as the cosine similarity between the vectors, formula (20):

$$ISim(u, v) = \frac{k_u \cdot k_v}{\|k_u\| \, \|k_v\|} \qquad (20)$$
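Step 3-7 in executable form: keyword weights via the classical TF-IDF formula, assumed here to be the variant intended by formula (19), and interest similarity as the cosine of formula (20).

```python
import math
from collections import Counter

def tfidf(doc_terms, corpus):
    """Classical TF-IDF: term frequency times log inverse document frequency."""
    tf = Counter(doc_terms)
    n = len(corpus)
    return {t: (tf[t] / len(doc_terms)) *
               math.log(n / (1 + sum(t in d for d in corpus)))
            for t in tf}

def isim(ku, kv):
    """Cosine similarity between two keyword-weight vectors, formula (20)."""
    common = set(ku) & set(kv)
    dot = sum(ku[t] * kv[t] for t in common)
    nu = math.sqrt(sum(w * w for w in ku.values()))
    nv = math.sqrt(sum(w * w for w in kv.values()))
    return dot / (nu * nv) if nu and nv else 0.0

corpus = [["action", "dance"], ["action", "music"], ["news"], ["sports"]]
ku, kv = tfidf(corpus[0], corpus), tfidf(corpus[1], corpus)
print(round(isim(ku, kv), 3))            # similarity via the shared keyword
```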
3-8: Associated-fan recommendation based on the TRW model. The recommendation process runs in the peer-to-peer network of target user $u_0$'s fans; the random-walk model walks using the graph's depth-first algorithm and, according to the model similarity $ISim(u_0, v)$ between users v and $u_0$, predicts potential relevant fans and adds them to target client $u_0$'s set of predicted possible associates until the random-walk termination condition is met, completing one round of potential-client recommendation. The random walk is repeated to ensure the accuracy and coverage of the algorithm. Finally, the obtained users are ranked by interest similarity with the target user to complete the TopN recommendation.
3-9: Random-walk termination condition. When predicting potentially followed users on the Magic Screen fan network with the trust-based random-walk model, the termination probability $\phi_{u,v,k}$ of the algorithm's k-th step at user node u is related to user u's trust value toward the followed object. As the random roaming proceeds over the trust network, the distance to source client $U_0$ grows and the trust chain gradually lengthens, so trust toward source client $U_0$ becomes lower and the algorithm's termination probability gradually increases. Within one random roam, each user's influence on target user $U_0$ decreases step by step. According to the six-degrees-of-separation theory, the maximum number of steps is set to 6. In summary, when step k-1 of the random-walk algorithm is at user u and step k is at its fan user v, the stop-walk probability is $\phi_{u,v,k}$.
3-10: model iteration step, in the above 3-1 to 3-9, the invention provides that the iteration model of each client is w locals An initial per-client local model w local From the local model iteration in step 2, denoted as w according to the client's label difference i,local And (3) performing iteration T times according to the model walking of 3-1 to 3-9, and stopping training when the overall best is achieved.
3-11: the global model evaluation method is based on trust, and the random walk model vermicelli recommendation is TopN recommendation, has bipartite property, and is suitable for evaluating recommendation accuracy and coverage rate by adopting indexes such as Precision, coverage, F-Measure and the like. Wherein:
wherein: n (N) tp For recommending source client U 0 The number of associated recommended clients; l is the number of recommended association clients.
Wherein B is u Centralized source client U for testing 0 The number of associated recommended objects, which represents the source client U 0 The associated recommended association fan can be recommended. Can F is ready 1 -measure is defined as:
According to the activity and number of users of different Magic Screen APPs, and following commonly used values, Precision is set to no less than 90%, Coverage to no less than 80%, and F_1 to no less than 50%, which completes the whole content of the invention.
3-12: the final effect of the embodiment is that in the APP terminal of the magic screen user, the accuracy of the recommended content according to the individual characteristics of the user reaches 95%, the favorite degree of the user on the recommended content reaches 75%, and the experimental result shows that the method provided by the invention has relatively good executable performance.
The foregoing description is only a preferred embodiment of the present invention, and the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described therein or substitute equivalents for some of the technical features. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.

Claims (9)

1. The decentralized federation learning method based on neighbor trust aggregation for streaming media is characterized by comprising the following steps:
step 1, carrying out normalization processing on stream media data collected by a client to form a feature vector of a user;
step 2, constructing a local model based on a CNN and a Transformer network;
selecting the feature vectors unrelated to time sequence from the feature vectors and inputting them into the CNN network;
selecting the feature vectors related to time sequence from the feature vectors and inputting them into the Transformer network; performing a Concat operation on the feature information extracted by convolution and the result output by the Transformer;
step 3, training a global model based on a random walk model of a trust mechanism and decentralizing federal learning, wherein the method specifically comprises the following steps of;
step 3.1, defining the decentralized federated learning trust network, given a trust network with client U_0 as the source node;
let U_0 be a client source node in federated learning; its directly connected clients, together with the clients connected to them, form a trust network with client U_0 as the source node; define U as the set of directly associated clients of client U_0, comprising clients U_1, …, U_n; all clients behind them are indirectly associated clients of client U_0, comprising clients v_1, …, v_m;
step 3.2, completing the TopN recommendation based on the trust random walk model, thereby obtaining the n clients directly associated with source client U_0 and training a common model;
step 3.3, judging whether a potentially associated client V_i of client U_0 exists; if V_i exists, performing the (k+1)-th random walk and then training the common model.
2. The method for decentralized federation learning based on neighbor trust aggregation for streaming media according to claim 1, wherein step 3.3 specifically comprises: according to the model similarity ISim(u_0, v) between users v and u_0, predicting potentially associated clients and adding them to the prediction set of possibly associated clients of target client u_0, until the termination condition of the random walk is met, completing one round of potential client recommendation; repeating the random walk to ensure the accuracy and coverage of the algorithm; and finally ranking the obtained users by their interest similarity with the target user to complete the TopN recommendation.
3. The method for decentralized federation learning based on neighbor trust aggregation for streaming media according to claim 1, wherein when the (k−1)-th step of the random walk algorithm is at user u and the k-th step is at its associated client user v, the walk termination probability is:
the iterative model of each client is w_local; the initial local model w_local of each client is denoted w_{i,local} according to the client's index; after iterating T times the global optimum is reached, whereupon training stops.
4. The method for decentralized federation learning based on neighbor trust aggregation for streaming media according to claim 1, wherein the Precision, Coverage, and F-Measure indexes are adopted to evaluate recommendation accuracy and coverage rate; wherein Precision = N_tp / L,
wherein: N_tp is the number of recommended clients truly associated with source client U_0; L is the total number of recommended associated clients;
Coverage = N_tp / B_u, wherein B_u is the number of associated recommendation objects of source client U_0 in the test set, representing the probability that an associated client of source client U_0 can be recommended; the F_1-measure can then be defined as F_1 = 2 × Precision × Coverage / (Precision + Coverage).
5. The method for decentralized federation learning based on neighbor trust aggregation for streaming media according to claim 1, wherein X_u is defined as the random variable of selecting user v as a potential fan, then:
thus, when the random walk algorithm continues at user u with probability 1 − φ_{u,v,k}, the probability that a client v ∈ all_u is selected by the algorithm as a potential fan of source user U_0 is
P(Y_{u_0,u} = v) = (1 − φ_{u,v,k}) · P(X_u = v)  (13).
6. The method for decentralized federation learning based on neighbor trust aggregation for streaming media according to claim 1, wherein the random walk model, given a target client U_0, starts from a directly associated client U_i of source user U_0 and traverses all clients of the trust sub-network; when the (k−1)-th step walks to client u, whether the algorithm's k-th step walks to a client of user U_0 is related to the trust value of client v for client u; that is, the model has two options at this point: (1) with probability φ_{u,v,k}, terminate the random walk and return the potentially associated clients of target user U_0 selected in steps 1 to k−1, completing this random walk; (2) with probability 1 − φ_{u,v,k}, perform the (k+1)-th step of the random walk along the attention edge to the associated client set of the next-layer user v, and select potentially associated clients according to the model similarity between the current user and the source user; when the random walk algorithm reaches user u, if the walk continues with probability 1 − φ_{u,v,k}, a user v is selected from the directly associated client set of user u so that the algorithm continues, where φ is the probability function; let S_ω be the random variable of selecting v; the probability that user v is selected is related to its trust value for user ω, namely:
wherein Tr_{u,v} is the trust value of user v for user u; the above process is repeated, and the algorithm continues to walk in the model trust network until it terminates or reaches the maximum walk depth.
7. The method for decentralized federation learning based on neighbor trust aggregation for streaming media according to claim 6, wherein the trust value is defined by letting u ∈ U and v ∈ all_u; the trust value Tr of user v for user u can be defined on the basis of comment, forwarding, and mention behaviors as follows:
(1) let RT_{u,v} be the number of times user v forwards the content of user u; the trust value Tr_rt(u,v) based on forwarding behavior is then:
(2) let CM_{u,v} be the number of comments made by user v on user u; the trust value Tr_cm(u,v) based on comment behavior is then:
(3) let ME_{u,v} be the number of times user v mentions user u when posting a work, forwarding, or commenting; the trust value Tr_me(u,v) based on mention behavior is then:
(4) integrating the trust values of the three online behaviors, the trust value of user v for user u is defined as:
if any of Tr_rt(u,v), Tr_cm(u,v), and Tr_me(u,v) equals 0, it is set equal to a small positive constant.
8. The method for decentralized federation learning based on neighbor trust aggregation for streaming media according to claim 1, wherein the Transformer network comprises an Input Block, an Encoder Block, a Decoder Block, and an Output Block; the output end of the Input Block is connected to the Encoder Block and the Decoder Block respectively, the output end of the Encoder Block is connected to the Decoder Block, and the Decoder Block outputs to the Output Block;
the Encoder Block part consists of N identical networks, specifically: a Multi-Head Self-Attention (MSA) layer and a Feed-Forward Network (FFN), where each sub-network layer adds an Add & Norm layer for residual connection and normalization operations;
the Add & Norm layer sums and normalizes the input and output of the Multi-Head Attention layer, then passes the result to the FFN layer, and finally performs the Add & Norm processing again.
9. The method for decentralized federation learning based on neighbor trust aggregation for streaming media according to claim 1, wherein the Transformer network is formulated as
Attention(Query, Source) = Σ_{i=1}^{Lx} similarity(Query, Key_i) · Value_i,
where Source is the time-sequence-related feature in the feature vector, similarity(·) represents a similarity function, Lx represents the length of the input Source, and Query represents an element in the given target.
CN202211234598.1A 2022-10-10 2022-10-10 Stream media-oriented decentralization federation learning method based on neighbor trust aggregation Active CN115600642B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211234598.1A CN115600642B (en) 2022-10-10 2022-10-10 Stream media-oriented decentralization federation learning method based on neighbor trust aggregation

Publications (2)

Publication Number Publication Date
CN115600642A CN115600642A (en) 2023-01-13
CN115600642B true CN115600642B (en) 2024-02-06

Family

ID=84845977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211234598.1A Active CN115600642B (en) 2022-10-10 2022-10-10 Stream media-oriented decentralization federation learning method based on neighbor trust aggregation

Country Status (1)

Country Link
CN (1) CN115600642B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117275098B (en) * 2023-11-13 2024-02-27 南京栢拓视觉科技有限公司 Federal increment method oriented to action recognition and based on topology data analysis

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112288094A (en) * 2020-10-09 2021-01-29 武汉大学 Federal network representation learning method and system
CN113378243A (en) * 2021-07-14 2021-09-10 南京信息工程大学 Personalized federal learning method based on multi-head attention mechanism
CN114219160A (en) * 2021-12-20 2022-03-22 湖南大学 Federal learning-based production chain cooperative scheduling method
CN114265913A (en) * 2021-12-30 2022-04-01 内蒙古大学 Space-time prediction algorithm based on federal learning on industrial Internet of things edge equipment
CN114679332A (en) * 2022-04-14 2022-06-28 浙江工业大学 APT detection method of distributed system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11463472B2 (en) * 2018-10-24 2022-10-04 Nec Corporation Unknown malicious program behavior detection using a graph neural network

Also Published As

Publication number Publication date
CN115600642A (en) 2023-01-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant