CN115185715A - Case popularity diffusion processing method based on social network information - Google Patents


Info

Publication number
CN115185715A
Authority
CN
China
Prior art keywords
diffusion
node
nodes
information
graph
Legal status
Pending
Application number
CN202211107987.8A
Other languages
Chinese (zh)
Inventor
董卓达
陈岩
Current Assignee
Shenzhen Huayun Zhongsheng Technology Co ltd
Original Assignee
Shenzhen Huayun Zhongsheng Technology Co ltd
Application filed by Shenzhen Huayun Zhongsheng Technology Co ltd
Priority to CN202211107987.8A
Publication of CN115185715A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/542 Event management; Broadcasting; Multicasting; Notifications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The invention relates to the field of information processing, and in particular to a case popularity diffusion processing method based on social network case information. The method crawls social network case information data, collects the forwarding and commenting data generated during diffusion, performs diffusion processing of the case information data on that basis, and improves the monitoring of information diffusing across the internet around trending cases.

Description

Case popularity diffusion processing method based on social network information
Technical Field
The invention relates to the field of information processing, and in particular to a case popularity diffusion processing method based on social network information.
Background
As the amount of information on social networks grows, the diffusion of case information across social networks becomes increasingly pronounced. In public interest litigation, case clues and related cases carry information such as their influence on the public, and this information has become an important reference for handling public interest cases.
Given social network case information data, how to characterize its diffusion has become a research hotspot in internet law. The resulting indicators can guide the assessment of the influence of public interest litigation and provide case-handling support for subsequent case retrieval and similar tasks.
Digital case resources have already reached a considerable scale, so research on the diffusion of social network case information data is of great importance. In practice, one or more pieces of information are sent from multiple sources; in multi-source diffusion, a message is often published on the network by several users at the same time and then spreads through the network. One existing method abstracts information into events according to the characteristics of information diffusion in social networks, constructs a large-scale event network containing massive information, and introduces a distributed approach to meet the detection requirements of massive information; however, it does not consider the important factor of association relationships between user nodes. Another existing method, based on research into the diffusion of social network information data, studies the cooperation and competition relationships between nodes from the message propagation mechanism and considers the influence of time on message diffusion, but its computation is complex.
Disclosure of Invention
To solve at least one of the above problems, the invention provides a case popularity diffusion processing method based on social network information, which uses an association rule method to process the diffusion of social network case information data effectively.
The method comprises constructing a diffusion monitoring model, which models the diffusion of information through a directed network G = (U, E), where U is the set of all nodes and E (⊂ U × U) is the set of all arcs. Each arc $(u, v) \in E$ has two parameters: $p_{u,v}(t)$, which gives the probability that $u$ transmits information to $v$ at time $t$, where $0 < p_{u,v}(t) < 1$; and $r_{u,v}$, where $r_{u,v} > 0$. $p_{u,v}(t)$ is called the diffusion function and $r_{u,v}$ the time delay parameter; $p_{u,v}(t)$ is a function of the features of the nodes, the edge and the exchanged content.
The model computes the probability $p_{u,v}(t)$ that node $u$ sends a message to node $v$ at time $t$; the 13 interpretable features described below are values between 0 and 1 calculated from past information diffusion traces.
The probability is a function of node, edge and topic features belonging to the social, topic and time dimensions. Social dimension features: the rate at which each of the nodes $u$ and $v$ issues messages; the Jaccard similarity coefficient $H(u, v)$ of the two sets of nodes with which $u$ and $v$ interact; the ratio of directed to undirected messages issued by each node; and the rate at which each node receives targeted messages, $mR(u)$ and $mR(v)$.
Topic dimension features: the interest of each user in the information.
Time dimension features: the distribution of each user's activity during a day, stored as a vector as a non-parametric function.
The diffusion probability $p_{u,v}(t)$ is given by the following equation, where $V$ is the vector of the corresponding features:
$p_{u,v}(t) = \dfrac{1}{1 + e^{-\omega^{\top} V}}$
Bayesian logistic regression is applied to data describing past information propagation patterns in the network to estimate the coefficients $\omega$.
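The following Java sketch shows how one diffusion probability could be evaluated once the coefficients are known. It assumes the probability is the logistic function of the weighted feature vector, which is what a logistic regression model produces, with an intercept entry in the vector. The helper names `jaccard` and `probability`, the feature order and all numeric values are hypothetical illustrations, not taken from the patent.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: feature order, weights and values are illustrative assumptions.
class DiffusionProbability {

    // Jaccard similarity |A intersect B| / |A union B| of the node sets u and v interact with.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0; // assumed convention when neither node has interacted with anyone
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    // Logistic combination of the feature vector V with the coefficient vector omega.
    static double probability(double[] omega, double[] features) {
        if (omega.length != features.length) {
            throw new IllegalArgumentException("omega and V must have the same length");
        }
        double z = 0.0;
        for (int i = 0; i < omega.length; i++) {
            z += omega[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static void main(String[] args) {
        // 2 shared contacts out of 4 distinct contacts gives H(u, v) = 0.5
        double h = jaccard(Set.of("a", "b", "c"), Set.of("b", "c", "d"));
        // Hypothetical V: {intercept, H(u, v), message rate of u, mR(v)}, values in [0, 1]
        double[] features = {1.0, h, 0.3, 0.1};
        double[] omega = {-2.0, 1.5, 2.0, 0.5}; // hypothetical fitted coefficients
        System.out.printf("p_uv(t) = %.3f%n", probability(omega, features));
    }
}
```

Because every feature lies in [0, 1], the coefficients are directly comparable, and the logistic link keeps the result strictly between 0 and 1, as the model requires.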
The method further comprises performing feature detection according to the diffusion graph of a diffusion event. The inputs of the feature detection are the diffusion graph and the feature coefficient of the diffusion event; its output is the diffusion feature of the event and its APL value.
The feature detection specifically includes:
1) Set the feature coefficient;
2) Count the degree of each node in the diffusion graph from the graph's adjacency list structure;
3) Count the numbers of multi-branch nodes and two-branch nodes in the graph, where multi-branch nodes are nodes with degree greater than 2 and all other nodes are two-branch nodes;
4) Calculate the ratio of star nodes and classify the diffusion event features by comparing this ratio with the feature coefficient;
5) Calculate the APL value of each connected branch of the diffusion graph;
6) Calculate the APL value of the whole diffusion graph, i.e. the value of the event's diffusion ability.
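A minimal Java sketch of steps 2) to 6), assuming the diffusion graph is stored as an undirected adjacency list `Map<Long, Set<Long>>`. The source does not define the star-node ratio, the class labels, or how the per-branch APL values are combined, so this sketch assumes the ratio is the share of multi-branch nodes among all nodes, uses the hypothetical labels `star-like` and `chain-like`, and pools all connected node pairs for the whole-graph APL.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FeatureDetection {

    // Adds an undirected edge to the adjacency list (node -> set of neighbours).
    static void addEdge(Map<Long, Set<Long>> adj, long a, long b) {
        adj.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adj.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Steps 2-3: {multi-branch count, two-branch count}; degree > 2 means multi-branch.
    static int[] branchCounts(Map<Long, Set<Long>> adj) {
        int multi = (int) adj.values().stream().filter(nbrs -> nbrs.size() > 2).count();
        return new int[] {multi, adj.size() - multi};
    }

    // Step 4 (assumed definition): share of multi-branch ("star") nodes among all nodes.
    static double starRatio(Map<Long, Set<Long>> adj) {
        return adj.isEmpty() ? 0.0 : (double) branchCounts(adj)[0] / adj.size();
    }

    // Step 4 (hypothetical labels): compare the star ratio with the feature coefficient.
    static String classify(Map<Long, Set<Long>> adj, double coefficient) {
        return starRatio(adj) >= coefficient ? "star-like" : "chain-like";
    }

    // Breadth-first search: hop distance from source to every reachable node.
    static Map<Long, Integer> bfs(Map<Long, Set<Long>> adj, long source) {
        Map<Long, Integer> dist = new HashMap<>();
        Deque<Long> queue = new ArrayDeque<>();
        dist.put(source, 0);
        queue.add(source);
        while (!queue.isEmpty()) {
            long current = queue.poll();
            for (long next : adj.getOrDefault(current, Set.of())) {
                if (!dist.containsKey(next)) {
                    dist.put(next, dist.get(current) + 1);
                    queue.add(next);
                }
            }
        }
        return dist;
    }

    // Connected branches of the diffusion graph.
    static List<Set<Long>> components(Map<Long, Set<Long>> adj) {
        List<Set<Long>> result = new ArrayList<>();
        Set<Long> seen = new HashSet<>();
        for (long node : adj.keySet()) {
            if (!seen.contains(node)) {
                Set<Long> component = new HashSet<>(bfs(adj, node).keySet());
                seen.addAll(component);
                result.add(component);
            }
        }
        return result;
    }

    // {sum of shortest-path lengths, number of ordered node pairs} within one branch.
    static double[] pathStats(Map<Long, Set<Long>> adj, Set<Long> component) {
        double sum = 0.0;
        for (long source : component) {
            for (int d : bfs(adj, source).values()) {
                sum += d;
            }
        }
        double n = component.size();
        return new double[] {sum, n * (n - 1)};
    }

    // Step 5: APL of one connected branch (0 for an isolated node).
    static double componentApl(Map<Long, Set<Long>> adj, Set<Long> component) {
        double[] s = pathStats(adj, component);
        return s[1] == 0 ? 0.0 : s[0] / s[1];
    }

    // Step 6 (assumption): average over all connected pairs of all branches together.
    static double graphApl(Map<Long, Set<Long>> adj) {
        double sum = 0.0;
        double pairs = 0.0;
        for (Set<Long> component : components(adj)) {
            double[] s = pathStats(adj, component);
            sum += s[0];
            pairs += s[1];
        }
        return pairs == 0 ? 0.0 : sum / pairs;
    }
}
```

For a star with centre 1 and leaves 2, 3 and 4, the 12 ordered node pairs have a total distance of 18, so `graphApl` returns 1.5, and one of the four nodes is multi-branch, so `starRatio` returns 0.25.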
Further preferably, the feature detection algorithm is performed in a distributed detection mode.
Further, the feature detection is performed by assigning each type of event diffusion graph to one split, so that multiple reduce tasks are executed in parallel.
Further, constructing the diffusion model also comprises dividing the large social network graph into subgraphs and then assigning each subgraph to a process node. Each subgraph contains two types of nodes: internal nodes and edge nodes. An internal node is a node whose neighbours all lie in the subgraph; an edge node has neighbours in other subgraphs. For each subgraph G, all internal nodes and the edges between them constitute a closed graph G′; the edge nodes can be regarded as "supporting information" for updating the rules.
Further, the features of the case include: forwarding amount, comment amount, user node degree and activity degree.
Further, the case information diffusion processing is diffusion detection of social network case information data based on association rules over the feature information.
Further, the Average Path Length (APL) from graph theory is used to compare the diffusion processes of diffusion events with the same features, and the nodes and edges of the event diffusion graph are stored in an adjacency list.
Preferably, a case popularity diffusion processing system based on social network information is further provided, and the system includes a processor and a memory, the memory stores a computer program thereon, and the processor is used for executing the computer program on the memory to implement the method.
The disclosed method performs diffusion processing on social network case information data by means of association rules. It crawls social network case information data, collects the forwarding and commenting data generated during diffusion, performs diffusion processing of the case information data on that basis, and improves the monitoring of information diffusing across the internet around trending cases.
Drawings
The features and advantages of the present disclosure will be more clearly understood by reference to the accompanying drawings, which are schematic and are not to be construed as limiting the disclosure in any way.
FIG. 1 is a schematic view of the event diffusion graph topology of the present method.
FIG. 2 is a schematic diagram of a data abstraction model and data structure of the method.
FIG. 3 is a schematic diagram of case information forwarding and review in the method.
Fig. 4 is a schematic input and output diagram of the detection method of the present method.
Detailed Description
These and other features and characteristics of the present disclosure, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will be better understood by reference to the following description and drawings, which form a part of this specification. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the disclosure. It will be understood that the figures are not drawn to scale. Various block diagrams are used in this disclosure to illustrate various variations of embodiments according to the disclosure.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that "/" herein means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone.
It should be noted that, for convenience of clearly describing the technical solutions of the embodiments of the present application, in the embodiments of the present application, words such as "first" and "second" are used to distinguish the same items or similar items with substantially the same function or action, and those skilled in the art may understand that words such as "first" and "second" do not limit the quantity and execution order. For example, the first information and the second information are for distinguishing different information, not for describing a specific order of information.
It should be noted that, in the embodiments of the present invention, words such as "exemplary" or "for example" are used to indicate examples, illustrations or explanations. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present invention should not be construed as preferred or advantageous over other embodiments or designs. Rather, the words "exemplary" and "for example" are intended to present related concepts in a concrete manner.
Example 1
Let $I = \{i_1, i_2, \ldots, i_m\}$ describe the set of keywords in internet legal texts, whose elements are called items. Let $D$ describe a set of related data, called a database, in which each transaction $T$ is a set of items, i.e. a subset of $I$ ($T \subseteq I$); each transaction has a unique identifier, e.g. the transaction number TID. A rule has the form $X \Rightarrow Y$, where $X$ and $Y$ describe predicates or data items; it means that, within the same transaction, if $X$ occurs then $Y$ also occurs. Assuming that $A$ represents an itemset and transaction $T$ contains $A$ ($A \subseteq T$), an association rule is an implication of the form $A \Rightarrow B$, where $A \subset I$, $B \subset I$ and $A \cap B = \emptyset$.
The support of rule $A \Rightarrow B$ in the transaction data set $D$ is defined as the ratio of the number of transactions containing both $A$ and $B$ to the total number of transactions, denoted $\operatorname{support}(A \Rightarrow B)$:
$\operatorname{support}(A \Rightarrow B) = P(A \cup B) = \dfrac{|\{T \in D : A \cup B \subseteq T\}|}{|D|}$ (1)
The confidence of rule $A \Rightarrow B$ in the transaction set is the ratio of the number of transactions containing both $A$ and $B$ to the number of transactions containing $A$, denoted $\operatorname{confidence}(A \Rightarrow B)$:
$\operatorname{confidence}(A \Rightarrow B) = P(B \mid A) = \dfrac{\operatorname{count}(A \cup B)}{\operatorname{count}(A)}$
where $\operatorname{count}(A \cup B)$ is the number of transaction records containing the itemset $A \cup B$, and $\operatorname{count}(A)$ is the number of transaction records containing the itemset $A$.
The support and confidence of a rule are the indicators used to measure it, reflecting its usefulness and its certainty respectively; their thresholds range from 0% to 100%.
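Support and confidence can be computed directly from a list of transactions. In this Java sketch each transaction is a hypothetical set of legal keywords, and `count`, `support` and `confidence` are illustrative helper names rather than part of the described method.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RuleMeasures {

    // Number of transactions T in D whose items include the whole itemset.
    static long count(List<Set<String>> d, Set<String> itemset) {
        return d.stream().filter(t -> t.containsAll(itemset)).count();
    }

    // support(A => B) = count(A union B) / |D|
    static double support(List<Set<String>> d, Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return d.isEmpty() ? 0.0 : (double) count(d, union) / d.size();
    }

    // confidence(A => B) = count(A union B) / count(A)
    static double confidence(List<Set<String>> d, Set<String> a, Set<String> b) {
        long countA = count(d, a);
        if (countA == 0) {
            return 0.0; // A never occurs, so the rule is never triggered
        }
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) count(d, union) / countA;
    }

    public static void main(String[] args) {
        List<Set<String>> d = List.of(       // hypothetical keyword transactions
                Set.of("fraud", "appeal"),
                Set.of("fraud"),
                Set.of("fraud", "appeal", "court"),
                Set.of("court"));
        System.out.println(support(d, Set.of("fraud"), Set.of("appeal")));    // 2 of 4 transactions
        System.out.println(confidence(d, Set.of("fraud"), Set.of("appeal"))); // 2 of the 3 with "fraud"
    }
}
```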
Given a transaction set D of internet legal topics, mining association rules is the process of finding the rules that satisfy a user-specified minimum support min-sup and minimum confidence min-con.
In practical applications, attention is usually paid only to association rules that reach a certain support and confidence; rules that satisfy both min-sup and min-con are called strong rules. The association rule mining problem is thus the problem of generating the strong rules of a given transaction set D: for a transaction database D, find all association rules $A \Rightarrow B$ satisfying $\operatorname{support}(A \Rightarrow B) \ge \text{min-sup}$ and $\operatorname{confidence}(A \Rightarrow B) \ge \text{min-con}$ [9].
The detailed process of forming association rules is as follows:
(1) For each frequent itemset $l$, obtain all non-empty subsets $s$ of $l$;
(2) For each such $s$, if $\dfrac{\operatorname{count}(l)}{\operatorname{count}(s)} \ge \text{min-con}$, generate the association rule "$s \Rightarrow (l - s)$", where $\operatorname{count}(\cdot)$ is the number of transactions containing the given itemset.
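Steps (1) and (2) can be sketched in Java as follows. As an assumption, the subsets are enumerated with a bit mask and the case `s = l` is skipped, because it would give a rule with an empty right-hand side; the transactions and items in `main` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class RuleGeneration {

    // Number of transactions containing every item of the itemset.
    static long count(List<Set<String>> d, Set<String> itemset) {
        return d.stream().filter(t -> t.containsAll(itemset)).count();
    }

    // For frequent itemset l, emit "s => (l - s)" when count(l) / count(s) >= minCon.
    static List<String> rules(List<Set<String>> d, List<String> l, double minCon) {
        List<String> result = new ArrayList<>();
        long countL = count(d, new HashSet<>(l));
        int n = l.size();
        // mask runs over the non-empty proper subsets of l
        for (int mask = 1; mask < (1 << n) - 1; mask++) {
            Set<String> s = new TreeSet<>();    // sorted, so the printed rule is deterministic
            Set<String> rest = new TreeSet<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    s.add(l.get(i));
                } else {
                    rest.add(l.get(i));
                }
            }
            long countS = count(d, s);
            if (countS > 0 && (double) countL / countS >= minCon) {
                result.add(s + " => " + rest);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> d = List.of(
                Set.of("a", "b"), Set.of("a", "b"), Set.of("a"), Set.of("b", "c"));
        System.out.println(rules(d, List.of("a", "b"), 0.6));
    }
}
```

Here count({a, b}) = 2 and count({a}) = count({b}) = 3, so both rules have confidence 2/3: they are kept at min-con 0.6 and discarded at 0.7.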
A collection of items is called an itemset; an itemset containing $k$ items is called a $k$-itemset. The frequency of an itemset is defined as the number of transactions in D that contain it. If the frequency of an itemset exceeds the product of min-sup and the total number of transactions in D, the itemset is considered to satisfy the minimum support min-sup and is called a frequent itemset. The set of frequent $k$-itemsets is commonly written as $L_k$.
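A level-wise (Apriori-style) search produces $L_1, L_2, \ldots$ in turn. This Java sketch works under stated assumptions: it treats an itemset as frequent when its frequency is at least min-sup times |D| (the usual convention, whereas the text says "exceeds"), and it omits Apriori's subset-pruning optimisation, which affects only speed, because every frequent (k+1)-itemset is still the union of two of its frequent k-subsets.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FrequentItemsets {

    // Returns [L_1, L_2, ...], where L_k is the set of frequent k-itemsets.
    static List<Set<Set<String>>> mine(List<Set<String>> d, double minSup) {
        double threshold = minSup * d.size(); // min-sup times the number of transactions
        List<Set<Set<String>>> levels = new ArrayList<>();
        Set<Set<String>> candidates = new HashSet<>();
        for (Set<String> transaction : d) {
            for (String item : transaction) {
                candidates.add(Set.of(item)); // candidate 1-itemsets
            }
        }
        while (!candidates.isEmpty()) {
            Set<Set<String>> frequent = new HashSet<>();
            for (Set<String> c : candidates) {
                long frequency = d.stream().filter(t -> t.containsAll(c)).count();
                if (frequency >= threshold) {
                    frequent.add(c);
                }
            }
            if (frequent.isEmpty()) {
                break;
            }
            levels.add(frequent);
            // Join step: unions of two frequent k-itemsets that contain exactly k + 1 items.
            List<Set<String>> list = new ArrayList<>(frequent);
            int k = list.get(0).size();
            Set<Set<String>> next = new HashSet<>();
            for (int i = 0; i < list.size(); i++) {
                for (int j = i + 1; j < list.size(); j++) {
                    Set<String> union = new HashSet<>(list.get(i));
                    union.addAll(list.get(j));
                    if (union.size() == k + 1) {
                        next.add(union);
                    }
                }
            }
            candidates = next;
        }
        return levels;
    }
}
```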
The diffusion principle of case information data is as follows.
Suppose that $W = \{w_1, w_2, \ldots, w_n\}$ describes the case information data samples, $V$ is the universe of discourse (fundamental domain), and $v_j$ is the observed value of $w_j$; then:
[Equation (4): formula image not recoverable from the source]
There exists a function $\mu(\cdot)$ such that the information carried by the observation $v_j$ is diffused to $v$ according to $\mu$; the distribution of the original case information data obtained by this diffusion is then:
[Equation: formula image not recoverable from the source]
This distribution better reflects the overall law of $w$.
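Because the diffusion function itself is not recoverable from the source, the following Java sketch assumes a normal (Gaussian) diffusion function, a frequent choice in information diffusion, $f_j(u_i) = \exp(-(v_j - u_i)^2 / (2h^2))$ over a discrete grid of points $u_i$ of $V$. The bandwidth `h`, the grid and the sample values are hypothetical.

```java
// Sketch under an assumed normal diffusion function; h, grid and samples are hypothetical.
class InformationDiffusion {

    // Diffuses each observation over the grid points of the universe, normalises so that
    // every observation carries one unit of information, sums the contributions and
    // divides by the number of observations to obtain a distribution over the grid.
    static double[] distribution(double[] samples, double[] universe, double h) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("at least one observation is required");
        }
        double[] q = new double[universe.length];
        for (double w : samples) {
            double[] f = new double[universe.length];
            double total = 0.0;
            for (int i = 0; i < universe.length; i++) {
                double diff = w - universe[i];
                f[i] = Math.exp(-diff * diff / (2.0 * h * h));
                total += f[i];
            }
            if (total == 0.0) {
                throw new IllegalArgumentException("observation " + w + " lies too far from the grid");
            }
            for (int i = 0; i < universe.length; i++) {
                q[i] += f[i] / total;
            }
        }
        for (int i = 0; i < q.length; i++) {
            q[i] /= samples.length;
        }
        return q;
    }

    public static void main(String[] args) {
        double[] comments = {12, 15, 15, 20, 31}; // hypothetical comment counts per interval
        double[] grid = {10, 15, 20, 25, 30, 35};
        double[] q = distribution(comments, grid, 4.0);
        for (int i = 0; i < grid.length; i++) {
            System.out.printf("u=%.0f  q=%.3f%n", grid[i], q[i]);
        }
    }
}
```

Spreading each sparse observation over neighbouring grid points is what lets a small sample describe the overall law more smoothly than a raw histogram would.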
Most law-related information in internet media is influenced by user behaviour and tends to diverge during diffusion. This scheme performs diffusion processing on social network case information data using the association rules described above: it crawls social network case information data, collects the forwarding and commenting data generated during diffusion, and completes the diffusion processing on that basis.
The concept of event diffusion differs from, but is connected to, information diffusion. Information diffusion mainly refers to the physical propagation of information, in which diffusion can be suppressed by cutting off nodes; when information diffusion is studied in a social network carrying many pieces of information, the emphasis is on the diffusion of mainstream information. Events are introduced to model information diffusion, encapsulating and abstracting the information in the original diffusion process.
The method uses the Average Path Length (APL) from graph theory to compare the diffusion processes of diffusion events with the same features, and stores the nodes and edges of the event diffusion graph in an adjacency list with the following storage structure:
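import java.util.ArrayList;
import java.util.Map;
import java.util.Set;

class DiffusionGraph {
    boolean connected;             // result of the connectivity (continuity) test
    long eventTimes;               // number of events in the diffusion graph
    ArrayList<Long> EgNodes;       // nodes of the diffusion graph; its size is the node count
    Map<Long, Set<Long>> nbr_map;  // adjacency-list storage: node -> set of adjacent nodes
    String Info;                   // event content information
}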
In the above storage structure, connected holds the result of the connectivity (continuity) diffusion detection; eventTimes is the total number of events; EgNodes holds the nodes of the graph and is used to count them, which quantifies the diffusion rate; nbr_map stores the nodes and edges of the graph; and Info stores the event content of the diffusion graph. The topology of event diffusion and the corresponding adjacency list structure are shown in Fig. 1 and Fig. 2.
Static characteristics of case information diffusion. To study the diffusion of social network case information data, the numbers of forwards and comments of legal case information were collected at 15-minute intervals. The trending item is a news post about a case published by a Sina Weibo user; its forwarding and commenting are shown in Fig. 3. The times at which downstream users forward the information are not concentrated and, taken as a whole, only roughly follow a normal distribution. The number of comments also differs between time periods and is positively correlated with the forwarding frequency per period shown in the histogram of Fig. 3; the line chart in Fig. 3 shows the trend in the number of comments per interval. The correlation between the forwarding amount and the comment amount was extracted with statistically mined association rules, giving a correlation coefficient of 0.72 between the comment amount and the forwarding amount. This shows that, during the spread of internet legal topics, users' forwarding and commenting behaviours are clearly correlated.
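The source does not say exactly how the 0.72 coefficient is computed. Assuming it is the Pearson correlation of the per-interval forwarding and comment counts, it can be sketched in Java as below; the series in `main` are hypothetical, not the collected data.

```java
class ForwardCommentCorrelation {

    // Pearson correlation of two equally long series (e.g. counts per 15-minute interval).
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two series of equal length >= 2");
        }
        double sumX = 0.0;
        double sumY = 0.0;
        for (int i = 0; i < x.length; i++) {
            sumX += x[i];
            sumY += y[i];
        }
        double meanX = sumX / x.length;
        double meanY = sumY / y.length;
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        if (sxx == 0.0 || syy == 0.0) {
            throw new IllegalArgumentException("a constant series has no defined correlation");
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] forwards = {40, 85, 120, 90, 60, 30}; // hypothetical forwards per interval
        double[] comments = {10, 30, 35, 40, 15, 12};  // hypothetical comments per interval
        System.out.printf("r = %.2f%n", pearson(forwards, comments));
    }
}
```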
On the one hand, in network-media case information, a user may comment on and forward a piece of case information at the same time, so the numbers of forwards and comments increase together. On the other hand, because the information value of the case itself is high, it attracts considerable social attention, and when users choose to comment and forward, the content matches their interests. The forwarding amount is therefore positively correlated with the comment amount. The correlation between user node degree and activity is obtained next. Node degree reflects how strongly a node in the case message network is connected to its adjacent nodes and comprises in-degree and out-degree: the in-degree is the number of followers, and the out-degree is the number of users followed. As the in-degree increases, the user's influence grows and the published information is browsed by more users; as the out-degree increases, the user can browse more information. Node degrees are measured through the association rules: the greater the degree of association, the higher the node degree.
Table 1 shows the user node degrees, in-degrees and out-degrees computed with the association rule method from the above samples.
TABLE 1 Sample node degree, in-degree and out-degree
It can be seen that node a in the table has a node degree of 677, meaning that node a has 677 connections with its adjacent nodes. Its in-degree is 168 and its out-degree is 509, i.e. the user has 168 followers and follows 509 users, which indicates high activity in the social network. Node W has an in-degree of 6 and an out-degree of 6, indicating that it is not active in spreading case information; such users are generally regarded as lurkers.
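Node degree here is the sum of in-degree and out-degree, which matches node a in Table 1 (168 + 509 = 677). A Java sketch counting both from a list of follow relations, where each hypothetical edge `{follower, followee}` adds one to the followee's in-degree and one to the follower's out-degree:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NodeDegrees {

    // Returns node -> {in-degree, out-degree} from follow edges {follower, followee}.
    static Map<String, int[]> inOut(List<String[]> follows) {
        Map<String, int[]> degrees = new HashMap<>();
        for (String[] edge : follows) {
            degrees.computeIfAbsent(edge[0], k -> new int[2])[1]++; // follower: out-degree
            degrees.computeIfAbsent(edge[1], k -> new int[2])[0]++; // followee: in-degree
        }
        return degrees;
    }

    // Node degree = in-degree + out-degree.
    static int nodeDegree(int[] inOut) {
        return inOut[0] + inOut[1];
    }

    public static void main(String[] args) {
        List<String[]> follows = List.of(
                new String[] {"x", "a"}, new String[] {"y", "a"}, new String[] {"a", "z"});
        int[] a = inOut(follows).get("a");
        System.out.println("in=" + a[0] + " out=" + a[1] + " degree=" + nodeDegree(a));
    }
}
```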
The breadth of diffusion is related to user influence. When the diffusion of social network case information data is studied through association rules, the influence of the information publisher also strongly affects diffusion, and user influence is mainly evaluated by the user's follower count. The more followers a user has, the more people browse, follow and forward the published information, which favours the diffusion of the data. Table 2 lists statistics on how widely the trending information forwarded by different users diffused.
TABLE 2 Extent of case information data diffusion for different users
As can be seen from Table 2, the greater the user's influence, the more times the case information data are forwarded, and the numbers of comments and likes increase accordingly. This is because the more followers a user has, the more people browse the case information the user publishes, which increases the forwarding and comment amounts. Conversely, high influence itself brings a large number of followers, so the diffusion of social network case information is related to user influence.
Model formalization. T-DZD models the diffusion of information through a directed network G = (U, E), where U is the set of all nodes and E (⊂ U × U) is the set of all arcs. Each arc $(u, v) \in E$ has two parameters: $p_{u,v}(t)$, which gives the probability that $u$ transmits information to $v$ at time $t$, where $0 < p_{u,v}(t) < 1$; and $r_{u,v}$, where $r_{u,v} > 0$. $p_{u,v}(t)$ is called the diffusion function and $r_{u,v}$ the time delay parameter.
$p_{u,v}(t)$ is a function of the features of the nodes, the edge and the exchanged content. As in the Independent Cascade (IC) model, the diffusion process starts from a given set of initially activated nodes S, but, unlike IC, activations unfold in continuous time. Each node $u$ that becomes active at time $t$ has one chance to activate each of its inactive neighbours $v$ with probability $p_{u,v}(t)$. If the activation succeeds, $v$ becomes active at time $t + r_{u,v}$. The process stops when no further activation can take place.
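The continuous-time cascade described above can be simulated with a priority queue ordered by activation time. In this Java sketch the diffusion function `p` is supplied by the caller (in T-DZD it would come from the fitted feature model), the delays $r_{u,v}$ are stored per arc, and the graph, probabilities and random seed in `main` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Random;
import java.util.Set;

class ContinuousTimeCascade {

    // p_{u,v}(t): probability that u, active since time t, activates v.
    interface DiffusionFunction {
        double p(int u, int v, double t);
    }

    // A pending activation of `node` at time `time`.
    static final class Event {
        final double time;
        final int node;

        Event(double time, int node) {
            this.time = time;
            this.node = node;
        }
    }

    // arcs: u -> (v -> r_{u,v}). Returns the activation time of every activated node.
    static Map<Integer, Double> simulate(Map<Integer, Map<Integer, Double>> arcs,
                                         Set<Integer> seeds, DiffusionFunction p, long rngSeed) {
        Random rng = new Random(rngSeed);
        Map<Integer, Double> activeAt = new HashMap<>();
        PriorityQueue<Event> queue = new PriorityQueue<>((a, b) -> Double.compare(a.time, b.time));
        for (int s : seeds) {
            queue.add(new Event(0.0, s)); // the initial set S is active at t = 0
        }
        while (!queue.isEmpty()) {
            Event e = queue.poll();
            if (activeAt.containsKey(e.node)) {
                continue; // already activated at an earlier time
            }
            activeAt.put(e.node, e.time);
            // The newly active node gets exactly one attempt per still-inactive neighbour.
            for (Map.Entry<Integer, Double> arc : arcs.getOrDefault(e.node, Map.of()).entrySet()) {
                int v = arc.getKey();
                if (!activeAt.containsKey(v) && rng.nextDouble() < p.p(e.node, v, e.time)) {
                    queue.add(new Event(e.time + arc.getValue(), v)); // active at t + r_{u,v}
                }
            }
        }
        return activeAt; // queue empty: no further activation can take place
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Double>> arcs = new HashMap<>();
        arcs.put(1, Map.of(2, 2.0, 3, 0.5));
        arcs.put(2, Map.of(4, 1.0));
        arcs.put(3, Map.of(4, 4.0));
        System.out.println(simulate(arcs, Set.of(1), (u, v, t) -> 0.8, 42L));
    }
}
```

Processing events in time order guarantees that a node reached along several paths keeps its earliest activation time.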
The inputs and outputs of T-DZD are shown in Fig. 4.
Feature space. The model computes the probability $p_{u,v}(t)$ that node $u$ sends a message to node $v$ at time $t$. This probability is a function of node, edge and topic features belonging to the social, topic and time dimensions. Optionally, the 13 interpretable features described below are values between 0 and 1 calculated from past information diffusion traces.
Social dimension features: the rate at which each of the nodes $u$ and $v$ issues messages; the Jaccard similarity coefficient $H(u, v)$ of the two sets of nodes with which $u$ and $v$ interact; the ratio of directed to undirected messages issued by each node; and the rate at which each node receives targeted messages, $mR(u)$ and $mR(v)$.
Topic dimension features: the interest of each user in the information.
Time dimension features: the distribution of each user's activity during a day, stored as a vector as a non-parametric function.
Model parameter estimation. The diffusion probability $p_{u,v}(t)$ is given by the following equation, where $V$ is the vector of the corresponding features:
$p_{u,v}(t) = \dfrac{1}{1 + e^{-\omega^{\top} V}}$
Bayesian logistic regression is applied to data describing past information propagation patterns in the network to estimate the coefficients $\omega$.
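Full Bayesian logistic regression yields a posterior distribution over the coefficient vector (written $\omega$ here). A common simplification, assumed in this Java sketch, is to compute only the posterior mode (MAP estimate) under an isotropic Gaussian prior by batch gradient ascent; this is not necessarily the estimator used in the source. The prior variance, learning rate, iteration count and training traces are hypothetical.

```java
// Sketch: MAP estimate of omega for logistic regression with a N(0, sigma2 * I) prior.
class LogisticMapEstimator {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    static double dot(double[] w, double[] x) {
        double s = 0.0;
        for (int j = 0; j < w.length; j++) {
            s += w[j] * x[j];
        }
        return s;
    }

    // x[n]: feature vector V of past observation n (first entry 1.0 as intercept);
    // y[n]: 1 if u passed the message to v in that trace, otherwise 0.
    static double[] fit(double[][] x, int[] y, double sigma2, double rate, int iterations) {
        int dim = x[0].length;
        double[] w = new double[dim];
        for (int it = 0; it < iterations; it++) {
            double[] grad = new double[dim];
            for (int j = 0; j < dim; j++) {
                grad[j] = -w[j] / sigma2; // gradient of the Gaussian log-prior
            }
            for (int n = 0; n < x.length; n++) {
                double err = y[n] - sigmoid(dot(w, x[n])); // log-likelihood gradient factor
                for (int j = 0; j < dim; j++) {
                    grad[j] += err * x[n][j];
                }
            }
            for (int j = 0; j < dim; j++) {
                w[j] += rate * grad[j]; // ascend the log-posterior
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // Hypothetical traces: {intercept, one feature such as H(u, v)} -> diffused or not
        double[][] x = {{1, 0.1}, {1, 0.2}, {1, 0.8}, {1, 0.9}};
        int[] y = {0, 0, 1, 1};
        double[] w = fit(x, y, 10.0, 0.1, 2000);
        System.out.printf("omega = [%.3f, %.3f]%n", w[0], w[1]);
        System.out.printf("p for H = 0.85: %.3f%n", sigmoid(dot(w, new double[] {1, 0.85})));
    }
}
```

The Gaussian prior keeps the coefficients finite even when the past traces are perfectly separable, which plain maximum likelihood would not.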
Illustratively, TAP is designed with an efficient distributed learning algorithm, implemented and tested under the Map-Reduce framework; to scale to practical large-scale networks, an event feature detection algorithm is adopted.
Feature analysis is performed according to the diffusion graph of the diffusion events. The inputs of the algorithm are the diffusion graph and the feature coefficient of the diffusion event; its output is the diffusion feature of the event and its APL value. The main steps of the feature detection algorithm are as follows:
1) Set the algorithm parameter, i.e. the feature coefficient.
2) Count the degree of each node in the diffusion graph from the graph's adjacency list structure.
3) Count the numbers of multi-branch nodes and two-branch nodes in the graph: nodes with degree greater than 2 are multi-branch nodes, and all other nodes are two-branch nodes.
4) Calculate the ratio of star nodes and classify the diffusion event features by comparing this ratio with the feature coefficient.
5) Calculate the APL value of each connected branch of the diffusion graph.
6) Calculate the APL value of the whole diffusion graph, i.e. the value of the event's diffusion ability.
The running time of the algorithm is dominated by the APL computation. Its time complexity depends on m, the number of connected components of the graph, and n, the number of nodes in the largest connected component.
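The six steps above can be sketched as follows. Several details are assumptions not stated in the text: the diffusion graph is treated as undirected; the "star node ratio" is taken to be the share of multi-branch nodes among all nodes; the labels `"star-like"` and `"chain-like"` are hypothetical; and the whole-graph APL is taken as the mean shortest-path length over all reachable node pairs. All function names are illustrative.

```python
from collections import deque

def build_adjacency(edges):
    """Undirected adjacency list (node -> set of neighbours) from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def detect_features(adj, coefficient):
    # Step 2: node degrees from the adjacency list.
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    # Step 3: degree > 2 -> multi-branch node, otherwise two-branch node.
    multi = sum(1 for d in degree.values() if d > 2)
    two = len(degree) - multi
    # Step 4 (assumed ratio and labels): share of multi-branch ("star") nodes.
    star_ratio = multi / len(degree) if degree else 0.0
    label = "star-like" if star_ratio > coefficient else "chain-like"
    # Steps 5-6: APL per connected component, then over all reachable pairs.
    seen, component_apl = set(), []
    grand_total = grand_pairs = 0
    for start in adj:
        if start in seen:
            continue
        component = bfs_distances(adj, start)  # nodes of this component
        seen.update(component)
        total = pairs = 0
        for s in component:
            d = bfs_distances(adj, s)
            total += sum(d.values())
            pairs += len(d) - 1
        component_apl.append(total / pairs if pairs else 0.0)
        grand_total += total
        grand_pairs += pairs
    apl = grand_total / grand_pairs if grand_pairs else 0.0
    return {"multi_branch": multi, "two_branch": two, "star_ratio": star_ratio,
            "feature": label, "component_apl": component_apl, "apl": apl}
```

For example, `detect_features(build_adjacency([(0, 1), (0, 2), (0, 3), (4, 5)]), 0.1)` reports one multi-branch node, component APLs of 1.5 and 1.0, and an overall APL of 20/14. Running a BFS from every node costs roughly quadratic time per component on sparse, tree-like diffusion graphs, which is consistent with the APL step dominating the running time.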
Event diffusion detection is performed by the Distributed Diffusion Detection (DDD) algorithm, which is built on the MapReduce programming model. The diffusion graphs of each event type are placed in a single fragment so that multiple reduce tasks can run in parallel. The execution logic of the DDD algorithm, shown in Fig. 4, is as follows.
Since a social network may contain millions of users, and hundreds of millions of social ties between users, it is impractical to use a single machine to learn TFGs from such voluminous data. To address this challenge, we deploy learning tasks on distributed systems under the map-reduce programming model.
Map-Reduce is a programming model for distributed processing of large data sets. In the Map phase, each machine (referred to as a process node) receives a subset of the data as input and generates a set of intermediate key/value pairs. In the Reduce phase, each process node merges all intermediate values associated with the same intermediate key and outputs the final calculation result. The user specifies a mapping function that processes the key/value pairs to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
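A minimal in-process sketch of the map, shuffle, and reduce phases, applied to the DDD idea of grouping diffusion records by event type so that each type is reduced by its own task. The record layout `(event_type, source_user, target_user)` and all function names are hypothetical. The thread pool only imitates parallel reduce tasks inside one process; a Hadoop cluster would distribute them across machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def run_map_reduce(records, mapper, reducer, workers=4):
    """Map every record, group intermediate values by key (shuffle),
    then run one reduce task per key on a thread pool."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each reduce task returns a (key, result) pair.
        return dict(pool.map(lambda item: reducer(*item), groups.items()))

def diffusion_mapper(record):
    """Hypothetical record layout: (event_type, source_user, target_user)."""
    event_type, src, dst = record
    yield event_type, (src, dst)

def diffusion_reducer(event_type, edges):
    """Summarise the diffusion graph of one event type."""
    nodes = {n for edge in edges for n in edge}
    return event_type, {"nodes": len(nodes), "edges": len(edges)}
```

Keying the intermediate pairs by event type is what places each type's diffusion graph in one group (one "fragment"), so the reduce tasks for different event types are independent and can run concurrently.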
In the affinity propagation process, we first divide the large social network graph into subgraphs and then assign each subgraph to a process node. Each subgraph contains two types of nodes: interior nodes and edge nodes. An interior node is one whose neighbors all lie within the subgraph; an edge node has neighbors in other subgraphs. For each subgraph, all interior nodes and the edges between them form a closed graph. The edge nodes can be regarded as "supporting information" for the update rules. For ease of illustration, we consider distributed learning for a single topic, so the map and reduce phases can be defined as follows.
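The interior/edge split described above can be sketched as follows. The round-robin `partition_nodes` is a hypothetical placeholder for a real graph partitioner, and both function names are illustrative. The adjacency list is assumed to map each node to a set of neighbours.

```python
def partition_nodes(adj, k):
    """Hypothetical round-robin split of the nodes into k parts; a real
    deployment would use a proper graph partitioner instead."""
    parts = [set() for _ in range(k)]
    for idx, node in enumerate(sorted(adj)):
        parts[idx % k].add(node)
    return parts

def split_subgraph(adj, part):
    """Classify the nodes of one part as interior or edge nodes and build
    the closed graph (interior nodes plus the edges among them)."""
    part = set(part)
    # Interior node: every neighbour lies inside this part.
    interior = {u for u in part if adj[u] <= part}
    # Edge node: at least one neighbour lives in another part.
    edge_nodes = part - interior
    closed = {u: adj[u] & interior for u in interior}
    return interior, edge_nodes, closed
```

On the path 1-2-3-4 with the part {1, 2, 3}, nodes 1 and 2 are interior, node 3 is an edge node (its neighbour 4 is elsewhere), and the closed graph is the single edge between 1 and 2.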
In the Map phase, each process node scans the closed graph of its assigned subgraph. Each edge e_ij carries two values, a_ij and r_ij. The map function therefore emits, for each key/value pair e_ij/a_ij, an intermediate key/value pair e_i∗/(b_ij + a_ij), and for each key/value pair e_ij/r_ij, an intermediate key/value pair e_∗j/r_ij.
In the Reduce phase, each process node gathers all values associated with the intermediate key e_i∗ to compute the new r_i values according to the update rule, and gathers all values associated with the same key e_∗j to compute the new a_j values. One map-reduce pass therefore corresponds to one iteration of the affinity propagation algorithm.
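The key pattern above (row keys e_i∗ carrying b_ij + a_ij, column keys e_∗j carrying r_ij) can be sketched as one iteration. The update equations are not reproduced in this text, so the reducers swap in the standard affinity propagation rules of Frey and Dueck: r_ij = b_ij - max over k≠j of (b_ik + a_ik); a_ij = min(0, r_jj + sum over k∉{i, j} of max(0, r_kj)) for i≠j; and a_jj = sum over k≠j of max(0, r_kj). TAP's actual rules may differ. Further assumptions: b_ij is a fixed pairwise score, it is also carried in the row value so the reducer can compute r_ij, both reducers read the previous iteration's values (a synchronous variant), and damping is omitted. All names are hypothetical.

```python
from collections import defaultdict

def ap_map(edges):
    """Map phase: emit pairs keyed by row e_i* and by column e_*j."""
    for (i, j), vals in edges.items():
        yield ("row", i), (j, vals["b"] + vals["a"], vals["b"])
        yield ("col", j), (i, vals["r"])

def ap_reduce(key, values):
    """Reduce phase: new r for a row key, new a for a column key."""
    kind, node = key
    updates = {}
    if kind == "row":
        i = node
        for j, _, b_ij in values:
            others = [t for k, t, _ in values if k != j]
            updates[(i, j)] = ("r", b_ij - (max(others) if others else 0.0))
    else:
        j = node
        r_jj = next((r for i, r in values if i == j), 0.0)
        for i, _ in values:
            pos = sum(max(0.0, r) for k, r in values if k not in (i, j))
            updates[(i, j)] = ("a", pos if i == j else min(0.0, r_jj + pos))
    return updates

def ap_iteration(edges):
    """One map-reduce pass = one (synchronous) affinity propagation iteration."""
    grouped = defaultdict(list)
    for key, value in ap_map(edges):
        grouped[key].append(value)
    new_edges = {e: dict(v) for e, v in edges.items()}
    for key, values in grouped.items():
        for edge, (field, value) in ap_reduce(key, values).items():
            new_edges[edge][field] = value
    return new_edges
```

Each edge is stored as `{"b": ..., "a": ..., "r": ...}` keyed by `(i, j)`, and self-edges `(i, i)` are included because the availability rule needs r_jj.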
The event diffusion detection algorithm was run on a Hadoop cluster using single-file data sets of different sizes. On each data set the algorithm was executed 10 times, and the average of the best 3 runs was taken as the final execution time, as shown in Table 3:
TABLE 3 DDD Algorithm runtime comparison
According to the results in Table 3, the execution time of the algorithm grows noticeably as the data size increases, but it remains under 300 s even when the data set reaches 1 GB. The experiments show that the method has a clear advantage in execution time when processing social network data.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also comprise a combination of the above kinds of memory.
As used in this application, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims (10)

1. A case popularity diffusion processing method based on social network case information is characterized in that:
constructing a diffusion monitoring model, wherein the diffusion of social network case information is modeled on a directed network G = (U, E), where U is the set of all nodes and E (⊂ U × U) is the set of all arcs; each arc has two parameters: a diffusion function, which gives the probability that the source node of the arc transmits the information to the target node of the arc at a given time and takes values strictly between 0 and 1, and a time delay parameter, which is greater than 0; the diffusion function is a function of the characteristics of the nodes, the edge, and the content being exchanged;
computing the probability that a first node sends a message to a second node at a given time;
the probability is a function of node, edge, and topic features belonging to the social, topic, and time dimensions, wherein the social dimension features are: the rate at which each of the two nodes issues messages; the Jaccard similarity coefficient of the two sets of nodes with which the two nodes have interacted; the ratio of directed to undirected messages issued by each node; and the rate mR at which each node receives messages directed at it;
topic dimension features: the interest of each user in the information;
time dimension features: the distribution of each user's activity over the course of a day, stored as a vector (a non-parametric function);
the diffusion probability is computed from V, the vector of the above features, and its coefficients are obtained by applying Bayesian logistic regression to data describing how past information propagated through the network.
2. The method of claim 1, wherein: the method further comprises performing feature detection on the diffusion graph of diffusion events; the inputs of the feature detection are the diffusion graph and the feature coefficient of the diffusion event, and the outputs are the event's diffusion feature and its APL value.
3. The method of claim 2, wherein: the feature detection specifically includes:
1) Setting the feature coefficient;
2) Counting the degree of each node in the diffusion graph according to the adjacency-list structure of the diffusion graph;
3) Counting the number of multi-branch nodes and two-branch nodes in the graph, wherein a node whose degree is greater than 2 is a multi-branch node and any other node is a two-branch node;
4) Calculating the ratio of star nodes, and classifying the diffusion event's feature by comparing this ratio with the feature coefficient;
5) Calculating the APL value of each connected component of the diffusion graph;
6) Calculating the APL value of the whole diffusion graph, i.e. the event's diffusion capability.
4. The method of claim 3, wherein: the feature detection is completed in a distributed detection mode.
5. The method of claim 4, wherein: the feature detection places the diffusion graphs of each event type in one fragment so that a plurality of reduce tasks are executed in parallel.
6. The method of claim 5, wherein: the construction of the diffusion monitoring model further comprises dividing the large social network graph into subgraphs and then assigning each subgraph to a process node; each subgraph contains two types of nodes, interior nodes and edge nodes; an interior node is a node whose neighbors are all in the subgraph; an edge node has neighbors in other subgraphs; for each subgraph, all interior nodes and the edges between them form a closed graph.
7. The method of claim 6, wherein: the social network case information data includes: repost count, comment count, user node degree, and activity level.
8. The method of claim 6, wherein: the diffusion of case information is detected from the social network case information data on the basis of association rules over the feature information.
9. The method of claim 8, wherein: the diffusion processes of diffusion events having the same feature are compared using the average path length (APL) from graph theory, and the nodes and edges of the event diffusion graph are stored in an adjacency list.
10. The method of any one of claims 1-9, wherein the method is applied to information processing of a fair litigation case.
CN202211107987.8A 2022-09-13 2022-09-13 Case popularity diffusion processing method based on social network information Pending CN115185715A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211107987.8A CN115185715A (en) 2022-09-13 2022-09-13 Case popularity diffusion processing method based on social network information

Publications (1)

Publication Number Publication Date
CN115185715A true CN115185715A (en) 2022-10-14

Family

ID=83524360

Country Status (1)

Country Link
CN (1) CN115185715A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2924446A1 (en) * 2014-09-16 2015-03-26 Sysomos L.P. System and method for analyzing and transmitting social communication data
CN105608624A (en) * 2015-12-29 2016-05-25 武汉理工大学 Microblog big data interest community analysis optimization method based on user experience
CN106202211A (en) * 2016-06-27 2016-12-07 四川大学 A kind of integrated microblogging rumour recognition methods based on microblogging type
US20170206460A1 (en) * 2014-09-05 2017-07-20 Icahn School Of Medicine At Mount Sinai Systems and Methods for Causal Inference in Network Structures Using Belief Propagation
CN107273396A (en) * 2017-03-06 2017-10-20 扬州大学 A kind of social network information propagates the system of selection of detection node
CN110705276A (en) * 2019-09-26 2020-01-17 中电万维信息技术有限责任公司 Method, device and storage medium for monitoring network public sentiment based on neural network
CN111400927A (en) * 2020-03-31 2020-07-10 中国石油大学(北京) Method and device for predicting corrosion growth in pipeline based on generalized additive model
CN111738514A (en) * 2020-06-23 2020-10-02 重庆理工大学 Social network community discovery method using local distance and node rank optimization function

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
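全拥: "Research on Key Technologies of Influence Analysis for Social Networks", China Doctoral Dissertations Full-text Database, Basic Sciences Series *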

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20221014