CN115098795A - Data analysis method, data analysis device, equipment and storage medium - Google Patents

Data analysis method, data analysis device, equipment and storage medium

Info

Publication number
CN115098795A
Authority
CN
China
Prior art keywords
hypergraph
social network
analysis
vertex
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210680854.3A
Other languages
Chinese (zh)
Inventor
胡秋月
秦培武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua-Berkeley Shenzhen Institute
Original Assignee
Tsinghua-Berkeley Shenzhen Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua-Berkeley Shenzhen Institute
Priority to CN202210680854.3A
Publication of CN115098795A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The embodiments of the application provide a data analysis method, a data analysis apparatus, a device and a storage medium, and belong to the technical field of data analysis. The method comprises: acquiring original data of a social network platform, wherein the original data comprises original user information and original project information; filtering the original data to obtain target data, wherein the target data comprises target user information and target project information; constructing a hypergraph network according to the target user information and the target project information to obtain a hypergraph social network; and analyzing the attributes of the hypergraph social network to obtain an analysis result. By constructing the hypergraph network from the target user information and the target project information and analyzing different attributes of the hypergraph social network, the application explores the degree of cooperation between users and between users and projects of the social network platform, which can improve the accuracy of the analysis result.

Description

Data analysis method, data analysis device, equipment and storage medium
Technical Field
The present invention relates to the field of data analysis technologies, and in particular, to a data analysis method, a data analysis apparatus, a device, and a storage medium.
Background
Many studies explore the social networks of social network platforms. For example, all user information and project information on the GitHub platform for a given year or for the last 10 years is extracted, including follow relationships, forwarding relationships and like relationships between users, collaborator relationships on projects, and so on. On the basis of these relationships, a user graph network and/or a project graph network is constructed, and the attributes of the graph networks are analyzed. For example, influence, activity and the degree of cooperation between users are analyzed through the user information and the project information. However, such analysis methods struggle to reflect the complex relationships of a social network platform, and the accuracy of the analysis result is low. Therefore, how to provide a data analysis method that improves the analysis accuracy for a social network platform has become a technical problem to be solved urgently.
Disclosure of Invention
The embodiment of the disclosure mainly aims to provide a data analysis method, a data analysis device, equipment and a storage medium, which can improve the analysis accuracy of a social network platform.
In order to achieve the above object, a first aspect of the embodiments of the present disclosure provides a data analysis method, where the method includes:
acquiring original data of a social network platform, wherein the original data comprises original user information and original project information;
filtering the original data to obtain target data; wherein the target data comprises: target user information and target project information;
constructing a hypergraph network according to the target user information and the target project information to obtain a hypergraph social network;
and carrying out attribute analysis on the hypergraph social network to obtain an analysis result.
In some embodiments, the hypergraph social network includes a vertex set, a hyperedge set, and a hyperedge weight value set, and constructing a hypergraph network according to the target user information and the target project information to obtain the hypergraph social network includes:
creating the vertex set according to the target user information and the target project information, wherein the vertex set comprises a plurality of object vertices;
determining object hyperedges according to the user connection relation of the target user information and the user-project connection relation between the target user information and the target project information, wherein each object hyperedge comprises a plurality of the object vertices;
determining an object weight value of each object hyperedge according to the relation type of the user connection relation and of the user-project connection relation;
and obtaining the hyperedge set according to the object hyperedges, and obtaining the hyperedge weight value set according to the object weight values.
In some embodiments, the analysis result includes at least one of: an activity analysis result, an interconnection degree analysis result, and a vertex importance analysis result. Correspondingly, performing attribute analysis on the hypergraph social network to obtain the analysis result includes at least one of the following steps:
performing degree centrality analysis on the hypergraph social network to obtain the activity analysis result;
analyzing paths and distances of the hypergraph social network to obtain the interconnection degree analysis result;
and performing closeness centrality analysis or betweenness centrality analysis on the hypergraph social network to obtain the vertex importance analysis result.
In some embodiments, performing degree centrality analysis on the hypergraph social network to obtain the activity analysis result includes:
determining an incidence matrix according to the membership relation between the object vertices and the object hyperedges;
calculating, according to the incidence matrix, the number of object hyperedges incident to each object vertex;
and obtaining the activity analysis result according to that number.
In some embodiments, the analyzing the path and the distance of the hypergraph social network to obtain the interconnection degree analysis result includes:
calculating the shortest path between every two object vertices to obtain a plurality of shortest path lengths;
calculating the longest path between every two object vertices to obtain a plurality of longest path lengths;
calculating an average shortest path length according to a plurality of shortest path lengths;
calculating an average longest path length according to the plurality of longest path lengths;
and obtaining the interconnection degree analysis result according to the average shortest path length and the average longest path length.
In some embodiments, performing closeness centrality analysis or betweenness centrality analysis on the hypergraph social network to obtain the vertex importance analysis result includes:
if closeness centrality analysis is performed on the hypergraph social network, for each object vertex, calculating the shortest paths between the object vertex and the target object vertices to obtain a plurality of shortest path lengths;
and obtaining the vertex importance analysis result according to the sum of the shortest path lengths.
In some embodiments, performing closeness centrality analysis or betweenness centrality analysis on the hypergraph social network to obtain the vertex importance analysis result includes:
if betweenness centrality analysis is performed on the hypergraph social network, traversing all the object vertices and determining the shortest path between every two object vertices, wherein the shortest path comprises at least one intermediate object vertex, and obtaining the vertex importance analysis result according to the number of the intermediate object vertices.
A second aspect of an embodiment of the present disclosure provides a data analysis apparatus, including:
an acquisition module, configured to acquire original data of a social network platform, wherein the original data comprises original user information and original project information;
the filtering module is used for filtering the original data to obtain target data; wherein the target data comprises: target user information and target project information;
the hypergraph social network creating module is used for constructing a hypergraph network according to the target user information and the target project information to obtain a hypergraph social network;
and the attribute analysis module is used for carrying out attribute analysis on the hypergraph social network so as to obtain an analysis result.
To achieve the above object, a third aspect of the embodiments of the present application provides a computer device, which includes a memory, a processor, a program stored on the memory and executable on the processor, and a data bus for implementing connection communication between the processor and the memory, wherein the program, when executed by the processor, implements the steps of the method according to the first aspect.
To achieve the above object, a fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps of the method of the first aspect.
According to the data analysis method, the data analysis device, the data analysis equipment and the storage medium, the hypergraph network is constructed according to the target user information and the target project information, and different attributes of the hypergraph social network are analyzed, so that the cooperation degree between users and between projects of the social network platform is explored, and the accuracy of analysis results can be improved.
Drawings
FIG. 1 is a flow chart of a data analysis method provided by an embodiment of the present application;
FIG. 2 is a flow chart of step S103 in FIG. 1;
FIG. 3 is another flow chart of a data analysis method provided by an embodiment of the present application;
FIG. 4 is another flow chart of a data analysis method provided by an embodiment of the present application;
FIG. 5 is a block diagram of the module structure of a data analysis apparatus according to an embodiment of the present application;
FIG. 6 is a schematic hardware structure diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It should be noted that although functional blocks are partitioned in a schematic diagram of an apparatus and a logical order is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the partitioning of blocks in the apparatus or the flowchart.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Many studies currently explore the social networks of social network platforms. For example, all user information and project information on the GitHub platform for a given year or for the last 10 years is extracted, including follow relationships, forwarding relationships and like relationships between users, collaborator relationships on projects, and so on. On the basis of these relationships, a user graph network and/or a project graph network is constructed, and the attributes of the graph networks are analyzed. For example, influence, activity and the degree of cooperation between users are analyzed through the user information and the project information. However, such analysis methods struggle to reflect the complex relationships of a social network platform, and the accuracy of the analysis result is low. Therefore, how to provide a data analysis method that improves the analysis accuracy for a social network platform has become a technical problem to be solved urgently.
Based on this, the embodiments of the present application mainly aim to provide a data analysis method, a data analysis apparatus, a device, and a storage medium, and aim to improve the analysis accuracy of a social network platform by constructing a hypergraph social network and performing attribute analysis on the hypergraph social network.
The embodiments of the application have at least the following two advantages. First, an ordinary graph network can only extract one-to-one relations among data, whereas the multiple object vertices and multiple object hyperedges of the hypergraph social network can capture high-dimensional relations among data, so that the real interactions among users and between users and projects on the GitHub platform can be mapped better. Second, the method not only captures the follow relationship, the membership relationship and the like between users, but also introduces several relationships between users and projects, including the issue relationship, the merge request relationship and the like, rather than recording only the relationships between users, so that the interaction between users and projects can be analyzed more comprehensively.
The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
The embodiment of the application provides a data analysis method, and relates to the technical field of artificial intelligence. The data analysis method provided by the embodiment of the application can be applied to a terminal, a server side and software running in the terminal or the server side. In some embodiments, the terminal may be a smartphone, tablet, laptop, desktop computer, or the like; the server side can be configured into an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, and can also be configured into a cloud server for providing basic cloud computing services such as cloud service, a cloud database, cloud computing, cloud functions, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN (content delivery network) and big data and artificial intelligence platforms; the software may be an application or the like that implements a data analysis method, but is not limited to the above form.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Embodiments of the present application provide a data analysis method, a data analysis apparatus, a device, and a storage medium, which are specifically described with reference to the following embodiments, and first describe a data analysis method in an embodiment of the present application.
Fig. 1 is an alternative flowchart of a data analysis method provided in an embodiment of the present application, and the method in fig. 1 may include, but is not limited to, steps S101 to S104.
Step S101, acquiring original data of a social network platform, wherein the original data comprises original user information and original project information;
step S102, filtering the original data to obtain target data; wherein the target data includes: target user information and target project information;
step S103, carrying out hypergraph network construction according to the target user information and the target project information to obtain a hypergraph social network;
and step S104, performing attribute analysis on the hypergraph social network to obtain an analysis result.
In steps S101 to S104 illustrated in the embodiment of the present application, a hypergraph network is constructed according to target user information and target project information, and different attributes of a hypergraph social network are analyzed, so as to explore the degree of collaboration between users and projects of a social network platform, and the accuracy of an analysis result can be improved.
In step S101 of some embodiments, the original data includes original user information and original project information. The original user information includes a user's followers, a user's projects, likes, and the like, and the original project information includes the project contributors, the main implementation language of the project, the merge requests of the project, comment information of the project, and the like. The raw data of the social network can be obtained in the following way. Taking the GitHub platform as an example, historical activity data can be downloaded from GHTorrent, an offline mirror database of GitHub, and used as the raw data. It should be noted that the mirror database records the public event timeline of the GitHub platform: for each event that occurs on the GitHub platform, it records the time and the content. Therefore, raw data on repositories, users, projects, likes and the like can be extracted from the MySQL files of the offline mirror database, which provides a data basis for the subsequent construction of the social network.
In step S102 of some embodiments, the raw data is filtered to obtain target data, wherein the target data includes target user information and target project information. Specifically, the collected raw data of the GitHub platform is filtered: for example, the large number of users with 0 followers and 0 projects, as well as users and projects that have been deleted from the GitHub platform, are removed, so as to better focus on interaction and collaboration analysis among users.
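The filtering rule described for step S102 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the record types `User` and `Project`, their field names, and the reading that a user is dropped only when both the follower count and the project count are 0 are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    # Hypothetical record layout; field names are assumptions for illustration.
    user_id: int
    followers: int
    projects: int
    deleted: bool = False


@dataclass(frozen=True)
class Project:
    project_id: int
    deleted: bool = False


def filter_raw_data(users, projects):
    """Return (target_users, target_projects) after the S102-style filter.

    Users are removed if they were deleted, or if they have 0 followers
    and 0 projects. Projects are removed if they were deleted.
    """
    target_users = [
        u for u in users
        if not u.deleted and not (u.followers == 0 and u.projects == 0)
    ]
    target_projects = [p for p in projects if not p.deleted]
    return target_users, target_projects
```

Keeping the filter as a pure function over in-memory records makes it easy to swap in a database query later without changing the downstream hypergraph construction.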
In step S103 of some embodiments, a hypergraph social network is constructed according to the target user information and the target item information, so as to obtain a hypergraph social network, where the hypergraph social network includes a vertex set, a hyperedge set, and a hyperedge weight value set.
Specifically, referring to fig. 2, in some embodiments, step S103 includes, but is not limited to, step S201 to step S204:
step S201, a vertex set is established according to target user information and target project information, and the vertex set comprises a plurality of object vertexes;
step S202, determining object hyperedges according to the user connection relation of the target user information and the user-project connection relation between the target user information and the target project information, wherein each object hyperedge comprises a plurality of object vertices;
step S203, determining the object weight value of each object hyperedge according to the relation type of the user connection relation and of the user-project connection relation;
and step S204, obtaining the hyperedge set according to the object hyperedges, and obtaining the hyperedge weight value set according to the object weight values.
Specifically, a general hypergraph is denoted G = (V, E, W), where V is the vertex set and E is the hyperedge set. Each hyperedge e in the hyperedge set is assigned a weight value w(e), which represents the importance of the connection relation represented by that hyperedge in the whole hypergraph, and W is the set of the weights w(e). The hypergraph social network of the embodiments of the present application is denoted SN = (V; E = {E_i : i ∈ I}; W), where the vertex set V = U ∪ R, U is the user set and R is the project set; the user set includes a plurality of user vertices, and the project set includes a plurality of project vertices. The hyperedge set E consists of hyperedges connecting user vertices and hyperedges connecting user vertices and project vertices, which represent the various relations between users and between users and projects, respectively. Each hyperedge E_i ∈ E is defined as an ordered pair [Figure BDA0003698300740000081], whose two components are the tail and the head of the hyperedge, respectively. Accordingly, [Figure BDA0003698300740000082] is the vertex set of [Figure BDA0003698300740000083], [Figure BDA0003698300740000084] is the vertex set of [Figure BDA0003698300740000085], and [Figure BDA0003698300740000086] forms the whole set of vertices of the hyperedge.
It should be noted that the vertex set includes a plurality of object vertices, and the object vertices include user vertices and project vertices. The hyperedge set includes a plurality of object hyperedges, and the object hyperedges include user hyperedges and project hyperedges. The hyperedge weight value set includes a plurality of object weight values, which include user weight values and project weight values. In one example, the user vertices are determined according to the target user information, the project vertices are determined according to the target project information, a user hyperedge is determined according to the connection relation between a plurality of user vertices, a project hyperedge is determined according to the connection relation between a plurality of user vertices and a plurality of project vertices, the user weight value of a user hyperedge is determined according to the relation type of the user connection relation, and the project weight value of a project hyperedge is determined according to the relation type of the user-project connection relation.
In the process of constructing the hypergraph network, the embodiments of the application introduce the following relations.
User-user relationships: let [Figure BDA0003698300740000091]; then a hyperedge connecting user vertices has the following property: [Figure BDA0003698300740000092]. If relationship types such as a follow relationship or a membership relationship exist between different user vertices, these user vertices are connected by a hyperedge representing the corresponding relationship, and the weight value of the hyperedge lies in [0, 1] depending on the relationship type. For example, the membership relationship may have a weight value of 0.8 and the follow relationship a weight value of 0.6.
User-project relationships: a hyperedge connecting user vertices and project vertices has the following property: let [Figure BDA0003698300740000093]; then [Figure BDA0003698300740000094]. There are many relations between users and projects, including relationship types such as the like relationship, the follow relationship, the issue relationship and the merge request relationship. In a hyperedge of the user-project relationship, [Figure BDA0003698300740000095] and [Figure BDA0003698300740000096], and different weight values are given to the relationship types; for example, the like, follow, issue and merge request relationships may correspond to weight values of 0.5, 0.6, 0.7 and 0.8, respectively.
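Steps S201 to S204 can be sketched as a small data structure holding the triple (V, E, W). This is an illustrative sketch under stated assumptions: the class `HypergraphSN`, the function `build_hypergraph`, the relation-type keys, and the choice to store each hyperedge as a (tail, head) pair of vertex sets are hypothetical. Only the example weight values (0.8 and 0.6 for user-user relations; 0.5, 0.6, 0.7 and 0.8 for user-project relations) come from the description.

```python
from dataclasses import dataclass, field

# Example weights quoted in the description; the dictionary keys are assumed names.
USER_USER_WEIGHTS = {"member": 0.8, "follow": 0.6}
USER_PROJECT_WEIGHTS = {"like": 0.5, "follow": 0.6, "issue": 0.7, "merge_request": 0.8}


@dataclass
class HypergraphSN:
    vertices: set = field(default_factory=set)      # V = U ∪ R
    hyperedges: list = field(default_factory=list)  # E: (tail, head) pairs of frozensets
    weights: list = field(default_factory=list)     # W: weights[i] belongs to hyperedges[i]

    def add_hyperedge(self, tail, head, weight):
        """Add one object hyperedge and register its vertices (S201 to S204)."""
        tail, head = frozenset(tail), frozenset(head)
        self.vertices |= tail | head
        self.hyperedges.append((tail, head))
        self.weights.append(weight)


def build_hypergraph(user_relations, user_project_relations):
    """Build SN from (relation_type, tail_vertices, head_vertices) triples.

    Vertices are tagged tuples such as ("u", 1) for users and ("r", 10) for
    projects so that the two kinds cannot collide (an assumption of this sketch).
    """
    sn = HypergraphSN()
    for rel_type, tail, head in user_relations:
        sn.add_hyperedge(tail, head, USER_USER_WEIGHTS[rel_type])
    for rel_type, users, projects in user_project_relations:
        sn.add_hyperedge(users, projects, USER_PROJECT_WEIGHTS[rel_type])
    return sn
```

For example, `build_hypergraph([("follow", {("u", 1), ("u", 2)}, {("u", 3)})], [("merge_request", {("u", 1)}, {("r", 10)})])` yields a network with four vertices, two hyperedges, and the weight list `[0.6, 0.8]`.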
In step S104 of some embodiments, attribute analysis is performed on the hypergraph social network to obtain an analysis result, and the analysis result includes at least one of the following: an activity analysis result, an interconnection degree analysis result, and a vertex importance analysis result. Correspondingly, step S104 includes at least one of the following steps:
performing degree centrality analysis on the hypergraph social network to obtain the activity analysis result;
analyzing paths and distances of the hypergraph social network to obtain the interconnection degree analysis result;
and performing closeness centrality analysis or betweenness centrality analysis on the hypergraph social network to obtain the vertex importance analysis result.
Specifically, referring to FIG. 3, in some embodiments, the step of performing degree centrality analysis on the hypergraph social network to obtain the activity analysis result includes:
step S301, determining an incidence matrix according to the membership relation between the object vertices and the object hyperedges;
step S302, calculating, according to the incidence matrix, the number of object hyperedges incident to each object vertex;
and step S303, obtaining the activity analysis result according to that number.
In steps S301 to S303 illustrated in the embodiments of the present application, the incidence matrix of SN is denoted H:
[Figure BDA0003698300740000101]
The number of object hyperedges incident to each object vertex can be calculated from the incidence matrix H. For example, the degree centrality of object vertex v_k is denoted d_c(v_k):
[Figure BDA0003698300740000102]
[Figure BDA0003698300740000103]
The degree centrality of an object vertex depends on the number of hyperedges incident to that vertex, so the degree centrality of the object vertex can be determined from that number, and the activity analysis result is obtained accordingly.
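Steps S301 to S303 can be illustrated with a short sketch. Because the formula images are not reproduced here, the sketch assumes the common 0/1 convention in which the entry of H for vertex v_k and hyperedge E_i is 1 when v_k belongs to E_i (tail or head) and 0 otherwise, so that d_c(v_k) is the corresponding row sum. The function names are hypothetical.

```python
def incidence_matrix(vertices, hyperedges):
    """Return H as a list of rows; H[k][i] is 1 if vertices[k] is in hyperedges[i].

    Each hyperedge is given as the set of all its vertices (tail and head
    together); this 0/1 convention is an assumption of the sketch.
    """
    return [[1 if v in e else 0 for e in hyperedges] for v in vertices]


def degree_centrality(vertices, hyperedges):
    """Map each vertex to the number of hyperedges incident to it (row sums of H)."""
    H = incidence_matrix(vertices, hyperedges)
    return {v: sum(row) for v, row in zip(vertices, H)}


vertices = ["u1", "u2", "u3", "r1"]
hyperedges = [{"u1", "u2", "r1"}, {"u1", "u3"}, {"u2", "r1"}]
print(degree_centrality(vertices, hyperedges))
# {'u1': 2, 'u2': 2, 'u3': 1, 'r1': 2}
```

A vertex with a larger row sum takes part in more relations, which is the sense in which the description uses the count as an activity measure.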
Specifically, referring to FIG. 4, in some embodiments, the step of analyzing the paths and distances of the hypergraph social network to obtain the interconnection degree analysis result includes:
step S401, calculating the shortest path between every two object vertices to obtain a plurality of shortest path lengths;
step S402, calculating the longest path between every two object vertices to obtain a plurality of longest path lengths;
step S403, calculating the average shortest path length according to the plurality of shortest path lengths;
step S404, calculating the average longest path length according to the plurality of longest path lengths;
step S405, obtaining the interconnection degree analysis result according to the average shortest path length and the average longest path length.
In steps S401 to S405 illustrated in the embodiments of the present application, the social path between two user vertices [Figure BDA0003698300740000104] and [Figure BDA0003698300740000105] is composed of different vertices and hyperedges: [Figure BDA0003698300740000106] (where [Figure BDA0003698300740000107]). The length of the hyperpath is defined as: [Figure BDA0003698300740000108], where α is a normalization factor.
The social path between two user vertices may pass through different types of relationships: if two user vertices follow each other or are members of the same group, the social path may connect the two vertices directly; if both user vertices like the same project or open an issue on the same project, the social path may connect the two vertices indirectly. Because the traversed object vertices differ, two object vertices may be connected by different hyperpaths. For a pair of object vertices (v_k, v_j), the embodiments of the application define the shortest distance d_min(v_k, v_j), the longest distance d_max(v_k, v_j) and the average distance d_avg(v_k, v_j) as the lengths of the shortest hyperpath, the longest hyperpath and the average hyperpath between the two object vertices, respectively. In addition, the longest distance over all object vertex pairs in a network is also referred to as the diameter of the network; that is, it is calculated from the path lengths of all object vertices that can be traversed between all vertex pairs in the network. The average shortest hyperpath is the average of the shortest hyperpaths among all object vertices. In order to estimate the diameters and the average shortest hyperpaths of the user vertices and project vertices in the whole GitHub social network, the average longest hyperpath and the average shortest hyperpath of all object vertex pairs in the two networks are calculated, and the values are compared with those of other social networks (such as Facebook and SourceForge) to evaluate the degree of interconnection and the tightness between users and projects on the GitHub platform, thereby obtaining the interconnection degree analysis result.
It should be noted that the shortest hyper-path between two object vertices is calculated as follows. Using the classical Floyd algorithm from graph theory, let dist(i, j) be the length of the shortest path from object vertex i to object vertex j. For each object vertex k, check whether dist(i, k) + dist(k, j) < dist(i, j). If so, the path from object vertex i through object vertex k to object vertex j is shorter than the path from object vertex i to object vertex j recorded so far, and dist(i, j) = dist(i, k) + dist(k, j) is set. Once all object vertices k have been traversed, dist(i, j) records the length of the shortest path from object vertex i to object vertex j.
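The relaxation described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the matrix encoding (`math.inf` for vertex pairs that share no hyperedge), the function names, and the four-vertex example are assumptions made for the sketch.

```python
import math


def floyd_shortest(weights):
    """All-pairs shortest path lengths using the Floyd relaxation.

    weights[i][j] is the length of the direct step from vertex i to vertex j,
    or math.inf when the two vertices are not directly connected (assumed encoding).
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # work on a copy
    for i in range(n):
        dist[i][i] = 0.0
    for k in range(n):  # candidate intermediate vertex
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:  # going through k is shorter
                    dist[i][j] = via_k
    return dist


def average_shortest(dist):
    """Mean shortest path length over all reachable ordered vertex pairs."""
    n = len(dist)
    lengths = [dist[i][j] for i in range(n) for j in range(n)
               if i != j and dist[i][j] != math.inf]
    return sum(lengths) / len(lengths)


INF = math.inf
example = [
    [0, 1, 4, INF],
    [1, 0, 2, 6],
    [4, 2, 0, 1],
    [INF, 6, 1, 0],
]
dist = floyd_shortest(example)
```

In this example the direct step of length 6 between vertices 1 and 3 is replaced by the route through vertex 2, and `average_shortest(dist)` gives the average shortest path length used in the comparison with other networks.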
The longest hyper-path between two object vertices is calculated as follows. Using the classical Floyd algorithm from graph theory, let dmax(i, j) be the length of the longest path from object vertex i to object vertex j. For each object vertex k, check whether dmax(i, k) + dmax(k, j) > dmax(i, j). If so, the path from object vertex i through object vertex k to object vertex j is longer than the path from object vertex i to object vertex j recorded so far, and dmax(i, j) = dmax(i, k) + dmax(k, j) is set. Once all object vertices k have been traversed, dmax(i, j) records the length of the longest path from object vertex i to object vertex j.
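The same relaxation with the inequality reversed can be sketched as below. The sketch assumes a directed acyclic graph, because only then is the reversed relaxation guaranteed to return the longest path without visiting a vertex twice; the source does not state how cycles are handled. The encoding (`-math.inf` for missing steps) and all names are illustrative assumptions.

```python
import math


def floyd_longest(weights):
    """All-pairs longest path lengths on an acyclic directed graph.

    weights[i][j] is the length of the direct step i -> j, or -math.inf when
    there is no such step (assumed encoding). Diagonal entries are 0.
    """
    n = len(weights)
    dmax = [row[:] for row in weights]  # work on a copy
    for k in range(n):  # candidate intermediate vertex
        for i in range(n):
            for j in range(n):
                via_k = dmax[i][k] + dmax[k][j]
                if via_k > dmax[i][j]:  # going through k is longer
                    dmax[i][j] = via_k
    return dmax


NEG = -math.inf
# Directed steps: 0->1 (1), 0->2 (4), 1->2 (2), 1->3 (6), 2->3 (1)
dag = [
    [0, 1, 4, NEG],
    [NEG, 0, 2, 6],
    [NEG, NEG, 0, 1],
    [NEG, NEG, NEG, 0],
]
dmax = floyd_longest(dag)
```

Here the longest route from vertex 0 to vertex 3 goes through vertex 1 and has length 7, while the direct step of length 4 from vertex 0 to vertex 2 remains the longest route for that pair.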
In other embodiments, if closeness centrality analysis is performed on the hypergraph social network, then for each object vertex, the shortest paths between that object vertex and the target object vertices are calculated to obtain a plurality of shortest path lengths, and a vertex importance analysis result is obtained according to the sum of the shortest path lengths. Specifically, the closeness centrality of an object vertex is defined as:
Figure BDA0003698300740000111
The closeness centrality of an object vertex depends on the sum of the lengths of the shortest paths from the object vertex to the other object vertices. The smaller the sum of the shortest path lengths, the smaller the closeness centrality value of the object vertex, which indicates that the degree of interaction between the object vertex and the other object vertices is higher and that the object vertex is more important in the hypergraph social network; this yields the vertex importance analysis result.
In the closeness centrality analysis of an object vertex, the sum of the shortest paths from the object vertex to the other object vertices must be calculated. Dijkstra's algorithm from graph theory is used with two vertex sets S and T: the object vertices whose shortest paths have already been found are stored in set S, and the remaining object vertices are stored in set T. In the initial state, set S contains only the source object vertex. The object vertex in set T with the shortest path length is then repeatedly selected and added to set S. Each time a new object vertex is added to set S, the shortest path lengths from the source object vertex to the object vertices remaining in set T are updated. This process is repeated until all vertices of set T have been added to set S.
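The two-set procedure above can be sketched as follows. The adjacency-dictionary input, the function names, and the example graph are assumptions for illustration; the sketch returns only the sum of shortest path lengths that the centrality analysis relies on and does not reproduce the centrality formula itself.

```python
import math


def dijkstra_lengths(adj, source):
    """Shortest path lengths from source, using the sets S and T described above.

    adj maps each vertex to a dict {neighbor: step length} (assumed encoding).
    """
    S = {source}            # vertices whose shortest path is final
    T = set(adj) - S        # vertices still to be settled
    dist = {v: adj[source].get(v, math.inf) for v in adj}
    dist[source] = 0.0
    while T:
        u = min(T, key=dist.get)  # vertex in T with the shortest known length
        if dist[u] == math.inf:
            break                 # the rest of T is unreachable from source
        T.remove(u)
        S.add(u)
        for v, w in adj[u].items():
            # Update lengths of vertices still in T that are shorter via u.
            if v in T and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist


def shortest_path_sum(adj, source):
    """Sum of shortest path lengths from source to every reachable other vertex."""
    dist = dijkstra_lengths(adj, source)
    return sum(d for v, d in dist.items() if v != source and d != math.inf)


graph = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "c": 2, "d": 6},
    "c": {"a": 4, "b": 2, "d": 1},
    "d": {"b": 6, "c": 1},
}
lengths_from_a = dijkstra_lengths(graph, "a")
```

For vertex `a` the shortest lengths to `b`, `c`, and `d` are 1, 3, and 4, so `shortest_path_sum(graph, "a")` is 8.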
In other embodiments, if intermediary centrality (betweenness centrality) analysis is performed on the hypergraph social network, all object vertices are traversed and the shortest path between every two object vertices is determined, where the shortest path includes at least one intermediate object vertex; a vertex importance analysis result is obtained according to the number of intermediate object vertices. Specifically, the intermediary centrality of an object vertex is defined as:
Figure BDA0003698300740000121
where [Figure BDA0003698300740000122] is the number of shortest hyper-paths that connect v_j and v_z and pass through v_k, and [Figure BDA0003698300740000123] is the total number of shortest hyper-paths connecting the two object vertices. The intermediary centrality measures the number of times an object vertex appears on the shortest hyper-paths between different object vertex pairs of the hypergraph social network. If the object vertex is a user vertex, the centrality analysis of the user vertex may be used to measure the importance of the user vertex within the part of the network that users can reach, i.e., the "user centrality". The user centrality calculation can be used to evaluate the influence of different user vertices within an organization, which yields the vertex importance analysis result.
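The counting of shortest hyper-paths through a vertex can be sketched as below. Because the defining formula is only available as an image, the sketch assumes the standard form, in which the ratio of shortest paths through v_k to all shortest paths is summed over vertex pairs. It also assumes that every hyperedge links all of its member vertices with unit length; the function names and example hyperedges are hypothetical.

```python
from collections import deque
from itertools import combinations


def adjacency_from_hyperedges(hyperedges):
    """Treat vertices that share a hyperedge as adjacent (assumed unit length)."""
    adj = {}
    for edge in hyperedges:
        for v in edge:
            adj.setdefault(v, set()).update(u for u in edge if u != v)
    return adj


def bfs_counts(adj, s):
    """Return (dist, sigma): hop distance and number of shortest paths from s."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj, k):
    """Sum over pairs (j, z) of the share of shortest paths passing through k."""
    info = {s: bfs_counts(adj, s) for s in adj}
    dist_k, sigma_k = info[k]
    total = 0.0
    for j, z in combinations([v for v in adj if v != k], 2):
        dist_j, sigma_j = info[j]
        if z not in dist_j or k not in dist_j or z not in dist_k:
            continue                     # pair not connected through k
        if dist_j[k] + dist_k[z] == dist_j[z]:
            total += sigma_j[k] * sigma_k[z] / sigma_j[z]
    return total


star = adjacency_from_hyperedges([{"a", "b"}, {"b", "c"}, {"b", "d"}])
square = adjacency_from_hyperedges([{"a", "b"}, {"b", "c"}, {"c", "d"}, {"d", "a"}])
```

In the star example every pair of outer vertices is connected only through `b`, so its score is 3. In the square example `b` lies on one of the two shortest paths between `a` and `c`, so its score is 0.5.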
Referring to fig. 5, an embodiment of the present application further provides a data analysis apparatus that can implement the data analysis method described above. Fig. 5 is a block diagram of the module structure of the data analysis apparatus provided in the embodiment of the present application. The apparatus includes: an acquisition module 501, a filtering module 502, a hypergraph social network creation module 503, and an attribute analysis module 504. The acquisition module 501 is configured to acquire original data of a social network platform, where the original data includes original user information and original project information; the filtering module 502 is configured to filter the original data to obtain target data, where the target data includes target user information and target project information; the hypergraph social network creation module 503 is configured to construct a hypergraph network according to the target user information and the target project information to obtain a hypergraph social network; and the attribute analysis module 504 is configured to perform attribute analysis on the hypergraph social network to obtain an analysis result.
According to the data analysis device provided by the embodiment of the application, the hypergraph network is constructed through the target user information and the target project information, and different attributes of the hypergraph social network are analyzed, so that the cooperation degree between users and between projects of the social network platform is explored, and the accuracy of the analysis result can be improved.
It should be noted that the specific implementation of the data analysis apparatus is substantially the same as the specific implementation of the data analysis method, and is not described herein again.
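The flow through the four modules can be sketched as follows. Everything concrete here is an assumption made for illustration: the record fields (`user`, `project`, `relation`), the rule that one hyperedge groups a project with all users sharing one relation type to it, the weight table, and the function names. The attribute analysis shown is the count of hyperedges incident to each vertex, read from an incidence matrix.

```python
def acquire(raw_source):
    """Module 501 (sketch): collect raw records from the platform export."""
    return list(raw_source)


def filter_records(records, relation_weights):
    """Module 502 (sketch): keep records with a user, a project, and a known relation."""
    return [r for r in records
            if r.get("user") and r.get("project")
            and r.get("relation") in relation_weights]


def build_hypergraph(records, relation_weights):
    """Module 503 (sketch): one hyperedge per (project, relation type)."""
    groups = {}
    for r in records:
        key = (r["project"], r["relation"])
        groups.setdefault(key, {r["project"]}).add(r["user"])
    vertices = sorted({v for members in groups.values() for v in members})
    hyperedges = [frozenset(members) for members in groups.values()]
    weights = [relation_weights[relation] for (_, relation) in groups]
    return vertices, hyperedges, weights


def incident_hyperedge_counts(vertices, hyperedges):
    """Module 504 (sketch): row sums of the vertex-by-hyperedge incidence matrix."""
    incidence = [[1 if v in e else 0 for e in hyperedges] for v in vertices]
    return {v: sum(row) for v, row in zip(vertices, incidence)}


relation_weights = {"star": 1.0, "issue": 2.0}  # hypothetical weights per relation type
raw = [
    {"user": "u1", "project": "p1", "relation": "star"},
    {"user": "u2", "project": "p1", "relation": "star"},
    {"user": "u2", "project": "p2", "relation": "issue"},
    {"user": "", "project": "p2", "relation": "star"},  # dropped by the filter
]
target = filter_records(acquire(raw), relation_weights)
vertices, hyperedges, weights = build_hypergraph(target, relation_weights)
counts = incident_hyperedge_counts(vertices, hyperedges)
```

With these records two hyperedges are built, `{p1, u1, u2}` and `{p2, u2}`, and user `u2` is incident to both of them.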
An embodiment of the present application further provides a computer device, where the computer device includes: a memory, a processor, a program stored on the memory and executable on the processor, and a data bus for implementing connection and communication between the processor and the memory, where the program implements the data analysis method described above when executed by the processor. The computer device can be any intelligent terminal, including a tablet computer, a vehicle-mounted computer, and the like.
Referring to fig. 6, fig. 6 illustrates a hardware structure of a computer device according to another embodiment, the computer device includes:
the processor 601 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute a relevant program to implement the technical solution provided in the embodiment of the present Application;
the Memory 602 may be implemented in the form of a Read Only Memory (ROM), a static storage device, a dynamic storage device, or a Random Access Memory (RAM). The memory 602 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present disclosure is implemented by software or firmware, the relevant program codes are stored in the memory 602 and called by the processor 601 to execute the data analysis method according to the embodiments of the present disclosure;
an input/output interface 603 for implementing information input and output;
the communication interface 604 is configured to implement communication interaction between the device and other devices, and may implement communication in a wired manner (e.g., USB, network cable, etc.) or in a wireless manner (e.g., mobile network, Wi-Fi, Bluetooth, etc.);
a bus 605 that transfers information between the various components of the device (e.g., the processor 601, memory 602, input/output interfaces 603, and communication interfaces 604);
wherein the processor 601, the memory 602, the input/output interface 603 and the communication interface 604 are communicatively connected to each other within the device via a bus 605.
The embodiment of the present application further provides a storage medium, which is a computer-readable storage medium. The storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement the data analysis method described above.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiments described in the embodiments of the present disclosure are for more clearly illustrating the technical solutions of the embodiments of the present disclosure, and do not constitute a limitation to the technical solutions provided in the embodiments of the present disclosure, and it is obvious to those skilled in the art that the technical solutions provided in the embodiments of the present disclosure are also applicable to similar technical problems with the evolution of technology and the emergence of new application scenarios.
Those skilled in the art will appreciate that the solutions shown in fig. 1-4 are not meant to limit embodiments of the present disclosure, and may include more or fewer steps than those shown, or may combine certain steps, or different steps.
The above described embodiments of the apparatus are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may also be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
One of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
According to the data analysis method, the data analysis device, the computer equipment and the storage medium, the hypergraph network is constructed through the target user information and the target project information, different attributes of the hypergraph social network are analyzed, the cooperation degree between users and between projects of the social network platform is explored, and the accuracy of analysis results can be improved.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and in actual implementation, there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may also be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes multiple instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing programs, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The preferred embodiments of the present disclosure have been described above with reference to the accompanying drawings, and this description does not limit the scope of the claims of the embodiments of the present disclosure. Any modifications, equivalents, and improvements made by those skilled in the art within the scope and spirit of the embodiments of the present disclosure should fall within the scope of the claims of the embodiments of the present disclosure.

Claims (10)

1. A method of data analysis, the method comprising:
acquiring original data of a social network platform, wherein the original data comprises original user information and original project information;
filtering the original data to obtain target data; wherein the target data comprises: target user information and target project information;
constructing a hypergraph network according to the target user information and the target project information to obtain a hypergraph social network;
and carrying out attribute analysis on the hypergraph social network to obtain an analysis result.
2. The method of claim 1, wherein the hypergraph social network comprises a vertex set, a hyperedge set, and a hyperedge weight value set, and the constructing a hypergraph network according to the target user information and the target project information to obtain a hypergraph social network comprises:
creating the vertex set according to the target user information and the target project information, wherein the vertex set comprises a plurality of object vertices;
determining object hyperedges according to the user connection relation of the target user information and the user-project connection relation between the target user information and the target project information; wherein each said object hyperedge comprises a plurality of said object vertices;
determining an object weight value of each object hyperedge according to the relation type of the user connection relation and the user-project connection relation;
and obtaining the hyperedge set according to the object hyperedges, and obtaining the hyperedge weight value set according to the object weight values.
3. The method of claim 2, wherein the analysis result includes at least one of: an activity analysis result, an interconnection degree analysis result, and a vertex importance analysis result; correspondingly, the performing attribute analysis on the hypergraph social network to obtain an analysis result comprises at least one of the following steps:
performing degree centrality analysis on the hypergraph social network to obtain the activity analysis result;
performing path and distance analysis on the hypergraph social network to obtain the interconnection degree analysis result;
and performing closeness centrality analysis or intermediary centrality analysis on the hypergraph social network to obtain the vertex importance analysis result.
4. The method of claim 3, wherein the performing degree centrality analysis on the hypergraph social network to obtain the activity analysis result comprises:
determining an incidence matrix according to the membership relation between the object vertices and the object hyperedges;
calculating, according to the incidence matrix, the number of object hyperedges incident to each object vertex;
and obtaining the activity analysis result according to the number of incident object hyperedges.
5. The method of claim 3, wherein the performing a path and distance analysis on the hypergraph social network to obtain the interconnection degree analysis result comprises:
calculating the shortest path of every two object vertexes to obtain a plurality of shortest path lengths;
calculating the longest path of every two object vertexes to obtain a plurality of longest path lengths;
calculating an average shortest path length according to a plurality of shortest path lengths;
calculating an average longest path length according to the plurality of longest path lengths;
and obtaining the interconnection degree analysis result according to the average shortest path length and the average longest path length.
6. The method of claim 3, wherein the performing closeness centrality analysis or intermediary centrality analysis on the hypergraph social network to obtain the vertex importance analysis result comprises:
if closeness centrality analysis is performed on the hypergraph social network, for each object vertex, calculating the shortest path between the object vertex and the target object vertex to obtain a plurality of shortest path lengths;
and obtaining the vertex importance analysis result according to the sum of the shortest path lengths.
7. The method of claim 3, wherein the performing closeness centrality analysis or intermediary centrality analysis on the hypergraph social network to obtain the vertex importance analysis result comprises:
if intermediary centrality analysis is performed on the hypergraph social network, traversing all the object vertices, determining the shortest path between every two object vertices, wherein the shortest path comprises at least one intermediate object vertex, and obtaining the vertex importance analysis result according to the number of the intermediate object vertices.
8. A data analysis apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring original data of a social network platform, wherein the original data comprises original user information and original project information;
the filtering module is used for filtering the original data to obtain target data; wherein the target data comprises: target user information and target project information;
the hypergraph social network creating module is used for constructing a hypergraph network according to the target user information and the target project information to obtain a hypergraph social network;
and the attribute analysis module is used for carrying out attribute analysis on the hypergraph social network so as to obtain an analysis result.
9. A computer device comprising a memory, a processor, a program stored on the memory and executable on the processor, and a data bus for enabling connection communication between the processor and the memory, the program, when executed by the processor, implementing the steps of the method according to any one of claims 1 to 7.
10. A storage medium, being a computer readable storage medium, for computer readable storage, characterized in that the storage medium stores one or more programs executable by one or more processors to implement the steps of the method of any one of claims 1 to 7.
CN202210680854.3A 2022-06-16 2022-06-16 Data analysis method, data analysis device, equipment and storage medium Pending CN115098795A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210680854.3A CN115098795A (en) 2022-06-16 2022-06-16 Data analysis method, data analysis device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210680854.3A CN115098795A (en) 2022-06-16 2022-06-16 Data analysis method, data analysis device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115098795A true CN115098795A (en) 2022-09-23

Family

ID=83290282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210680854.3A Pending CN115098795A (en) 2022-06-16 2022-06-16 Data analysis method, data analysis device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115098795A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116739470A (en) * 2022-09-27 2023-09-12 荣耀终端有限公司 Network analysis method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination