WO2015058500A1 - Method and device for storing data - Google Patents

Method and device for storing data

Info

Publication number
WO2015058500A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
node
edge
attribute
attribute information
Prior art date
Application number
PCT/CN2014/075570
Other languages
English (en)
Chinese (zh)
Inventor
刘志容
李川
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2015058500A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Definitions

  • the present invention relates to the field of data storage, and in particular, to a method and apparatus for storing data.
  • Information Networks are a general abstraction of massive, multidimensional, and complex structural data in real space.
  • Information networks are of great value in the fields of community network analysis, partner network analysis, traffic network capacity calculation, protein network receiving component analysis, and criminal network analysis.
  • the classic online analytical processing (OLAP, Online Analytical Processing) data warehouse model is a multidimensional data model.
  • a multidimensional data model is a multidimensional space.
  • Dimensions are the different angles from which people observe data, and they can be used to represent different attributes of something. For example, analyzing product sales data involves a time dimension, a product dimension, and a regional dimension. There is no unified multidimensional data model at this stage.
  • The common OLAP data warehouse models are the star schema, the snowflake schema, and the constellation schema.
  • the star schema is the basic structure of the multidimensional data model, and its composition includes: a central fact table and a dimension table.
  • the central fact table is a core table in the star schema, storing the metrics of the facts and the key codes of the respective dimension tables; the dimension table is used to maintain the dimension information, that is, each dimension member, including the attribute information of the dimension.
  • the central fact table is connected to the dimension tables through the dimension table keys that it stores.
  • the snowflake schema is a variant of the star schema that further decomposes some of the dimension tables of a star schema.
  • the constellation schema can be regarded as a collection of star schemas in which multiple fact tables share certain dimension tables, which makes multi-topic modeling possible.
  • the star schema organizes sales data well. Sales data can be considered from four dimensions: Time, Item, Branch, and Location. The schema contains one central fact table (Sales), which contains the keys of the four dimensions (as shown in Figure 2: Time_key, Branch_key, Item_key, Location_key) and two metrics (as shown in Figure 2: Dollars_sold, Unit_sold).
  • The star schema and the snowflake schema are only suitable for modeling a single topic; they cannot model multiple topics.
  • the constellation schema allows multiple fact tables to share some dimension tables and thereby realizes multi-topic modeling.
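  • As a minimal sketch of the star schema described above, assuming SQLite through Python's sqlite3 module: the table names (sales, time_dim, item_dim, branch_dim, location_dim), the non-key columns of the dimension tables, and all sample rows are illustrative assumptions; only the four dimension keys and the two metrics come from the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (non-key columns are hypothetical).
CREATE TABLE time_dim     (Time_key     INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE item_dim     (Item_key     INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE branch_dim   (Branch_key   INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (Location_key INTEGER PRIMARY KEY, city TEXT);
-- Central fact table: one key per dimension plus the two metrics.
CREATE TABLE sales (
    Time_key     INTEGER REFERENCES time_dim(Time_key),
    Item_key     INTEGER REFERENCES item_dim(Item_key),
    Branch_key   INTEGER REFERENCES branch_dim(Branch_key),
    Location_key INTEGER REFERENCES location_dim(Location_key),
    Dollars_sold REAL,
    Unit_sold    INTEGER
);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO time_dim VALUES (1, 2013)")
conn.executemany("INSERT INTO item_dim VALUES (?, ?)", [(1, "phone"), (2, "router")])
conn.execute("INSERT INTO branch_dim VALUES (1, 'branch A')")
conn.executemany("INSERT INTO location_dim VALUES (?, ?)", [(1, "Beijing"), (2, "Shanghai")])
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 100.0, 2), (1, 2, 1, 1, 50.0, 1), (1, 1, 1, 2, 70.0, 1)],
)


def dollars_by_city(connection):
    """Join the fact table to the location dimension and sum one metric."""
    query = """
        SELECT l.city, SUM(s.Dollars_sold)
        FROM sales AS s
        JOIN location_dim AS l ON l.Location_key = s.Location_key
        GROUP BY l.city
        ORDER BY l.city
    """
    return connection.execute(query).fetchall()


totals = dollars_by_city(conn)
print(totals)
```

  • The fact table holds only keys and metrics; every descriptive value is reached through a join on a dimension key, which is the connection the description attributes to the star schema.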
  • the subject data in an information network evolves into a complex graph network, so the information dimension and the topology dimension information must be saved at the same time, in a form suitable for modeling online graph processing.
  • the multidimensional data model is proposed for traditional OLAP and does not apply to the data organization in the information network. Now researchers are paying more attention to the common sales relationship between goods and goods, which involves the modeling of the connection relationship between objects and objects. At present, more and more data appears in the form of network diagrams, such as social networks, partner networks, protein networks, etc. In these networks, researchers pay more attention to the connection between entities.
  • the traditional multidimensional data model can not reasonably store and represent the network graph data relationship, and can not reasonably pay attention to the connection relationship between entities.
  • the embodiment of the invention provides a method and a device for storing data, which overcomes the problem that the traditional multidimensional data model cannot reasonably store and represent the network graph data relationship.
  • a first aspect of the embodiments of the present invention provides a method for storing data, where the method includes: acquiring an original data set;
  • the node attribute key has a corresponding relationship with the node attribute information
  • the edge information includes at least: an edge identifier and an edge attribute key
  • the edge attribute key has a corresponding relationship with the edge attribute information
  • the edge is used to describe the relationship between one node and another node
  • the node information further includes: a node metric;
  • the edge information further includes: an edge metric.
  • the extracted node information is stored in a node fact table
  • the extracted edge information is stored in an edge fact table
  • the extracted node attribute information is stored in a topology dimension table
  • the extracted edge attribute information is stored in the information dimension table
  • the information in the node fact table has a corresponding relationship with the information in the edge fact table
  • the node attribute key has a corresponding relationship with the node attribute information; and the information in the topology dimension table has a corresponding relationship with the information in the node fact table;
  • the information in the information dimension table has a corresponding relationship with the information in the edge fact table.
  • the method further includes: locating the data to be queried in the stored node information, node attribute information, edge information, and edge attribute information;
  • after the locating, the query is performed on one of the node information, the node attribute information, the edge information, or the edge attribute information.
  • the method further includes: performing an online graph processing operation according to the extracted node information, node attribute information, edge information, and edge attribute information.
  • the online graph processing (OLGP) operation includes one of:
  • information dimension roll-up (I-OLGP)
  • topological dimension roll-up (T-OLGP)
  • asynchronous roll-up, drill-down, slice, dice, and data view.
  • with reference to the fifth possible implementation manner of the first aspect of the embodiments of the present invention, the information dimension roll-up specifically includes:
  • rolling up the information of one attribute, or of several attributes, of the edge stored in the information dimension table.
  • the topological roll-up (topology aggregation) operation specifically includes:
  • rolling up the information of one attribute, or of several attributes, of the node stored in the topology dimension table.
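  • The two roll-up operations named above can be illustrated with a small sketch. It assumes the usual graph OLAP reading, which the text does not spell out: an information dimension roll-up aggregates the edge metric over an edge attribute while the nodes stay unchanged, and a topological roll-up merges nodes that share a node attribute value and aggregates the edges between the merged groups. The row layout, the names edge_facts and institution_of, and the sample values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical edge fact rows: (author1, author2, year, co_frequence).
edge_facts = [
    ("a1", "a2", 2012, 2),
    ("a1", "a2", 2013, 1),
    ("a2", "a3", 2013, 4),
]

# Hypothetical topology dimension: author -> institution.
institution_of = {"a1": "Inst-X", "a2": "Inst-X", "a3": "Inst-Y"}


def information_rollup(rows):
    """Sum the edge metric over the year attribute; nodes are unchanged."""
    totals = defaultdict(int)
    for a, b, _year, freq in rows:
        totals[(a, b)] += freq
    return dict(totals)


def topological_rollup(rows, group_of):
    """Merge nodes by a node attribute and sum the metric per group pair."""
    totals = defaultdict(int)
    for a, b, _year, freq in rows:
        # Sort so that (X, Y) and (Y, X) count as the same undirected edge.
        pair = tuple(sorted((group_of[a], group_of[b])))
        totals[pair] += freq
    return dict(totals)


print(information_rollup(edge_facts))
print(topological_rollup(edge_facts, institution_of))
```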
  • the apparatus includes: an acquiring unit, an extracting unit, and a storage unit;
  • the obtaining unit is configured to acquire an original data set
  • the extracting unit is configured to extract, from the original data set, information indicating a structure of the information network graph; wherein the information indicating the structure of the information network graph includes at least: node information, node attribute information, edge information, and edge attribute information;
  • the node information includes at least: a node identifier and a node attribute key; the node attribute key has a corresponding relationship with the node attribute information;
  • the edge information includes at least: an edge identifier and an edge attribute key; the edge attribute key has a corresponding relationship with the edge attribute information; the edge is used to describe a relationship between one node and another node;
  • the storage unit is configured to store the extracted node information, node attribute information, edge information, and edge attribute information.
  • the node information further includes: a node metric value
  • the edge information further includes: an edge metric.
  • the extracted node information is stored in a node fact table
  • the extracted edge information is stored in an edge fact table
  • the extracted node attribute information is stored in a topology dimension table
  • the extracted edge attribute information is stored in the information dimension table
  • because the edge is used to describe the relationship between one node and another node, the information in the node fact table has a corresponding relationship with the information in the edge fact table;
  • the node attribute key has a corresponding relationship with the node attribute information; and the information in the topology dimension table has a corresponding relationship with the information in the node fact table;
  • the information in the information dimension table has a corresponding relationship with the information in the edge fact table, because the edge attribute key has a corresponding relationship with the edge attribute information.
  • the device further includes: a positioning unit, and a query unit;
  • the positioning unit is configured to locate the data to be queried in the stored node information, node attribute information, edge information, and edge attribute information;
  • the query unit is configured to perform, after the locating, the query on one of the node information, the node attribute information, the edge information, or the edge attribute information.
  • the device further includes: a graph processing unit;
  • the graph processing unit is configured to perform an online graph processing operation according to the extracted node information, node attribute information, edge information, and edge attribute information.
  • the online graph processing (OLGP) operation in the graph processing unit includes one of:
  • information dimension roll-up (I-OLGP)
  • topological dimension roll-up (T-OLGP)
  • asynchronous roll-up, drill-down, slice, dice, and data view.
  • the extracted edge attribute information is stored in the information dimension table in the graph processing unit.
  • the information dimension roll-up specifically includes:
  • rolling up the information of one attribute, or of several attributes, of the edge stored in the information dimension table.
  • the topological roll-up (topology aggregation) operation specifically includes:
  • rolling up the information of one attribute, or of several attributes, of the node stored in the topology dimension table.
  • the method for storing data provided by the embodiment of the present invention acquires an original data set and extracts node information, node attribute information, edge information, and edge attribute information from it.
  • the node information includes at least: a node identifier and a node attribute key; the node attribute key has a corresponding relationship with the node attribute information; the edge information includes at least: an edge identifier and an edge attribute key; the edge attribute key has a corresponding relationship with the edge attribute information
  • the edge is used to describe the relationship between one node and another node; the extracted node information, node attribute information, edge information, and edge attribute information are stored. Because the extracted pieces of information are connected to one another, the required data can be located quickly and accurately when the data is subsequently operated on.
  • the information stored by the solution provided by the embodiment of the present invention includes the same node information and node attribute information as the prior art, so that the researcher can focus on the nodes.
  • the information stored by the solution provided by the embodiment of the present invention further includes edge information and edge attribute information, which the prior art cannot focus on, so that the researcher can also pay attention to the relationships between the nodes.
  • FIG. 1 is a schematic diagram of a sales network in the prior art
  • FIG. 3 is a schematic diagram of an information network provided by an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a method for storing data according to Embodiment 1 of the present invention.
  • FIG. 5 is a schematic diagram of the association between node attribute information, edge information, and edge attribute information according to an embodiment of the present invention (or a multidimensional information network data warehouse model);
  • Figure 6 is a schematic diagram of a network of research collaborators
  • FIG. 7 is a schematic diagram of a method for storing data according to Embodiment 2 of the present invention.
  • Figure 8 is a multidimensional information network data warehouse model
  • Figure 9 shows the conversion of the edge fact table into an edge fact relation table
  • FIG. 10 shows the conversion of the node fact table into a node fact relation table
  • FIG. 11 shows the conversion of the information dimension table into an information dimension relation table;
  • Figure 12 shows the conversion of the topology dimension table into a topology dimension relation table;
  • Figure 13 shows the keyword-collaborator multidimensional information network data warehouse model
  • Figure 14 shows the film actor cooperation network
  • Figure 16 is a video actor collaborative multidimensional information network data warehouse model
  • FIG. 18 is a data storage device according to Embodiment 5 of the present invention.
  • the center of interest of the user rises from a numerical measure to a graph or network
  • the structure of the center of the user's attention consists of nodes and edges.
  • the nodes and edges correspond to some related attributes, namely node attributes and edge attributes.
  • the attributes associated with the edges can be referred to as information dimensions, and the attributes associated with the nodes can be referred to as topological dimensions.
  • the edge represents the connection between two nodes.
  • in the information network diagram shown in Figure 3, each circle represents a node, each edge has its own attributes, and each node also has its own attributes.
  • the objects mentioned here can be understood as nodes, that is, the connection relationship between nodes and nodes.
  • Most researchers work on the connection prediction of social networks with graph structure, traffic hub node discovery, community trend evolution, protein structure analysis, etc. These tasks are carried out on graph-structured data.
  • the prior art lacks a general and efficient underlying data organization model for the storage of these data to facilitate the analysis of these data.
  • the embodiment of the present invention provides a general storage scheme for the graph data in an information network, that is, a method, a device, and a system for storing data. The scheme organizes graph-structured data to facilitate research on upper-layer algorithms and the analysis and utilization of the data; it resolves the graph-based relationships between objects, simplifies complex information storage formats, and eliminates redundancy; and it uses a relational database to store the relationships, so that users can perform efficient structured query operations.
  • a relational database is a database built on the relational model; it uses mathematical concepts and methods such as set algebra to process the data in the database.
  • An embodiment of the present invention provides a method for storing data. As shown in FIG. 4, the method includes: Step 101: Acquire an original data set;
  • the original data set can be understood as a collection of all the data collected by the user, which is messy and unfavorable for analysis.
  • the raw data set obtained in step 101 may be raw data of unstructured text input into the execution device.
  • Step 102 Extract information representing a structure of the information network graph from the original data set.
  • the information indicating the structure of the information network graph includes at least: node information, node attribute information, edge information, and edge attribute information.
  • the node information includes at least: a node identifier and a node attribute key; the node attribute key has a corresponding relationship with the node attribute information; the edge information includes at least: an edge identifier and an edge attribute key; the edge attribute key has a corresponding relationship with the edge attribute information; the edge is used to describe the relationship between one node and another node;
  • because nodes are related to node attributes, edges are related to edge attributes, and edges describe the relationships between nodes, the relationships among the extracted node information, node attribute information, edge information, and edge attribute information can be easily represented by a graph structure (see Figure 5, Figure 8, and Figure 16 in the subsequent description).
  • the information indicating the structure of the information network graph may include: node information (e.g., node identifier (VertexID)), node attribute information (e.g., Attribute1, Attribute2), edge information (e.g., edge identifier (EdgeID)), edge attribute information (e.g., Attribute1, Attribute2), and so on. The number of node attributes, the number of edge attributes, and the numbers of nodes and edges vary with the specific information network, and so does the structure.
  • the information network graph structure shown in Figure 3 is only a simple example for easy understanding and is not a limitation of the embodiment of the present invention.
  • according to the information network graph structure, the device extracts from the original data set the information representing that structure, including: node information, node attribute information, edge information, and edge attribute information.
  • the information extracted in step 102 that represents the information network graph structure (node information, node attribute information, edge information, and edge attribute information) may specifically be represented in the form of tables. For example: the extracted node information is stored in the node fact table (VFT), the extracted edge information is stored in the edge fact table (EFT), the extracted node attribute information is stored in the topology dimension table (TDT), and the extracted edge attribute information is stored in the information dimension table (IDT).
  • because nodes are related to node attributes and edges are related to edge attributes, the tables are associated with one another (the association is shown by the connections between the tables in Figure 5).
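  • The four tables and their associations might be laid out as in the following minimal sketch, assuming SQLite through Python's sqlite3 module. The table abbreviations VFT, EFT, TDT, and IDT and the names VertexID and Attribute1 come from the description; the column names NodeAttr_key, EdgeAttr_key, Vertex1ID, Vertex2ID, and Measure, and all sample rows, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Topology dimension table: node attribute information.
CREATE TABLE TDT (NodeAttr_key INTEGER PRIMARY KEY, Attribute1 TEXT);
-- Information dimension table: edge attribute information.
CREATE TABLE IDT (EdgeAttr_key INTEGER PRIMARY KEY, Attribute1 TEXT);
-- Node fact table: node identifier, node attribute key, optional metric.
CREATE TABLE VFT (
    VertexID     INTEGER PRIMARY KEY,
    NodeAttr_key INTEGER REFERENCES TDT(NodeAttr_key),
    Measure      REAL
);
-- Edge fact table: the edge is identified by its two node identifiers.
CREATE TABLE EFT (
    Vertex1ID    INTEGER REFERENCES VFT(VertexID),
    Vertex2ID    INTEGER REFERENCES VFT(VertexID),
    EdgeAttr_key INTEGER REFERENCES IDT(EdgeAttr_key),
    Measure      REAL,
    PRIMARY KEY (Vertex1ID, Vertex2ID)
);
""")

# Hypothetical sample data: two nodes joined by one edge.
conn.executemany("INSERT INTO TDT VALUES (?, ?)", [(10, "company A"), (11, "company B")])
conn.execute("INSERT INTO IDT VALUES (100, 'film X')")
conn.executemany("INSERT INTO VFT VALUES (?, ?, ?)", [(1, 10, 5.0), (2, 11, 3.0)])
conn.execute("INSERT INTO EFT VALUES (1, 2, 100, 2.0)")


def neighbors_with_attributes(connection, vertex_id):
    """Follow the keys from an edge to its far node and both attribute tables."""
    query = """
        SELECT v.VertexID, t.Attribute1, i.Attribute1, e.Measure
        FROM EFT AS e
        JOIN VFT AS v ON v.VertexID = e.Vertex2ID
        JOIN TDT AS t ON t.NodeAttr_key = v.NodeAttr_key
        JOIN IDT AS i ON i.EdgeAttr_key = e.EdgeAttr_key
        WHERE e.Vertex1ID = ?
    """
    return connection.execute(query, (vertex_id,)).fetchall()


result = neighbors_with_attributes(conn, 1)
print(result)
```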
  • when the information of a node is extracted, the node information includes: a node identifier (i.e., a node ID; the specific meaning of the node differs according to the definition of the information network: in a partner multidimensional information network a node may represent an author, and in an actor collaborator multidimensional information network a node may represent an actor), the attribute key of the node, and/or the metric of the node.
  • the metric of the node included in the node information may be a numerical representation of information related to the node; for example, in the partner network, the metric of a node may be the number of articles published by the author. The metric of the node is a preferred option rather than a mandatory one.
  • the node attribute key has a corresponding relationship with the node attribute information. That is, the node attribute key included in the node information is the link between the node information and the node attribute information.
  • the detailed information corresponding to the node attribute key may be stored in the topology dimension table.
  • for example, the node attribute key may be the movie company to which the actor belongs, and the specific information corresponding to this node (i.e., actor) attribute key is the node attribute information, that is, the specific film companies, such as Huayi Brothers Film Production Company and Tianyu Film Company.
  • the edge information includes at least: an edge identifier and an edge attribute key, and may also include: a metric of the edge.
  • the edge identifier (EdgeID) can be represented by the identifiers of two nodes.
  • for example, the identifiers of node 1 and node 2 together represent the edge between them.
  • Each edge attribute key can represent a type of attribute. For example: if the nodes are authors in the partner information network, an edge represents the cooperation of two authors, and the edge attribute key can be the articles on which the two authors cooperated, and/or the time of the cooperation, and/or the location of the cooperation.
  • the metric of the edge included in the edge information may be a numerical representation of the information related to the edge; for example, in the partner network, the metric of an edge may be the number of times the two authors have cooperated (e.g., Co_Frequence).
  • the edge attribute key has a corresponding relationship with the edge attribute information; that is, the edge attribute key included in the edge information is the link between the edge information and the edge attribute information.
  • the edge attribute information may specifically be stored in the information dimension table.
  • if the edge attribute key is the cooperative articles, the specific information in the edge attribute information may be the names of all the articles on which the collaborators cooperated, such as "Rainwater" and "Snowflake". If the edge attribute key is the place of cooperation, the specific information in the edge attribute information (specifically, in the information dimension table) may be all the places where the partners cooperated, such as Beijing and Shanghai.
  • Step 103: Store the extracted node information, node attribute information, edge information, and edge attribute information.
  • the node information, the node attribute information, the edge information, and the edge attribute information may be stored in the form of tables, that is, in the node fact table, topology dimension table, edge fact table, and information dimension table that correspond to the above information.
  • storage in the form of tables is only one possible manner and is not a limitation of the embodiment of the present invention; other specific storage forms may also be used.
  • the method for storing data provided by the first embodiment of the present invention acquires an original data set and extracts node information, node attribute information, edge information, and edge attribute information from it.
  • the node information includes at least: a node identifier and a node attribute key; the node attribute key has a corresponding relationship with the node attribute information
  • the edge information includes at least: an edge identifier and an edge attribute key; the edge attribute key has a corresponding relationship with the edge attribute information
  • because nodes are related to node attributes, edges are related to edge attributes, and the connection between two nodes is an edge, the extracted node information, node attribute information, edge information, and edge attribute information are related to one another, and the extracted information is stored. Because the extracted pieces of information are connected, the required data can be located quickly and accurately when the data is subsequently manipulated.
  • the information stored by the solution provided by the embodiment of the present invention includes the same node information and node attribute information as the prior art, so that the researcher can focus on the nodes.
  • the information stored by the solution provided by the embodiment of the present invention further includes edge information and edge attribute information, which the prior art cannot focus on, so that the researcher can also pay attention to the relationships between the nodes.
  • the method for storing data provided in the first embodiment of the present invention solves the redundancy problem that exists in the original data set under existing methods, and the solution has the advantages of flexible query, high efficiency, and flexible subject extraction.
  • the method for storing data provided in the first embodiment of the present invention is more in line with the modeling requirements of a real social network and is beneficial to the design of efficient OLGP algorithms; the model is easy to convert to traditional relational tables and easy for people to understand.
  • the connection between nodes and edges is established according to the connections between nodes, so the node information and node attribute information are directly connected to the edge information and edge attribute information. The solution can therefore attend to the relationships between nodes by discovering the important relationships between edges and nodes, while requiring only small changes to the prior art.
  • the embodiment of the present invention provides a method for storing data, which is similar to the method provided in the first embodiment, except that the method provided by the embodiment of the present invention is specifically applied in a research partner information network.
  • the network of research collaborators records the researchers in a field and the papers they publish, and it is a typical example of an information network. As shown in Figure 6, each node represents an author. If two people collaborate to publish an article, there is an edge between the two nodes. The attributes of the edge record the number of articles published by the two collaborators at a specific time and at a specific conference.
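  • To make the construction of such a network concrete, the following sketch derives nodes, edges, and their metrics from raw paper records. The record layout, the author names, and the function name extract_network are hypothetical; the node metric (number of articles per author) and the edge metric (number of cooperations per author pair) follow the examples given in the description.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records: one entry per paper.
papers = [
    {"title": "P1", "authors": ["Li", "Liu", "Wang"]},
    {"title": "P2", "authors": ["Li", "Liu"]},
    {"title": "P3", "authors": ["Wang"]},
]


def extract_network(records):
    """Return (papers per author, co-authorship count per author pair)."""
    papers_per_author = Counter()
    co_frequence = Counter()
    for record in records:
        # Sort so that each undirected pair always has the same key.
        authors = sorted(set(record["authors"]))
        papers_per_author.update(authors)
        co_frequence.update(combinations(authors, 2))
    return papers_per_author, co_frequence


nodes, edges = extract_network(papers)
print(dict(nodes))
print(dict(edges))
```

  • A paper with a single author adds to that author's node metric but creates no edge, which matches the rule that an edge exists only when two people publish together.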
  • the following is an example of a partner network in the data collection of the ACM (Association for Computing Machinery) to elaborate and demonstrate the implementation process of the multidimensional information network data warehouse model.
  • the method includes:
  • Step 201 Acquire an original data set.
  • the original data set can be stored in unstructured text, which is not conducive to the user's efficient query analysis operation.
  • This solution extracts the acquired ACM data set, classifies and stores it, and can perform query analysis operations efficiently.
  • Step 202 Extract information representing a structure of the information network graph from the original data set.
  • the information indicating the structure of the information network graph includes at least: node information, node attribute information, edge information, and edge attribute information.
  • the node information includes at least: a node identifier and a node attribute key; the node attribute key has a corresponding relationship with the node attribute information; the edge information includes at least: an edge identifier and an edge attribute key; the edge attribute key has a corresponding relationship with the edge attribute information; where the edge is used to describe the relationship between one node and another node.
  • because nodes are related to node attributes, edges are related to edge attributes, and the connection between two nodes is an edge, the extracted node information, node attribute information, edge information, and edge attribute information are related to one another.
  • the extracted node information can be a node fact table (VFT, Vertex Fact)
  • the stored node information may include a node ID, a node attribute key, and may also include a metric of the node.
  • the node represents the author in the partner network.
  • the extracted side information can be stored in the Edge Fact Table (EFT, Edge Fact Table), and the storage side information can include: two author nodes idl, id2 (used to represent the edge identifier), and key attributes of the edge attribute (eg: paper key) ( Paper_key ), time key (Time_key ), and location key ( Venue—key), the side information can also include the measure of the edge.
  • EFT Edge Fact Table
  • the node attribute information is specific information corresponding to the node attribute key, and the node attribute information may be specifically stored in a Topology dimension Table (TDT), and the topology dimension table may have one or more.
  • TTT Topology dimension Table
  • the key of the node is the institution key (Institution — ID )
  • substitution — ID the institution key
  • the topology table is the name of the institution that all authors (ie nodes) have worked on.
  • The edge attribute information is the specific information corresponding to the edge attribute key, and can be stored in an information dimension table (IDT, Information Dimension Table).
  • The edge attribute information corresponding to the paper key (Paper_key), the time key (Time_key), and the venue key (Venue_key) can be stored in the Paper dimension table, the Time dimension table, and the Venue dimension table, respectively.
  • Together, the information dimension tables record information such as where a paper was published, when it was published, and the title of the paper.
  • For example, the Paper dimension table can contain Paper_key and Paper_name.
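  • The four kinds of table described above can be sketched as a small relational schema. The following is a minimal illustration using Python's built-in sqlite3 module, not the implementation of the embodiment; the lowercase table and column names (vft, eft, tdt_institution, idt_paper, and so on) are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Topology dimension table (TDT): node attribute details.
CREATE TABLE tdt_institution (institution_id INTEGER PRIMARY KEY, institution_name TEXT);
-- Information dimension tables (IDT): edge attribute details.
CREATE TABLE idt_paper (paper_key INTEGER PRIMARY KEY, paper_name TEXT);
CREATE TABLE idt_time  (time_key  INTEGER PRIMARY KEY, year INTEGER, decade TEXT);
CREATE TABLE idt_venue (venue_key INTEGER PRIMARY KEY, venue_name TEXT);
-- Node fact table (VFT): node ID, node attribute key, node metric.
CREATE TABLE vft (
    author_id      INTEGER PRIMARY KEY,
    institution_id INTEGER REFERENCES tdt_institution(institution_id),
    paper_num      INTEGER
);
-- Edge fact table (EFT): the two node IDs identify the edge.
CREATE TABLE eft (
    author1_id   INTEGER REFERENCES vft(author_id),
    author2_id   INTEGER REFERENCES vft(author_id),
    paper_key    INTEGER REFERENCES idt_paper(paper_key),
    time_key     INTEGER REFERENCES idt_time(time_key),
    venue_key    INTEGER REFERENCES idt_venue(venue_key),
    co_frequence INTEGER,
    PRIMARY KEY (author1_id, author2_id)
);
""")

table_names = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(table_names)
```

  • In this sketch the fact tables hold only identifiers, keys, and metrics, while the descriptive values live in the dimension tables and are reached through the keys.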
  • Figure 8 shows Figure 7 is.
  • The information indicating the structure of the information network graph includes at least: node information, node attribute information, edge information, and edge attribute information.
  • The extracted information is then stored; specifically, the corresponding information may be stored in the form of tables.
  • Step 203: Store the extracted node information, node attribute information, edge information, and edge attribute information, using a node fact table, a topology dimension table, an edge fact table, and an information dimension table, respectively.
  • The edge fact table (EFT) of the partner network consists of the IDs of the two author nodes (Author1_id, Author2_id), the keys of the edge attributes (Paper_key, Time_key, Venue_key; the specific information of the edge attributes is stored in the information dimension tables), and a metric (for example, Co_Frequence).
  • Author1_id and Author2_id together form the primary key of the edge fact table of the partner network; they locate an edge, that is, they serve as the edge identifier.
  • The edge fact table is joined to each information dimension table through Paper_key, Time_key, and Venue_key.
  • One edge corresponds to an edge fact table.
  • The specific information carried in the edge fact table can be represented by an edge fact relation table. As shown in FIG. 9, the edge fact table is converted into an edge fact relation table.
  • The table on the left side of FIG. 9 shows only the header of the edge fact table, that is, the important edge-related information such as the edge identifier and the edge attribute keys; the table on the right side of FIG. 9 holds the specific values of the edge identifier and the edge attribute keys.
  • A Paper_key value of 1 indicates that the details of the paper co-authored by the authors with IDs 0 and 1 are found in the Paper dimension table under key 1;
  • a Time_key value of 1 indicates that the details of when the authors with IDs 0 and 1 collaborated are found in the Time dimension table under key 1;
  • Paper_key, Time_key, and Venue_key are the edge attribute keys included in the edge information, and each key refers to its own information dimension table.
  • Co_Frequence takes the value 1; it is the metric of the edge included in the edge information.
  • The metric is usually a concrete number: a Co_Frequence of 1 means that the authors with IDs 0 and 1 have collaborated once.
  • The node fact table of the partner network consists of the node information (specifically the node ID, here the author ID) and the key of the node attribute, and may also include the metric value of the node.
  • The node information includes a node ID and/or an author ID; that is, a node may be identified by the node ID alone, by the author ID alone, or by the two together.
  • The author ID (Author_id) uniquely identifies a node and serves as the primary key of the node fact table.
  • The key of the node attribute may be the primary key of a topology dimension table. (The primary key can be understood as the subject of the information recorded in the topology dimension table; for example, the primary key of the institution topology dimension, Institution_id, records the identifier of the institution.) There can be multiple topology dimension tables, each of which reflects one attribute of a node.
  • The metric of the node can be the number of papers published by the author that the node represents (that is, Paper_Num), or another metric of the node.
  • There is usually one node fact table.
  • The node fact table is linked to the topology dimension table through the primary key of the topology dimension table (that is, Institution_id).
  • The specific information carried in the node fact table may be represented by a node fact relation table. As shown in FIG. 10, the node fact table is converted into a node fact relation table.
  • The table on the left side of FIG. 10 shows only the header of the node fact table, that is, the important node-related information, such as the author's identifier, the author's name, the name of the institution, and the number of papers published by the author; the table on the right side of FIG. 10 holds the specific values of the node identifier and the node attribute key.
  • For example, the first row of the table on the right in Figure 10 has author ID 0, author name Janwei Han, institution code 1, and 15 published papers.
  • The information dimension table (IDT) consists of a primary key that identifies the information dimension table (the primary key can be understood as the subject of the information recorded in the table) and some related attributes. There can be multiple information dimensions, and each dimension has an associated relational table, called a dimension table, that describes the dimension further.
  • The information dimensions in the partner network comprise the Paper dimension table, the Time dimension table, and the Venue dimension table.
  • The dimension tables are set by the user according to the actual situation, or are generated and adjusted automatically according to the data distribution.
  • The conversion of the information dimension tables into information dimension relation tables is shown in Figure 11:
  • A Paper_key of 1 uniquely identifies the Paper record whose Paper_name is FP tree and whose Paper-classify is ap311 aper i; Paper_key values 2, 3, and 4 are read in the same way.
  • A Time_key of 1 uniquely identifies the Time record for the year 1967 and the decade 1960s; Time_key values 2, 3, and 4 are read in the same way.
  • A Venue_key of 1 uniquely identifies the Venue record whose Venue_name is VLDB and whose area is DB; Venue_key values 2, 3, and 4 are read in the same way.
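  • As an illustration of how the keys in an edge fact row lead to the dimension records, the following Python sketch resolves one edge against the key-1 records mentioned above. The dictionary layout, the field names, and the choice of Venue_key = 1 for this edge are assumptions made for the example.

```python
# Dimension records for key value 1, as described in the text (other rows omitted).
paper_dim = {1: {"paper_name": "FP tree"}}
time_dim = {1: {"year": 1967, "decade": "1960s"}}
venue_dim = {1: {"venue_name": "VLDB", "area": "DB"}}

# One edge fact row keyed by (Author1_id, Author2_id).
# venue_key=1 is an assumption; the text only gives Paper_key and Time_key for this edge.
edge_fact = {(0, 1): {"paper_key": 1, "time_key": 1, "venue_key": 1, "co_frequence": 1}}


def describe_edge(author1_id, author2_id):
    """Follow the edge attribute keys of one edge into the dimension tables."""
    row = edge_fact[(author1_id, author2_id)]
    return {
        "paper": paper_dim[row["paper_key"]]["paper_name"],
        "year": time_dim[row["time_key"]]["year"],
        "venue": venue_dim[row["venue_key"]]["venue_name"],
        "co_frequence": row["co_frequence"],
    }


edge_details = describe_edge(0, 1)
print(edge_details)
```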
  • The topology determines the edge set and the node set of the information network, that is, the topological structure of the graph; in turn, it determines the size of the unit that a node represents.
  • In the partner network, the topological dimension is the institution.
  • The topology dimension table (TDT) consists of a primary key that uniquely identifies the topology dimension table and some related attributes. There can be more than one topology dimension table. The conversion of each topology dimension table into a topology dimension relation table is shown in Fig. 12; that is, the content of a topology dimension table can be stored in the form of the relation table on the right side of Fig. 12.
  • V: the set of points (nodes) in the graph G
  • E: the set of edges in the graph G
  • f: the function that determines the edge information of the graph G.
  • Let $ID = \{I_1, I_2, \ldots, I_m\}$ be a set of dimensions to be examined in OLGP.
  • The set of dimensions formed by these m information attributes can only determine the edge set of the graph; it cannot change the topological structure of the graph.
  • ID is therefore called the information dimension set.
  • The set of dimensions formed by the topological attributes, TD, determines both the point set and the edge set of the graph and hence its topological structure; TD is called the topological dimension set.
  • ROLGP(EFT, VFT, S(IDT), S(TDT), F), where:
  • EFT is the edge fact table
  • VFT is the node fact table
  • IDT is the information dimension table
  • TDT is the topology dimension table
  • F is the set of dependencies between the tables.
  • IDT is connected to EFT through a foreign key
  • TDT is connected to VFT through a foreign key
  • EFT is connected to VFT through node ID.
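  • The dependency set F can be pictured as a list of foreign-key links between the tables. The sketch below, written under the assumption that each table is a list of dictionaries and with hypothetical column names, checks that every key in a referencing table has a matching record in the referenced table.

```python
# Hypothetical miniature tables; each row is a dictionary.
tables = {
    "TDT": [{"institution_id": 1}],
    "VFT": [{"author_id": 0, "institution_id": 1}, {"author_id": 1, "institution_id": 1}],
    "IDT": [{"paper_key": 1}],
    "EFT": [{"author1_id": 0, "author2_id": 1, "paper_key": 1}],
}

# F as (referencing table, column, referenced table, column).
F = [
    ("EFT", "paper_key", "IDT", "paper_key"),            # IDT linked to EFT by foreign key
    ("VFT", "institution_id", "TDT", "institution_id"),  # TDT linked to VFT by foreign key
    ("EFT", "author1_id", "VFT", "author_id"),           # EFT linked to VFT by node ID
    ("EFT", "author2_id", "VFT", "author_id"),
]


def violations(all_tables, dependencies):
    """Return (table, column, value) for every key with no matching referenced row."""
    broken = []
    for child, child_col, parent, parent_col in dependencies:
        parent_values = {row[parent_col] for row in all_tables[parent]}
        for row in all_tables[child]:
            if row[child_col] not in parent_values:
                broken.append((child, child_col, row[child_col]))
    return broken


broken_links = violations(tables, F)
print(broken_links)
```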
  • EFT, VFT, IDT, and TDT are all relational tables; that is, each satisfies the definition R(U, D, Dom, F'), where R is the relation, U is the set of attribute names that make up the relation, D is the set of domains from which the attributes in U take their values, Dom is the mapping from attributes to domains, and F' is the set of data dependencies between attributes.
  • An OLGP-based information network is modeled with fact tables and dimension tables.
  • The difference is that the fact tables comprise an edge fact table (EFT) and a node fact table (VFT), and the dimension tables comprise information dimension tables (IDT) and topology dimension tables (TDT).
  • The OLGP information network is thus modeled on relational data.
  • Nodes and edges are stored in the node fact table and the edge fact table, respectively.
  • Attributes related to edges are stored in the information dimension tables, and attributes related to nodes are stored in the topology dimension tables.
  • In the method for storing data provided by the second embodiment of the present invention, an original data set is acquired, and node information, node attribute information, edge information, and edge attribute information are extracted from it.
  • The node information includes at least a node identifier and a node attribute key.
  • The node attribute key has a corresponding relationship with the node attribute information.
  • The edge information includes at least an edge identifier and an edge attribute key.
  • The edge attribute key has a corresponding relationship with the edge attribute information. Because nodes are related to node attributes, edges are related to edge attributes, and an edge is the connection between two nodes, the extracted node information, node attribute information, edge information, and edge attribute information are related to one another, and the extracted information is stored. Because the extracted information is linked, the required data can be located quickly and accurately when the data is subsequently operated on.
  • The information stored by the solution of this embodiment includes the same node information and node attribute information as the prior art, so that researchers can still focus on the nodes.
  • It further includes edge information and edge attribute information, which the prior art cannot focus on, so that researchers can also study the relationships between nodes.
  • The link between nodes and edges is established from the connections between nodes, so the node information and node attribute information are linked directly to the edge information and edge attribute information. The solution can therefore capture the relationships between the nodes of interest through the relationship between edges and nodes, while requiring only small changes to the prior art.
  • Subsequent query operations on the stored data can be carried out quickly and accurately.
  • the method may also include the following:
  • Step 204: For the data to be queried, locate it among the stored node information, node attribute information, edge information, and edge attribute information, and then perform the query within the located node information, node attribute information, edge information, or edge attribute information.
  • Because the query is performed only within the located information, the scope of the query is greatly narrowed.
  • For example, to query the number of papers published at different conferences in the partner network: because the storage method of steps 201 to 203 is used, the query in the multidimensional information network data warehouse model involves only the EFT and the Venue table (the venue table among the information dimension tables), which are joined through the edge attribute key Venue_key.
  • The specific query operation can be expressed in Structured Query Language (SQL).
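  • The SQL statement itself is not reproduced in this text. As a hedged sketch of what such a query could look like, the following Python example uses sqlite3 with hypothetical table names, column names, and sample rows (only the venue name VLDB comes from the description); it joins the EFT to the Venue table through venue_key and counts distinct papers per venue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE venue (venue_key INTEGER PRIMARY KEY, venue_name TEXT);
CREATE TABLE eft (author1_id INTEGER, author2_id INTEGER,
                  paper_key INTEGER, venue_key INTEGER);
-- Sample rows are invented for the illustration.
INSERT INTO venue VALUES (1, 'VLDB'), (2, 'OtherConf');
INSERT INTO eft VALUES (0, 1, 1, 1), (0, 2, 1, 1), (1, 2, 2, 1), (3, 4, 3, 2);
""")

# Only the EFT and the Venue dimension table take part in the join.
rows = conn.execute("""
    SELECT v.venue_name, COUNT(DISTINCT e.paper_key) AS paper_count
    FROM eft AS e
    JOIN venue AS v ON e.venue_key = v.venue_key
    GROUP BY v.venue_name
    ORDER BY v.venue_name
""").fetchall()
print(rows)
```

  • COUNT(DISTINCT ...) is used here because, in this sample data, one paper can appear on several co-author edges.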
  • The method further comprises the following step:
  • Step 205: Perform online graph processing (OLGP, Online Graph Processing) operations on the basis of the links among the extracted node information, node attribute information, edge information, and edge attribute information.
  • OLGP operations can include, but are not limited to: information roll-up (I-OLGP), topological roll-up (T-OLGP), asynchronous roll-up, drill-down, slice, dice, and pivot.
  • An information roll-up (I-OLGP) can be performed on the partner network. Specifically, a roll-up of year → decade → all is performed across the levels of the time dimension of the information dimensions, going from the number of papers published in each year, to the number published in each decade, and then to the number published over all time.
  • A topological roll-up (T-OLGP) can also be performed on the partner network. Specifically, a roll-up of author (individual) → institution → all is performed on the institution dimension of the topology dimension table, going from the collaboration between different authors to the collaboration between different institutions.
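  • A topological roll-up from authors to institutions can be sketched as follows. This is a minimal illustration on invented data, assuming that the edge metric is summed when edges are merged and that collaborations inside a single institution are dropped from the rolled-up graph; neither choice is specified in the text.

```python
from collections import Counter

# Hypothetical node fact rows: author_id -> institution_id.
author_institution = {0: 1, 1: 1, 2: 2, 3: 2}
# Hypothetical edge fact rows: (author1_id, author2_id) -> Co_Frequence.
author_edges = {(0, 1): 1, (0, 2): 2, (1, 3): 1, (2, 3): 4}


def topological_rollup(edges, membership):
    """Merge author nodes into institution nodes and sum the edge metric."""
    rolled = Counter()
    for (a, b), weight in edges.items():
        ia, ib = membership[a], membership[b]
        if ia == ib:
            continue  # assumption: intra-institution collaboration is not an edge
        rolled[tuple(sorted((ia, ib)))] += weight
    return dict(rolled)


institution_edges = topological_rollup(author_edges, author_institution)
print(institution_edges)
```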
  • A roll-up can be understood as generalizing low-level detail data into high-level summary data along a certain dimension.
  • Take the time dimension of the information dimensions as an example:
  • rolling up from year to decade yields the aggregated data for each decade, and rolling up from decade to all yields the aggregated data over all years.
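  • The year → decade → all roll-up can be illustrated with a few lines of Python. The time dimension rows and the paper list below are invented for the example (apart from the 1967 / 1960s record mentioned in the description), and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical Time dimension: time_key -> (year, decade).
time_dim = {1: (1967, "1960s"), 2: (1969, "1960s"), 3: (1971, "1970s")}
# Hypothetical fact rows: (paper_key, time_key), one per paper.
papers = [(1, 1), (2, 1), (3, 2), (4, 3)]

# Lowest level: number of papers per year.
per_year = Counter(time_dim[time_key][0] for _, time_key in papers)

# Roll up year -> decade by aggregating the per-year counts.
year_to_decade = {year: decade for year, decade in time_dim.values()}
per_decade = Counter()
for year, count in per_year.items():
    per_decade[year_to_decade[year]] += count

# Roll up decade -> all.
total = sum(per_decade.values())
print(dict(per_year), dict(per_decade), total)
```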
  • Because the extracted node information, node attribute information, edge information, and edge attribute information are linked, different categories of information can be processed separately when an online graph processing (OLGP) operation is performed on the stored information;
  • for example, an operation may be applied only to the edge attribute information stored in the information dimension tables, or only to the node attribute information stored in the topology dimension tables.
  • Multi-topic modeling can be performed by sharing information dimensions, so that the underlying data needs little restructuring and existing dimension tables are shared as far as possible.
  • Take a keyword-partner network as an example: since the keyword network and the partner network both include the Paper, Time, and Venue dimensions, the keyword-partner network can be constructed by sharing these three information dimensions. As shown in Figure 13, the keyword fact table and the collaborator fact table share the Venue, Paper, and Time dimensions, building a keyword-collaborator multidimensional information network data warehouse model.
  • In the four columns on the left, the nodes represent terms (Term) and the edges represent the relationship between one term and another; the node information, node attribute information, edge information, and edge attribute information in these four columns are stored by performing steps 201 to 203 described above;
  • in the four columns on the right, the nodes represent authors and the edges represent the relationship between one author and another; the node information, node attribute information, edge information, and edge attribute information in these four columns are likewise stored by performing steps 201 to 203. That is, the topics stored on the left and on the right differ (the left stores terms as nodes, and the right stores authors as nodes).
  • The Co-IDT in the middle can serve as the information dimension table of the warehouse on the left and also as that of the warehouse on the right; that is, the multidimensional information network data warehouses on the two sides share the information dimension tables, and the edge attribute information stored in the two warehouses is the same.
  • In this way, multi-topic modeling is achieved by sharing information dimensions: the underlying data needs little restructuring, and existing dimension tables are shared as far as possible.
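  • The idea of two fact tables sharing the same information dimension tables can be sketched as below. The data structures, identifiers, and rows are hypothetical; the point is only that both the collaborator fact table and the keyword fact table resolve their keys against one shared set of dimension tables.

```python
# Shared information dimension tables (Co-IDT); rows are hypothetical.
co_idt = {
    "paper": {1: "FP tree"},
    "venue": {1: "VLDB"},
}

# Two fact tables with different node types reference the same dimension keys.
collaborator_eft = [{"author1_id": 0, "author2_id": 1, "paper_key": 1, "venue_key": 1}]
keyword_eft = [{"term1_id": 10, "term2_id": 11, "paper_key": 1, "venue_key": 1}]


def venues_of(fact_rows, dims):
    """Resolve the venue names referenced by a fact table through the shared IDT."""
    return {dims["venue"][row["venue_key"]] for row in fact_rows}


collaborator_venues = venues_of(collaborator_eft, co_idt)
keyword_venues = venues_of(keyword_eft, co_idt)
print(collaborator_venues, keyword_venues)
```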
  • An embodiment of the present invention provides another method for storing data. It is similar to the method of the foregoing embodiment, except that it is a storage method for a different specific application: a movie actor cooperation network.
  • The movie actor cooperation network is also a kind of information network.
  • In it, a node represents an actor, and an edge represents the cooperation relationship between two actors.
  • The movie actor cooperation network is shown in Figure 14.
  • The node description includes the actor's name, gender, age, and affiliated film company; the edge description includes the movie name and the release time.
  • The method includes:
  • Step 301: Obtain the original data set. For the movie actor cooperation network, the original data set is usually an unordered jumble of actor names, genders, movie names, release times, and the like, which is hard to search and unsuitable for OLGP operations.
  • Step 302 Extract information indicating a structure of the information network graph from the original data set.
  • The information indicating the structure of the information network graph includes at least: node information, node attribute information, edge information, and edge attribute information.
  • The node information includes at least a node identifier and a node attribute key; the node attribute key has a corresponding relationship with the node attribute information;
  • the edge information includes at least an edge identifier and an edge attribute key; the edge attribute key has a corresponding relationship with the edge attribute information; and an edge describes the connection between two nodes.
  • Because nodes are related to node attributes, edges are related to edge attributes, and edges describe the connections between nodes, the links among the extracted node information, node attribute information, edge information, and edge attribute information can easily be represented by a graph structure.
  • The extracted node information may be stored in a node fact table (VFT, Vertex Fact Table); the node information may include a node ID and a node attribute key, and may also include a metric of the node.
  • In the movie actor cooperation network, the node ID is the actor ID (Actor_id) together with the actor's name, the node attribute key is the code of the actor's film company (Film_Company_id), and the node's metric is the number of films the actor has appeared in (Film_Num).
  • The extracted edge information can be stored in an edge fact table (EFT, Edge Fact Table). The stored edge information can include the IDs of the two actor nodes, id1 and id2 (which together serve as the edge identifier), and the keys of the edge attributes, for example the cooperative movie key (Film_key) and the release time key (Release_Time_key). The edge information can also include a metric of the edge (that is, Co_Frequence).
  • The node attribute information is the specific information corresponding to the node attribute key. It can be stored in a topology dimension table (TDT, Topology Dimension Table), and there may be one or more topology dimension tables.
  • Here the node attribute key in the node information is Film_Company_id, so the name of the film company to which each actor (that is, node) belongs can be stored in the topology dimension table.
  • The edge attribute information is the specific information corresponding to the edge attribute key, and can be stored in an information dimension table (IDT, Information Dimension Table).
  • The edge attribute information corresponding to the cooperative movie key (Film_key) and the release time key (Release_Time_key) can be stored in the movie dimension table and the release time dimension table, respectively.
  • The movie dimension table records the movie name, movie type, and similar information; the release time dimension table records the year, the decade, and similar information.
  • Step 303: Store the extracted node information, node attribute information, edge information, and edge attribute information, using a node fact table, a topology dimension table, an edge fact table, and an information dimension table, respectively.
  • Together, the node fact table, the topology dimension table, the edge fact table, and the information dimension table form a multidimensional information network data warehouse model.
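  • Steps 301 to 303 can be sketched on a toy data set as follows. The raw record format, the film and actor names, and the helper name extract are all hypothetical, and only the movie dimension is shown (the release time dimension and the film company topology dimension would be built in the same way).

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records: one entry per film with its cast.
raw_records = [
    {"film": "Film A", "cast": ["Actor X", "Actor Y", "Actor Z"]},
    {"film": "Film B", "cast": ["Actor X", "Actor Y"]},
]


def extract(records):
    """Derive node, edge, and dimension data from the raw records."""
    actor_ids = {}            # actor name -> Actor_id (node identifier)
    film_dim = {}             # Film_key -> film name (information dimension table)
    film_num = Counter()      # node metric for the VFT
    co_frequence = Counter()  # edge metric for the EFT, keyed by (id1, id2)
    for film_key, record in enumerate(records, start=1):
        film_dim[film_key] = record["film"]
        ids = []
        for name in record["cast"]:
            actor_id = actor_ids.setdefault(name, len(actor_ids))
            film_num[actor_id] += 1
            ids.append(actor_id)
        # Every pair of actors in the same film is one cooperation edge.
        for id1, id2 in combinations(sorted(ids), 2):
            co_frequence[(id1, id2)] += 1
    return actor_ids, film_dim, dict(film_num), dict(co_frequence)


actor_ids, film_dim, film_num, co_frequence = extract(raw_records)
print(actor_ids, film_dim, film_num, co_frequence)
```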
  • In the method for storing data provided by the third embodiment of the present invention, an original data set is acquired, and node information, node attribute information, edge information, and edge attribute information are extracted from it.
  • The node information includes at least a node identifier and a node attribute key.
  • The node attribute key has a corresponding relationship with the node attribute information.
  • The edge information includes at least an edge identifier and an edge attribute key.
  • The edge attribute key has a corresponding relationship with the edge attribute information. Because nodes are related to node attributes, edges are related to edge attributes, and an edge is the connection between two nodes, the extracted node information, node attribute information, edge information, and edge attribute information are related to one another, and the extracted information is stored. Because the extracted information is linked, the required data can be located quickly and accurately when the data is subsequently operated on.
  • The information stored by the solution of this embodiment includes the same node information and node attribute information as the prior art, so that researchers can still focus on the nodes.
  • It further includes edge information and edge attribute information, which the prior art cannot focus on, so that researchers can also study the relationships between nodes.
  • The link between nodes and edges is established from the connections between nodes, so the node information and node attribute information are linked directly to the edge information and edge attribute information. The solution can therefore capture the relationships between the nodes of interest through the relationship between edges and nodes, while requiring only small changes to the prior art.
  • Subsequent query operations on the stored data can be carried out quickly and accurately.
  • the method may also include the following:
  • Step 304: For the data to be queried, locate it among the stored node information, node attribute information, edge information, and edge attribute information.
  • The query is then performed within the located node information, node attribute information, edge information, or edge attribute information. That is, it is first determined whether the data to be queried belongs to the node information, the node attribute information, the edge information, or the edge attribute information; the query is then performed only within the located information, which narrows the scope of the query.
  • The query can be expressed in Structured Query Language (SQL).
  • In step 304, when the data is queried, it can be determined which one or more of the edge fact table, the node fact table, the information dimension tables, and the topology dimension tables in the multidimensional information network data warehouse the required information belongs to.
  • This avoids scanning a large amount of irrelevant information and is efficient and time-saving: a query for a specific problem involves join operations on only some of the tables.
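  • The locating step can be pictured as a lookup from the requested fields to the tables that hold them, so that only those tables take part in the query. The sketch below is one possible way to do this, not the procedure of the embodiment; the catalogue, its table names, and its column names are assumptions.

```python
# Hypothetical catalogue: which table holds each non-key column
# in the movie actor cooperation network.
column_home = {
    "film_num": "VFT",
    "co_frequence": "EFT",
    "company_name": "TDT_film_company",
    "film_name": "IDT_film",
    "film_type": "IDT_film",
    "year": "IDT_release_time",
    "decade": "IDT_release_time",
}


def locate(requested_columns):
    """Return the sorted list of tables needed to answer a query on these columns."""
    return sorted({column_home[column] for column in requested_columns})


tables_needed = locate(["co_frequence", "film_name", "film_type"])
print(tables_needed)
```

  • In this example only the edge fact table and the movie dimension table are involved; the node fact table, the topology dimension table, and the release time dimension table are never read.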
  • The method further comprises the following step:
  • Step 305: Perform online graph processing (OLGP, Online Graph Processing) operations on the basis of the links among the extracted node information, node attribute information, edge information, and edge attribute information. OLGP operations may include, but are not limited to: information roll-up (I-OLGP), topological roll-up (T-OLGP), asynchronous roll-up, drill-down, slice, dice, and pivot.
  • An information roll-up (I-OLGP) can be performed on the cooperation network. Specifically, a roll-up of year → decade → all is performed across the levels of the time dimension of the information dimensions, going from the number of movies released in each year, to the number released in each decade, and then to the number released over all time.
  • A topological roll-up (T-OLGP) can also be performed on the cooperation network. Specifically, a roll-up of actor (Actor) → film company → all is performed on the film company dimension of the topology dimension table, going from the cooperation relationship between different actors to the cooperation relationship between different film companies.
  • Because the extracted node information, node attribute information, edge information, and edge attribute information are linked, different categories of information can be processed separately when an online graph processing (OLGP) operation is performed on the stored information;
  • for example, an operation may be applied only to the edge attribute information stored in the information dimension tables, or only to the node attribute information stored in the topology dimension tables.
  • Multi-topic modeling can be performed by sharing information dimensions, so that the underlying data needs little restructuring and existing dimension tables are shared as far as possible.
  • An embodiment of the present invention provides a device for storing data. As shown in FIG. 17, the device includes an obtaining unit 401, an extracting unit 402, and a storage unit 403;
  • The obtaining unit 401 is configured to acquire an original data set.
  • The original data set can be understood as the collection of all data gathered by the user, which is disordered and hard to analyze.
  • The original data set acquired by the obtaining unit may be raw unstructured text input into the device.
  • The extracting unit 402 is configured to extract information indicating the structure of the information network graph from the original data set, where this information includes at least: node information, node attribute information, edge information, and edge attribute information;
  • the node information includes at least a node identifier and a node attribute key; the node attribute key has a corresponding relationship with the node attribute information;
  • the edge information includes at least an edge identifier and an edge attribute key; the edge attribute key has a corresponding relationship with the edge attribute information; an edge describes the relationship between two nodes;
  • Because nodes are related to node attributes, edges are related to edge attributes, and edges describe the relationships between nodes, the links among the extracted node information, node attribute information, edge information, and edge attribute information can easily be represented by a graph structure (see Figure 5, Figure 8, and Figure 16 above).
  • The storage unit 403 is configured to store the extracted node information, node attribute information, edge information, and edge attribute information.
  • The storage unit may store the extracted node information, node attribute information, edge information, and edge attribute information in the form of tables, namely in a node fact table, a topology dimension table, an edge fact table, and an information dimension table, respectively.
  • Storage in the form of tables is one possible implementation and does not limit the embodiment of the present invention.
  • Other storage forms are also possible.
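  • The division of the device into an obtaining unit, an extracting unit, and a storage unit can be sketched as three cooperating classes. The following is only an illustrative skeleton under simplifying assumptions: the class names, the record format with a members field, and the reduction of the stored tables to a node list and an edge dictionary are all hypothetical.

```python
from itertools import combinations


class ObtainingUnit:
    """Corresponds to unit 401: acquires the original data set."""

    def obtain(self, source):
        return list(source)


class ExtractingUnit:
    """Corresponds to unit 402: extracts node and edge information."""

    def extract(self, records):
        nodes, edges = set(), {}
        for record in records:
            members = sorted(record["members"])
            nodes.update(members)
            for pair in combinations(members, 2):
                edges[pair] = edges.get(pair, 0) + 1  # edge metric
        return {"node_info": sorted(nodes), "edge_info": edges}


class StorageUnit:
    """Corresponds to unit 403: stores the extracted information in tables."""

    def __init__(self):
        self.tables = {}

    def store(self, extracted):
        self.tables["VFT"] = extracted["node_info"]
        self.tables["EFT"] = extracted["edge_info"]


class DataStorageDevice:
    def __init__(self):
        self.obtaining_unit = ObtainingUnit()
        self.extracting_unit = ExtractingUnit()
        self.storage_unit = StorageUnit()

    def run(self, source):
        raw = self.obtaining_unit.obtain(source)
        self.storage_unit.store(self.extracting_unit.extract(raw))
        return self.storage_unit.tables


device = DataStorageDevice()
stored = device.run([{"members": ["a", "b"]}, {"members": ["b", "a", "c"]}])
print(stored)
```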
  • the device for storing data is provided by the first embodiment of the present invention.
  • the device extracts node information, node attribute information, side information, and edge attribute information from the original data set by acquiring the original data set.
  • the node information includes at least: a node identifier.
  • a key attribute of the node attribute has a corresponding relationship with the node attribute information
  • the side information at least includes: an edge identifier and an edge attribute key
  • the edge attribute key has a correspondence with the edge attribute information Relationship; due to node and node properties
  • the relationship between the edge and the edge attribute, the connection between the node and the node is the edge, so that the extracted node information, the node attribute information, the side information, and the edge attribute information are related, and the extracted node information, the node attribute are stored.
  • Information, side information, and side attribute information Since there is a connection between the extracted information, when the data is subsequently operated, the required data can be quickly and accurately located.
  • the information stored by the solution provided by the embodiment of the present invention includes not only the same node information and node attribute information as the prior art, so that the researcher can focus on the node as the center.
  • the information stored in the solution provided by the embodiment of the present invention further includes side information and edge attribute information that cannot be focused on by the prior art, so that the researcher can also pay attention to the relationship between the nodes.
  • The apparatus for storing data provided in the first embodiment of the present invention solves the redundancy problem of the original data set in the existing OLAP multi-dimensional data warehouse model.
  • the solution provided by the embodiment of the present invention has the advantages of flexible query, high efficiency, and flexible subject extraction.
  • The device for storing data provided by the first embodiment of the present invention is more in line with the modeling requirements of a real social network, facilitates the design of efficient OLGP algorithms, uses a model that is easy to convert into traditional relational tables, and helps people understand real-world information.
  • The connection between nodes and edges is established from the connections between nodes, so the node information and node attribute information are directly linked to the side information and edge attribute information. The solution can therefore reveal the relationships between the nodes of interest by discovering the important relationships between edges and nodes, and it requires only small changes to the prior art.
  • the node information further includes: a node metric value; the side information further includes: an edge metric value.
  • the extracted node information is stored in a node fact table
  • the extracted side information is stored in an edge fact table
  • the extracted node attribute information is stored in a topology dimension table
  • the extracted edge attribute information is stored in the information dimension table
  • Because an edge is used to describe a relationship between one node and another, the node fact table has a relationship with the edge fact table;
  • because the node attribute key has a corresponding relationship with the node attribute information, the topology dimension table has a relationship with the node fact table;
  • because the edge attribute key has a corresponding relationship with the edge attribute information, the information dimension table has a relationship with the edge fact table.
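The four related tables can be illustrated with a small relational schema. The sketch below uses Python's standard `sqlite3` module; the table and column names (`node_fact`, `edge_fact`, `topology_dim`, `information_dim`, `city`, `channel`, and so on) and the sample rows are assumptions chosen for illustration, and the embodiment does not prescribe any particular database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topology_dim (      -- node attribute information
    node_attr_key INTEGER PRIMARY KEY,
    city TEXT
);
CREATE TABLE information_dim (   -- edge attribute information
    edge_attr_key INTEGER PRIMARY KEY,
    channel TEXT
);
CREATE TABLE node_fact (         -- node information
    node_id INTEGER PRIMARY KEY,
    node_attr_key INTEGER REFERENCES topology_dim(node_attr_key),
    node_metric REAL
);
CREATE TABLE edge_fact (         -- side information
    edge_id INTEGER PRIMARY KEY,
    src_node INTEGER REFERENCES node_fact(node_id),
    dst_node INTEGER REFERENCES node_fact(node_id),
    edge_attr_key INTEGER REFERENCES information_dim(edge_attr_key),
    edge_metric REAL
);
""")
conn.executemany("INSERT INTO topology_dim VALUES (?, ?)", [(1, "Chengdu"), (2, "Shenzhen")])
conn.executemany("INSERT INTO information_dim VALUES (?, ?)", [(1, "email"), (2, "phone")])
conn.executemany("INSERT INTO node_fact VALUES (?, ?, ?)", [(10, 1, 3.0), (11, 2, 5.0), (12, 1, 1.0)])
conn.executemany("INSERT INTO edge_fact VALUES (?, ?, ?, ?, ?)",
                 [(100, 10, 11, 1, 4.0), (101, 11, 12, 2, 2.0)])

# Follow the relationships: edge -> edge attribute, and edge -> both end nodes -> node attributes.
rows = conn.execute("""
SELECT e.edge_id, i.channel, ts.city, td.city
FROM edge_fact e
JOIN information_dim i ON i.edge_attr_key = e.edge_attr_key
JOIN node_fact s ON s.node_id = e.src_node
JOIN node_fact d ON d.node_id = e.dst_node
JOIN topology_dim ts ON ts.node_attr_key = s.node_attr_key
JOIN topology_dim td ON td.node_attr_key = d.node_attr_key
ORDER BY e.edge_id
""").fetchall()
```

The foreign keys mirror the three relationships listed above: the edge fact table points at the node fact table, the node fact table points at the topology dimension table, and the edge fact table points at the information dimension table.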
  • the device further includes: a positioning unit 404, and a query unit 405;
  • the positioning unit 404 is configured to locate the data to be queried in the stored node information, node attribute information, side information, and edge attribute information;
  • the query unit 405 is configured to perform the query, after the locating, in one of the node information, the node attribute information, the side information, or the edge attribute information.
  • With the edge fact table, the node fact table, the information dimension table, and the topology dimension table of the multidimensional information network data warehouse, it can be determined which one or more of these tables the information to be queried belongs to. A large amount of redundant information is thereby excluded, so the query is efficient and saves time. A query for a specific problem involves join operations on only some of the tables.
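The locate-then-query idea can be sketched as follows. The column catalogue, the sample rows, and the helper names `locate` and `metric_by_channel` are all hypothetical; the point is only that a question about edge metrics per edge attribute touches two of the four tables and never reads the node-side tables.

```python
# Hypothetical catalogue of which table holds which column.
TABLE_COLUMNS = {
    "node_fact": {"node_id", "node_attr_key", "node_metric"},
    "edge_fact": {"edge_id", "src_node", "dst_node", "edge_attr_key", "edge_metric"},
    "topology_dim": {"node_attr_key", "city"},
    "information_dim": {"edge_attr_key", "channel"},
}


def locate(requested_columns):
    """Return the tables needed to answer a query over the requested columns."""
    needed = []
    for col in requested_columns:
        holders = [t for t, cols in TABLE_COLUMNS.items() if col in cols]
        if not holders:
            raise KeyError(f"no table stores column {col!r}")
        # Reuse a table that is already selected when it also holds this column.
        if not any(t in needed for t in holders):
            needed.append(holders[0])
    return needed


edge_fact = [
    {"edge_id": 100, "src_node": 10, "dst_node": 11, "edge_attr_key": 1, "edge_metric": 4.0},
    {"edge_id": 101, "src_node": 11, "dst_node": 12, "edge_attr_key": 2, "edge_metric": 2.0},
    {"edge_id": 102, "src_node": 10, "dst_node": 12, "edge_attr_key": 1, "edge_metric": 1.5},
]
information_dim = {1: {"channel": "email"}, 2: {"channel": "phone"}}


def metric_by_channel():
    """Join only edge_fact and information_dim; the node tables are not read."""
    totals = {}
    for edge in edge_fact:
        channel = information_dim[edge["edge_attr_key"]]["channel"]
        totals[channel] = totals.get(channel, 0.0) + edge["edge_metric"]
    return totals


tables = locate(["edge_metric", "channel"])
result = metric_by_channel()
```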
  • the apparatus further includes: a map processing unit 406;
  • the map processing unit 406 is configured to perform an online graph processing (OLGP) operation according to the extracted node information, node attribute information, side information, and edge attribute information.
  • the online graph processing operation in the map processing unit 406 includes at least one of: information dimension roll-up (I-OLGP), topology dimension roll-up (T-OLGP), asynchronous roll-up, drill-down, slice, dice, and pivot.
  • the information dimension roll-up specifically includes:
  • rolling up the information of one or more attributes of the edges stored in the information dimension table.
  • the topology dimension roll-up specifically includes:
  • rolling up the information of one or more attributes of the nodes stored in the topology dimension table.
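The two roll-up operations can be contrasted with a short sketch. The sample data and the choice of summing an edge metric are assumptions made for illustration; the description only states which dimension table's attribute information is rolled up.

```python
from collections import defaultdict

# Node attribute taken from a (hypothetical) topology dimension table.
node_city = {"n1": "Chengdu", "n2": "Chengdu", "n3": "Shenzhen"}

# (source node, target node, edge attribute "channel", edge metric)
edges = [
    ("n1", "n2", "email", 2),
    ("n1", "n3", "email", 3),
    ("n2", "n3", "phone", 1),
    ("n3", "n1", "phone", 4),
]


def information_rollup(edges):
    """Aggregate the edge metric over one edge attribute; the nodes stay unchanged."""
    totals = defaultdict(int)
    for _src, _dst, channel, metric in edges:
        totals[channel] += metric
    return dict(totals)


def topology_rollup(edges, node_group):
    """Merge nodes that share a node attribute value and merge the edges between the groups."""
    totals = defaultdict(int)
    for src, dst, _channel, metric in edges:
        totals[(node_group[src], node_group[dst])] += metric
    return dict(totals)


by_channel = information_rollup(edges)
by_city = topology_rollup(edges, node_city)
```

In this sketch the information dimension roll-up leaves the topology of the network untouched, whereas the topology dimension roll-up produces a coarser network whose nodes are groups of the original nodes.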
  • Because the extracted node information, node attribute information, side information, and edge attribute information are related to one another, when online graph processing (OLGP) operations are performed on the stored information, a single category of information can be processed on its own; for example, only the edge attribute information stored in the information dimension table is operated on, or only the node attribute information stored in the topology dimension table is operated on.
  • Multi-topic modeling can be performed by sharing information dimensions, so that little of the underlying data has to be restructured and existing dimension tables are shared as far as possible.
  • the embodiment of the present invention provides a data storage device.
  • the device includes: a memory 40, a processor 41, an input device 43, and an output device 44, each connected to a bus, wherein the memory 40 is used to store data input from the input device 43, and also to store the processor
  • the processor 41 is configured to extract information indicating a structure of the information network map from the original data set, where the information indicating the structure of the information network map includes at least: node information, node attribute information, side information, and edge attribute information;
  • the node information includes at least: a node identifier and a node attribute key; the node attribute key has a corresponding relationship with the node attribute information;
  • the side information includes at least: an edge identifier and an edge attribute key; the edge attribute key has a corresponding relationship with the edge attribute information; the edge is used to describe a relationship between the node and the node;
  • the memory 40 is further configured to store the extracted node information, node attribute information, side information, and edge attribute information.
  • The device for storing data provided by the first embodiment of the present invention acquires an original data set and extracts node information, node attribute information, side information, and edge attribute information from it.
  • The node information includes at least a node identifier and a node attribute key; the node attribute key has a corresponding relationship with the node attribute information.
  • The side information includes at least an edge identifier and an edge attribute key; the edge attribute key has a corresponding relationship with the edge attribute information.
  • Because a node is related to its node attributes, an edge is related to its edge attributes, and the connection between one node and another is an edge, the extracted node information, node attribute information, side information, and edge attribute information are all related to one another. The extracted information is stored, and because its pieces are connected, the required data can be located quickly and accurately when the data is subsequently operated on.
  • The information stored by the solution provided by the embodiment of the present invention includes the same node information and node attribute information as the prior art, so that researchers can focus on node-centered facts; it also includes side information and edge attribute information, which the prior art cannot focus on, so that researchers can also pay attention to the relationships between nodes.
  • The apparatus for storing data provided in the first embodiment of the present invention solves the redundancy problem of the original data set in the existing OLAP multi-dimensional data warehouse model.
  • The solution provided by the embodiment of the present invention has the advantages of flexible query, high efficiency, and flexible subject extraction.
  • The device for storing data provided by the first embodiment of the present invention is more in line with the modeling requirements of a real social network, facilitates the design of efficient OLGP algorithms, uses a model that is easy to convert into traditional relational tables, and helps people understand real-world information.
  • The connection between nodes and edges is established from the connections between nodes, so the node information and node attribute information are directly linked to the side information and edge attribute information. The solution can therefore reveal the relationships between the nodes of interest by discovering the important relationships between edges and nodes, and it requires only small changes to the prior art.
  • the node information processed by the processor 41 further includes: a node metric value; the side information further includes: an edge metric value.
  • The node information extracted by the processor 41 is stored in a node fact table; the extracted side information is stored in an edge fact table; the extracted node attribute information is stored in a topology dimension table; and the extracted edge attribute information is stored in an information dimension table. Because an edge is used to describe a relationship between one node and another, the node fact table has a relationship with the edge fact table. Because the node attribute key has a corresponding relationship with the node attribute information, the topology dimension table has a relationship with the node fact table. Because the edge attribute key has a corresponding relationship with the edge attribute information, the information dimension table has a relationship with the edge fact table.
  • the processor 41 is further configured to locate the data to be queried in the stored node information, node attribute information, side information, and edge attribute information, and, after the locating, to perform the query in one of the node information, the node attribute information, the side information, or the edge attribute information.
  • the processor 41 is further configured to perform an online graph processing (OLGP) operation according to the extracted node information, node attribute information, side information, and edge attribute information.
  • the online graph processing operation performed by the processor 41 includes at least one of: roll-up (information dimension roll-up (I-OLGP), topology dimension roll-up (T-OLGP), or asynchronous roll-up), drill-down, slice, dice, and pivot.
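Slice, dice, and pivot are named above without further detail. The sketch below shows one conventional OLAP-style reading of them applied to edge records; the record fields (`channel`, `year`, `src_city`, `metric`) and the helper functions are hypothetical and are not defined by the embodiment.

```python
edges = [
    {"channel": "email", "year": 2012, "src_city": "Chengdu", "metric": 2},
    {"channel": "email", "year": 2013, "src_city": "Shenzhen", "metric": 3},
    {"channel": "phone", "year": 2013, "src_city": "Chengdu", "metric": 1},
    {"channel": "phone", "year": 2012, "src_city": "Shenzhen", "metric": 4},
]


def slice_(rows, dim, value):
    """Slice: fix a single dimension to one value."""
    return [r for r in rows if r[dim] == value]


def dice(rows, allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]


def view(rows, row_dim, col_dim):
    """Build a two-dimensional summary of the metric."""
    table = {}
    for r in rows:
        cell = table.setdefault(r[row_dim], {})
        cell[r[col_dim]] = cell.get(r[col_dim], 0) + r["metric"]
    return table


def pivot(table):
    """Pivot: swap the row and column dimensions of a two-dimensional view."""
    out = {}
    for row_key, cols in table.items():
        for col_key, value in cols.items():
            out.setdefault(col_key, {})[row_key] = value
    return out


sliced = slice_(edges, "year", 2013)
diced = dice(edges, {"channel": {"email"}, "src_city": {"Chengdu", "Shenzhen"}})
by_channel_year = view(edges, "channel", "year")
by_year_channel = pivot(by_channel_year)
```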
  • the processor 41 is further configured so that, if the extracted edge attribute information is stored in the information dimension table, the information dimension roll-up specifically includes:
  • rolling up the information of one or more attributes of the edges stored in the information dimension table.
  • the processor 41 is further configured so that, if the extracted node attribute information is stored in the topology dimension table, the topology dimension roll-up specifically includes:
  • rolling up the information of one or more attributes of the nodes stored in the topology dimension table.
  • Because the extracted node information, node attribute information, side information, and edge attribute information are related to one another, when an online graph processing (OLGP) operation is performed on the stored information, the different categories of information can be processed separately; for example, only the edge attribute information stored in the information dimension table is operated on, or only the node attribute information stored in the topology dimension table is operated on.
  • Multi-topic modeling can be performed by sharing information dimensions, so that little of the underlying data has to be restructured and existing dimension tables are shared as far as possible.
  • The medium can be a read-only memory, a magnetic disk, an optical disc, or the like.

Abstract

Disclosed are a method and a device for storing data. The method comprises: acquiring an original data set; extracting, from the original data set, information representing an information network map structure, the information representing the information network map structure comprising at least node information, node attribute information, side information, and edge attribute information, the node information comprising at least a node identifier and a node attribute key, the node attribute key having a corresponding relationship with the node attribute information; the side information comprising at least an edge identifier and an edge attribute key, the edge attribute key having a corresponding relationship with the edge attribute information; and an edge being used to describe the relationship between nodes; and storing the extracted node information, node attribute information, side information, and edge attribute information. By means of the solution provided in the embodiments of the present invention, researchers can also observe the relationships between nodes.
PCT/CN2014/075570 2013-10-23 2014-04-17 Method and device for storing data WO2015058500A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310505069.5 2013-10-23
CN201310505069.5A CN104572740B (zh) 2013-10-23 2013-10-23 Method and device for storing data

Publications (1)

Publication Number Publication Date
WO2015058500A1 (fr) 2015-04-30

Family

ID=52992190

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/075570 WO2015058500A1 (fr) 2013-10-23 2014-04-17 Method and device for storing data

Country Status (2)

Country Link
CN (1) CN104572740B (fr)
WO (1) WO2015058500A1 (fr)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
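CN106325756B (zh) * 2015-06-15 2020-04-24 阿里巴巴集团控股有限公司 Data storage and data computing method and device
CN110019357B (zh) * 2017-09-29 2021-06-29 北京国双科技有限公司 Database query script generation method and device
CN109446362B (zh) * 2018-09-05 2021-07-23 深圳神图科技有限公司 External-memory-based graph database structure, graph data storage method and device
CN110737805B (zh) * 2019-10-18 2022-07-19 网易(杭州)网络有限公司 Graph model data processing method, device, and terminal equipment
CN110933101B (zh) * 2019-12-10 2022-11-04 腾讯科技(深圳)有限公司 Security event log processing method, device, and storage medium
CN112948447A (zh) * 2020-12-28 2021-06-11 福建票付通信息科技有限公司 Efficient user information retrieval method based on a mesh structure
CN114077680B (zh) * 2022-01-07 2022-05-17 支付宝(杭州)信息技术有限公司 Graph data storage method, system, and device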

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080288524A1 (en) * 2007-05-18 2008-11-20 Microsoft Corporation Filtering of multi attribute data via on-demand indexing
US20090248715A1 (en) * 2008-03-31 2009-10-01 Microsoft Corporation Optimizing hierarchical attributes for olap navigation
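CN102982103A (zh) * 2012-11-06 2013-03-20 东南大学 Dimension storage method for massive multidimensional OLAP data
CN103164222A (zh) * 2013-02-25 2013-06-19 用友软件股份有限公司 Multidimensional modeling system and multidimensional modeling method
CN103235793A (zh) * 2013-04-01 2013-08-07 华为技术有限公司 Method, device, and system for online data processing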

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101093495B (zh) * 2006-06-22 2011-08-17 国际商业机器公司 Data processing method and system based on mesh relational dimensions
US7774227B2 (en) * 2007-02-23 2010-08-10 Saama Technologies, Inc. Method and system utilizing online analytical processing (OLAP) for making predictions about business locations

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
NIE, ZHANGYAN ET AL.: "Design of Multi-Dimensional Information Network Datawarehouse Model for Online Graph Processing. CNKI digital publishing platform", JOURNAL OF FRONTIERS OF COMPUTER SCIENCE AND TECHNOLOGY, 30 August 2013 (2013-08-30), pages 51 - 60 *
XU, HONGYU ET AL.: "On-Line Graphic Processing: Information Network Oriented On-Line Analytical Processing.", JOURNAL OF FRONTIERS OF COMPUTER SCIENCE AND TECHNOLOGY, vol. 9, 2012, pages 797 - 809 *

Also Published As

Publication number Publication date
CN104572740A (zh) 2015-04-29
CN104572740B (zh) 2019-09-13


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14855623

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14855623

Country of ref document: EP

Kind code of ref document: A1