CN107818117B

CN107818117B - Data table establishing method, online query method and related device

Info

Publication number: CN107818117B
Application number: CN201610826949.6A
Authority: CN
Inventors: 王平; 孙权
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2016-09-14
Filing date: 2016-09-14
Publication date: 2022-02-15
Anticipated expiration: 2036-09-14
Also published as: CN107818117A

Abstract

The application provides a data table establishing method, an online query method and a related device, wherein the establishing method comprises the following steps: establishing a first data table on a physical cluster, wherein the first data table is used for storing the relationship information of a first node and a second node on line; establishing a second data table on the physical cluster, wherein the second data table is used for storing attribute information of a second node on line; wherein the first node and the second node are of different node types. In the embodiment of the present invention, node types are not distinguished any more when a data table is established, so that even if node types of a first node and a second node are different, that is, the first data table and the second data table correspond to different node types, the first data table and the second data table are still established on the same physical cluster. Therefore, a plurality of physical clusters do not need to be accessed during online query, and the online query speed is improved.

Description

Data table establishing method, online query method and related device

Technical Field

The present application relates to the field of online storage technologies, and in particular, to a method for establishing a data table, an online query method, and a related device.

Background

With the continuous development of internet technology, relational network data is growing in both variety and volume. For example, the relational network data shown in fig. 1 includes two types of nodes: user nodes (user i and user j) and commodity nodes (item 1 and item 2), where user j is a friend of user i, and user j has purchased item 1 and item 2. How to store such relational network data with online storage technology has become a problem of increasing concern.

Currently, such data is usually stored online by building data tables. For example, Facebook uses the Unicorn architecture to build data tables. In the Unicorn architecture, each node type corresponds to a separate physical cluster, that is, the data tables of each node type are established in the corresponding physical cluster. For example, as shown in fig. 2, a data table of the user node type is established in the user physical cluster, and the data table is used for storing attribute information of user nodes, a friend list, a commodity purchase list, and the like.

It can be seen that when data tables are established using the Unicorn architecture, multiple physical clusters must be accessed whenever a single online query involves data of different node types. For example, to query online the products purchased by the friends of user i, the user cluster must be accessed first to find that the friend of user i is user j and to obtain the product list { i, j } purchased by user j; the product cluster is then accessed to obtain the attribute information of products i and j as the final online query result. Obviously, this online query process requires access to at least two physical clusters, resulting in a slower online query speed.

Disclosure of Invention

The technical problem to be solved by the application is to provide a data table establishing method, an online query method and a related device, wherein the data table is established more reasonably, so that a plurality of physical clusters are not required to be accessed during online query, and the online query speed is increased.

Therefore, the technical scheme for solving the technical problem is as follows:

the application provides a method for establishing a data table, which comprises the following steps:

establishing a first data table on a physical cluster, wherein the first data table is used for storing the relationship information of a first node and a second node on line;

establishing a second data table on the physical cluster, wherein the second data table is used for storing attribute information of a second node on line;

wherein the first node and the second node are of different node types.

Optionally, the first data table includes a first index entry and a second index entry, where the first index entry is used to store the identifier of the first node online, and the second index entry is used to store the identifier of the second node corresponding to the first node online;

the second data table comprises a third index item and a first attribute item, wherein the third index item is used for storing the identification of the second node online, and the first attribute item is used for storing the attribute information of the second node online.

Optionally, the first data table further includes a second attribute item, where the second attribute item is used to store attribute information of a corresponding relationship between the first node and the second node on line.

Optionally, the first data table is a key-key-value structure, where the first index item is primary key information, the second index item is secondary key information, and the second attribute item is value information;

the second data table is a key-value structure, wherein the third index item is key information, and the first attribute item is value information.

Optionally, the physical cluster includes N storage partitions, where N is greater than or equal to 2; the establishing method further comprises the following steps:

determining a first partition from the N storage partitions according to the first index entry, and determining a second partition from the N storage partitions according to the third index entry;

the establishing of the first data table on one physical cluster comprises: establishing a first data table on a first partition of the physical cluster;

the establishing a second data table on the physical cluster includes: a second data table is established on a second partition of the physical cluster.

Optionally, the first partition and the second partition respectively include M backup areas, where M is greater than or equal to 2;

establishing a first data table on a first partition of the physical cluster, comprising: respectively establishing the first data tables on M backup areas of a first partition of the physical cluster;

establishing a second data table on a second partition of the physical cluster, comprising: and respectively establishing the second data tables on the M backup areas of the second partition of the physical cluster.

Optionally, the first data table is further configured to store relationship information between a third node and a fourth node on line; the second data table is also used for storing attribute information of a fourth node online; wherein the third node and the first node belong to the same node type, and the fourth node and the second node belong to the same node type;

alternatively, the method further comprises: and establishing a third data table on the physical cluster, wherein the third data table is used for storing the relationship information of a fifth node and the first node on line.

The application provides an online query method, wherein a first data table and a second data table are established on a physical cluster, and the first data table is used for storing relationship information of a first node and a second node on line; the second data table is used for storing the attribute information of the second node on line; wherein the first node and the second node are of different node types; the method comprises the following steps:

receiving an online query request, wherein the online query request is used for indicating online query of relationship data of a first node;

accessing the first data table and the second data table in the physical cluster on line, and inquiring the relation data of the first node;

wherein the querying of the relationship data to the first node comprises: and querying a second node corresponding to the first node from the first data table, and querying attribute information of the second node from the second data table.

Optionally, the physical cluster includes N storage partitions, where N is greater than or equal to 2; the method further comprises the following steps:

determining a first partition from the N storage partitions according to the first index entry, and determining a second partition from the N storage partitions according to the second index entry;

accessing the first data table and the second data table in the physical cluster online, comprising: accessing online the first data table on the first partition and the second data table on the second partition in the physical cluster.

Optionally, the first data table is further configured to store relationship information between a third node and a fourth node on line; the second data table is also used for storing attribute information of a fourth node online; wherein the third node and the first node belong to the same node type, and the fourth node and the second node belong to the same node type; the online query request is also used for indicating the relationship data of a third node to be queried online;

the method further comprises the following steps:

accessing the first data table and the second data table in the physical cluster on line, and inquiring relation data of the third node;

wherein the querying of the relationship data to the third node comprises: inquiring a fourth node corresponding to the third node from the first data table; and inquiring the attribute information of the fourth node from the second data table.

Optionally, the online query request includes a first composite operator; the method further comprises the following steps:

analyzing a first instruction, a second instruction and a third instruction from the first composite operator, wherein the first instruction is used for indicating the relation data of the first node to be inquired online, the second instruction is used for indicating the relation data of the third node to be inquired online, and the third instruction is used for indicating the comprehensive processing of the relation data;

querying the relationship data of the first node, comprising: querying relationship data of the first node based on the first instruction;

querying the relationship data of the third node, comprising: querying the relationship data of the third node based on the second instruction;

the method further comprises the following steps: and comprehensively processing the relationship data of the first node and the relationship data of the third node based on the third instruction.

Optionally, the comprehensive treatment includes any one of the following treatments: merging, intersection, and difference operations.

Optionally, a third data table is further established on the physical cluster, and the third data table is used for storing relationship information between a fifth node and the first node on line; the online query request is used for indicating the relationship data of the fifth node to be queried online:

online accessing the first data table and the second data table in the physical cluster, and querying relationship data of the first node, including: accessing the first data table, the second data table and the third data table in the physical cluster on line to inquire the relation data of the fifth node;

querying relationship data of the fifth node, comprising: and querying the first node corresponding to the fifth node from the third data table, and querying the relation data of the first node.

Optionally, the online query request includes a second composite operator; the method further comprises the following steps:

analyzing a fourth instruction and a fifth instruction from the second composite operator, wherein the fourth instruction is used for indicating a node corresponding to the fifth node to be queried online, and the fifth instruction is used for indicating relationship data of the node corresponding to the fifth node to be queried online;

inquiring the first node corresponding to the fifth node from the third data table, including: based on the fourth instruction, the first node corresponding to the fifth node is inquired from the third data table;

querying the relationship data of the first node, comprising: and querying the relation data of the first node based on the fifth instruction.
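The first composite operator described above can be illustrated with a minimal sketch. The tuple encoding of the operator, the function names, and the sample rows are assumptions made for illustration only; the application does not specify a concrete operator syntax. The sketch parses one operator into two query instructions and one combining instruction, then applies a merge, intersection, or difference to the two query results.

```python
# Illustrative first data table: user -> commodity -> purchase time.
FIRST_TABLE = {
    "user_j": {"commodity_1": "2016.5.3", "commodity_2": "2016.7.9"},
    "user_i": {"commodity_3": "2016.1.11"},
}

# The three kinds of comprehensive processing named in the text.
COMBINERS = {
    "merge": set.union,
    "intersect": set.intersection,
    "difference": set.difference,
}


def parse_composite(operator):
    """Split a hypothetical (combine, node_a, node_b) operator into three instructions."""
    combine, node_a, node_b = operator
    return ("query", node_a), ("query", node_b), ("combine", combine)


def run_query(instruction):
    """Return the set of second-node identifiers related to the queried node."""
    _, node_id = instruction
    return set(FIRST_TABLE.get(node_id, {}))


def execute(operator):
    """Run both queries, then combine their results as the third instruction directs."""
    first, second, third = parse_composite(operator)
    result_a = run_query(first)
    result_b = run_query(second)
    return COMBINERS[third[1]](result_a, result_b)


merged = execute(("merge", "user_j", "user_i"))
only_j = execute(("difference", "user_j", "user_i"))
```

Because both queries read the same table on the same physical cluster, the combining step only has to operate on two in-memory result sets.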

The application provides a data table establishing device, comprising:

a first establishing unit, configured to establish a first data table on a physical cluster, where the first data table is used to store relationship information of a first node and a second node online;

a second establishing unit, configured to establish a second data table on the physical cluster, where the second data table is used to store attribute information of a second node online;

wherein the first node and the second node are of different node types.

Optionally, the physical cluster includes N storage partitions, where N is greater than or equal to 2; the establishing device further comprises:

a determining unit, configured to determine a first partition from the N storage partitions according to the first index entry, and determine a second partition from the N storage partitions according to the third index entry;

when the first data table is established on one physical cluster, the first establishing unit is specifically configured to establish the first data table on a first partition of the physical cluster;

when the second data table is established on the physical cluster, the second establishing unit is specifically configured to establish the second data table on a second partition of the physical cluster.

when a first data table is established on the first partition of the physical cluster, the first establishing unit is specifically configured to establish the first data table on M backup areas of the first partition of the physical cluster respectively;

when a second data table is established on the second partition of the physical cluster, the second establishing unit is specifically configured to establish the second data table on M backup areas of the second partition of the physical cluster, respectively.

or, the establishing means further comprises: and the third establishing unit is used for establishing a third data table on the physical cluster, and the third data table is used for storing the relationship information between a fifth node and the first node on line.

The application provides an online query device. A first data table and a second data table are established on a physical cluster, and the first data table is used for storing the relationship information of a first node and a second node online; the second data table is used for storing the attribute information of the second node online; wherein the first node and the second node are of different node types; the device comprises:

a receiving unit, configured to receive an online query request, where the online query request is used to indicate an online query of relationship data of a first node;

the query unit is used for accessing the first data table and the second data table in the physical cluster on line and querying the relation data of the first node;

Optionally, the physical cluster includes N storage partitions, where N is greater than or equal to 2; the device further comprises:

a determining unit, configured to determine a first partition from the N storage partitions according to the first index entry, and determine a second partition from the N storage partitions according to the second index entry;

when the first data table and the second data table in the physical cluster are accessed online, the query unit is specifically configured to: accessing online the first data table on the first partition and the second data table on the second partition in the physical cluster.

the query unit is further configured to access the first data table and the second data table in the physical cluster online and query the relationship data of the third node;

Optionally, the online query request includes a first composite operator; the device further comprises:

the analysis unit is used for analyzing a first instruction, a second instruction and a third instruction from the first composite operator, wherein the first instruction is used for indicating on-line query of the relation data of the first node, the second instruction is used for indicating on-line query of the relation data of the third node, and the third instruction is used for indicating comprehensive processing of the relation data;

when the relationship data of the first node is queried, the querying unit is specifically configured to query the relationship data of the first node based on the first instruction;

when the relationship data of the third node is queried, the querying unit is specifically configured to query the relationship data of the third node based on the second instruction;

the device further comprises: and the processing unit is used for carrying out comprehensive processing on the relationship data of the first node and the relationship data of the third node based on the third instruction.

the query unit is specifically configured to, when the first data table and the second data table in the physical cluster are accessed online and the relationship data of the first node is queried, access the first data table, the second data table, and the third data table in the physical cluster online and query the relationship data of the fifth node;

when the relationship data of the fifth node is queried, the querying unit is specifically configured to query the first node corresponding to the fifth node from the third data table, and query the relationship data of the first node.

Optionally, the online query request includes a second composite operator; the device further comprises:

the analysis unit is used for analyzing a fourth instruction and a fifth instruction from the second composite operator, wherein the fourth instruction is used for indicating a node corresponding to the fifth node to be queried online, and the fifth instruction is used for indicating relationship data of the node corresponding to the fifth node to be queried online;

when the first node corresponding to the fifth node is queried from the third data table, the querying unit is specifically configured to query the first node corresponding to the fifth node from the third data table based on the fourth instruction;

when the relationship data of the first node is queried, the querying unit is specifically configured to query the relationship data of the first node based on the fifth instruction.

According to the technical scheme, node types are not distinguished when the data table is established, so that even if the node types of a first node and a second node are different, namely the first data table and the second data table correspond to different node types, the first data table and the second data table are still established on the same physical cluster. Therefore, a plurality of physical clusters do not need to be accessed during online query, and the online query speed is improved.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art according to the drawings.

FIG. 1 is a schematic diagram of an attribute graph;

FIG. 2 is a schematic diagram of the Unicorn architecture;

FIG. 3 is a schematic flow chart diagram illustrating one embodiment of a method provided herein;

FIG. 4 is a schematic structural diagram of a cluster architecture provided in the present application;

FIG. 5 is a schematic flow chart diagram of another embodiment of a method provided herein;

FIG. 6 is a schematic diagram of an embodiment of an apparatus provided herein;

FIG. 7 is a schematic structural diagram of another embodiment of the apparatus provided in the present application.

Detailed Description

Relational network data refers to data generated on the internet in which items correspond to one another. For example, data representing a purchasing relationship between a user and a commodity, and data representing a friend relationship between two users, may both constitute relational network data.

Relational network data may be represented by attribute graphs. For example, the attribute graph shown in fig. 1 includes two types of nodes: user nodes (user i and user j) and commodity nodes (item 1 and item 2), where user j is a friend of user i, and user j has purchased item 1 and item 2. Both the nodes in the attribute graph and the relationships between nodes have attribute information. For example, the attribute information of user i includes user information such as the name, age, and sex of user i; the attribute information of product 1 includes product information such as the introduction, price, and category of product 1; the attribute information of the friend relationship between user i and user j includes friend attribute information such as a friend relationship value; and the attribute information of the purchase relationship between user j and products 1 and 2 includes purchase attribute information such as purchase time.

In one online storage architecture, the Unicorn architecture, each node type corresponds to a separate physical cluster. For example, as shown in fig. 2, a data table 1 and a data table 2 of the user node type are established in a user physical cluster, where data table 1 is used to store the friend lists of users i and j, and data table 2 is used to store the commodity purchase lists of users i and j. A data table 3 of the commodity node type is established in a commodity physical cluster, and data table 3 is used to store attribute information of commodities 1 and 2.

If the commodities purchased by the friends of user i need to be queried online, the user cluster must be accessed first: the friend of user i is found to be user j from data table 1, and the commodity list { i, j } purchased by user j is obtained from data table 2. The commodity cluster is then accessed, and the attribute information of commodities i and j is obtained from data table 3 as the final online query result. Obviously, this online query process requires access to at least two physical clusters: a user cluster and a commodity cluster. Moreover, because parameters such as configuration attributes differ between physical clusters, access is slow, which in turn slows the online query.

It can be seen that the Unicorn architecture distinguishes node types when building a data table, and thus data tables of different node types need to be built on different physical clusters. Obviously, this approach is not suitable for scenarios in which there are many node types or node types are frequently added. For example, when there are more node types, more physical clusters are needed; when node types are frequently added, an independent physical cluster is needed every time a node type is added, which obviously causes higher cost.

The embodiment of the application provides a data table establishing method, an online query method and a related device. Node types are no longer distinguished when a data table is established, and data tables of different node types are established on the same physical cluster. Because the data tables are organized more reasonably at the architecture level, a plurality of physical clusters do not need to be accessed during online query, and the online query speed is increased.

In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 3, an embodiment of a method for creating a data table is provided in the present application. The method of the embodiment comprises the following steps:

S301: Establish a first data table on a physical cluster, where the first data table is used for storing the relationship information of the first node and the second node online.

In an embodiment of the present application, the first node and the second node belong to different node types. For example, the first node may be a user node, and the second node may be a commodity node, so that the first data table is used to store relationship information between the user node and the commodity node, and indicates which commodities are purchased by each user respectively.

S302: Establish a second data table on the physical cluster, where the second data table is used for storing the attribute information of the second node online.

For example, the second node is a commodity node, and thus the second data table is used for storing attribute information of the commodity node.

As can be seen from the foregoing technical solutions, in the embodiment of the present invention, node types are no longer distinguished when a data table is established, so that even if the node types of the first node and the second node are different, that is, the first data table and the second data table correspond to different node types, the first data table and the second data table are still stored in the same physical cluster. Therefore, a plurality of physical clusters do not need to be accessed during online query, and the online query speed is improved. For example, when a user node is the first node and a commodity node is the second node, querying online a commodity purchased by a user i requires access to only one physical cluster: the commodity 1 corresponding to the user i can be queried from the first data table, and the attribute information of the commodity 1 can be queried from the second data table, so that the online query speed is increased. Likewise, the commodity 2 corresponding to the user i may be queried from the first data table, and the attribute information of the commodity 2 may be queried from the second data table.

In addition, when the node types are more, more physical clusters are not needed, and when the node types are newly added, one physical cluster does not need to be rearranged, but the data tables of the newly added node types can be established on the same physical cluster, so that the cost can be saved.

Wherein the first data table and the second data table may be represented by an index entry and an attribute entry. The following are described separately.

The first data table may include a first index entry for storing an identification of the first node online and a second index entry for storing an identification of the second node corresponding to the first node online. The first data table may further include a second attribute item for storing attribute information of a correspondence relationship of the first node and the second node online.

For example, as shown in table 1, the first data table may have a key-key-value structure, where the first index item is the master key information and stores an identifier of a user j, the second index item is the slave key information and stores an identifier of a commodity purchased by the user j, and the second attribute item is the value information and stores the time when the user j purchased the commodity.

TABLE 1

Master key	Slave key	Value
User j	Commodity 1	2016.5.3
User j	Commodity 2	2016.7.9

When an online query is performed on the data in table 1, the slave key and/or the value may be queried according to the master key, or the value may be queried according to the master key and the slave key. For example, querying by user j returns the list of purchased commodities: commodity 1 and commodity 2.
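The key-key-value structure of table 1 can be sketched as a small in-memory class. This is a minimal illustration only: the class name, method names, and identifier strings are hypothetical, and the application does not prescribe any particular storage engine.

```python
from collections import defaultdict


class KeyKeyValueTable:
    """Toy key-key-value table: master key -> slave key -> value."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, master_key, slave_key, value):
        """Store one row under (master key, slave key)."""
        self._rows[master_key][slave_key] = value

    def slave_keys(self, master_key):
        """Query the slave keys by master key alone."""
        return list(self._rows.get(master_key, {}))

    def get(self, master_key, slave_key):
        """Query the value by master key and slave key together."""
        return self._rows.get(master_key, {}).get(slave_key)


# The two rows of table 1.
purchases = KeyKeyValueTable()
purchases.put("user_j", "commodity_1", "2016.5.3")
purchases.put("user_j", "commodity_2", "2016.7.9")

purchased_by_j = purchases.slave_keys("user_j")
time_of_second = purchases.get("user_j", "commodity_2")
```

Here `slave_keys` corresponds to querying by master key, and `get` corresponds to querying by master key plus slave key.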

The second data table may include a third index item for storing the identification of the second node online and a first attribute item for storing the attribute information of the second node online.

For example, as shown in table 2, the second data table may have a key-value structure, where the third index item is key information and stores an identifier of a product, and the first attribute item is value information and stores attribute information of the product, such as product information including an introduction, a price, and a category of the product 1.

TABLE 2

Key	Value
Commodity 1	Attribute information of commodity 1
Commodity 2	Attribute information of commodity 2

When online query is performed on the data in table 2, value can be queried according to key. For example, attribute information of the product 1 is queried.
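Putting tables 1 and 2 together, a query for the commodities purchased by a user touches only tables that live on the same physical cluster. The sketch below models both tables as plain dictionaries; the attribute values (price, category) are placeholders invented for illustration, and the function name is hypothetical.

```python
# First data table (key-key-value): user -> commodity -> purchase time.
first_table = {
    "user_j": {"commodity_1": "2016.5.3", "commodity_2": "2016.7.9"},
}

# Second data table (key-value): commodity -> attribute information.
# The attribute values below are placeholders, not data from the source.
second_table = {
    "commodity_1": {"price": 10.0, "category": "category_a"},
    "commodity_2": {"price": 25.0, "category": "category_b"},
}


def query_purchased_commodities(user_id):
    """Look up a user's commodities, then each commodity's attributes."""
    result = {}
    for commodity_id in first_table.get(user_id, {}):
        result[commodity_id] = second_table.get(commodity_id)
    return result


purchased = query_purchased_commodities("user_j")
```

Both lookups are served by the same cluster, which is the property the embodiment relies on to avoid cross-cluster access.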

In the embodiment of the present application, in addition to the above data, the physical cluster may also store more data. The following are examples.

In addition to the correspondence between the first node and the second node, a correspondence between a third node and a fourth node may be stored on-line in the first data table, where the third node and the first node belong to the same node type, and the fourth node and the second node belong to the same node type. That is, the first data table may be used to store the correspondence between the first type node and the second type node online. For example, as shown in table 3, the first data table further stores the identification of the product purchased by user i and the time when user i purchases the product.

TABLE 3

Master key	Slave key	Value
User j	Commodity 1	2016.5.3
User j	Commodity 2	2016.7.9
User i	Commodity 3	2016.1.11

In addition to the attribute information of the second node, attribute information of other nodes, for example, attribute information of a fourth node, may be stored online in the second data table. That is, the second data table may be used to store the attribute information of the second type node online. For example, as shown in table 4, the second data table further stores attribute information of the product 3.

TABLE 4

Key	Value
Commodity 1	Attribute information of commodity 1
Commodity 2	Attribute information of commodity 2
Commodity 3	Attribute information of commodity 3

In addition to the first data table and the second data table, other data tables, for example a third data table, may be established on the physical cluster. The third data table is used for storing the relationship information of a fifth node and the first node online. The fifth node and the first node may belong to the same node type. For example, as shown in table 5, the fifth node and the first node are both user nodes, and the third data table is used to store the friend list of a user i, namely user j, together with the friend relationship value of user i and user j. The fifth node may also be of a different node type from the first node.

TABLE 5

Main key	Slave key	Value
User i	User j	79

It should be noted that the embodiments of the present application limit neither the structure of the data tables established on the physical cluster nor the node type corresponding to each data table.
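As a minimal illustration of the table shapes in tables 3 to 5, the following Python sketch models the first and third data tables as key-key-value mappings and the second data table as a key-value mapping. The variable names and the use of in-memory dictionaries are assumptions made for illustration only; the application does not prescribe a storage format.

```python
# Hypothetical in-memory stand-ins for tables 3, 4 and 5.

# Key-key-value structure: main key -> {slave key -> value}.
first_data_table = {
    "User j": {"Commodity 1": "2016.5.3", "Commodity 2": "2016.7.9"},
    "User i": {"Commodity 3": "2016.1.11"},
}

# Key-value structure: key -> value.
second_data_table = {
    "Commodity 1": "Attribute information of product 1",
    "Commodity 2": "Attribute information of product 2",
    "Commodity 3": "Attribute information of product 3",
}

# Key-key-value structure again, this time relating two user nodes.
third_data_table = {"User i": {"User j": 79}}

# All tables are placed on the same (simulated) physical cluster,
# regardless of the node types they refer to.
physical_cluster = {
    "first": first_data_table,
    "second": second_data_table,
    "third": third_data_table,
}
```

Keeping every table in one container mirrors the point of the embodiment: tables for different node types are not separated into different clusters.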

In this embodiment of the present application, as shown in fig. 4, the physical cluster may be provided with N partitions, where N is greater than or equal to 2. When a data table is established, the corresponding partition is determined according to the data of an index entry of the data table, and the data table is established in that partition. This is explained in detail below.

The method may further comprise: determining a first partition from the N storage partitions according to the first index entry, for example, determining partition 1 by applying a hash algorithm to the value of the main key in table 1. In this case, S301 comprises: establishing the first data table on the first partition of the physical cluster.

The method may further comprise: determining a second partition from the N storage partitions according to the third index entry, for example, determining partition 2 by applying a hash algorithm to the value of the key in table 2. In this case, S302 comprises: establishing the second data table on the second partition of the physical cluster. In the embodiments of the present application, a single data table may also be spread over multiple partitions: for example, a partition is determined according to the value of the main key of each item of data in the data table, and that item of data is stored in the partition so determined.
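The partition selection described above can be sketched as follows. This is only an illustration under stated assumptions: the text says "a hash algorithm" without naming one, so SHA-256 is chosen here arbitrarily, and `N_PARTITIONS`, `partition_for` and `put` are hypothetical names.

```python
import hashlib

N_PARTITIONS = 4  # assumed value; the text only requires N >= 2


def partition_for(key: str, n: int = N_PARTITIONS) -> int:
    """Map the value of an index entry to a partition number in [0, n)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def put(partitions, table, key, row):
    """Store one item of `table` in the partition chosen from its key."""
    p = partition_for(key, len(partitions))
    partitions[p].setdefault(table, {})[key] = row
    return p


partitions = [{} for _ in range(N_PARTITIONS)]
p_user = put(partitions, "first_data_table", "User j",
             {"Commodity 1": "2016.5.3"})
p_item = put(partitions, "second_data_table", "Commodity 1",
             "Attribute information of product 1")
```

Because the partition is a pure function of the key, a later query can recompute it from the same index entry without consulting any directory.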

In the distributed physical cluster shown in fig. 4, each partition includes M backup areas (replicas), where M is greater than or equal to 2. The data stored in the backup areas of the same partition is kept consistent, which allows multiple devices to store and query data online simultaneously and also provides a backup of the data. Specifically, the first partition and the second partition each include M backup areas. Establishing the first data table on the first partition of the physical cluster comprises: establishing the first data table on each of the M backup areas of the first partition. Establishing the second data table on the second partition of the physical cluster comprises: establishing the second data table on each of the M backup areas of the second partition.
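A minimal sketch of establishing a table on every backup area of a partition is given below. The `Partition` class and its methods are hypothetical, and the synchronous write to all backup areas is an assumed way of keeping them consistent; the application does not specify the replication mechanism.

```python
import copy

M_BACKUP_AREAS = 2  # assumed value; the text only requires M >= 2


class Partition:
    """A partition holding M backup areas with identical contents."""

    def __init__(self, m: int = M_BACKUP_AREAS):
        self.backup_areas = [{} for _ in range(m)]

    def create_table(self, name: str) -> None:
        # Establish the table on each of the M backup areas.
        for area in self.backup_areas:
            area[name] = {}

    def put(self, table: str, key: str, value) -> None:
        # Write an independent copy to every backup area so they stay consistent.
        for area in self.backup_areas:
            area[table][key] = copy.deepcopy(value)


first_partition = Partition()
first_partition.create_table("first_data_table")
first_partition.put("first_data_table", "User j", {"Commodity 1": "2016.5.3"})
```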

Corresponding to the embodiment of the method for establishing the data table, the application also provides an embodiment of a method for carrying out online query on the data table. This will be explained in detail below.

Referring to fig. 5, the present application provides a method embodiment of the online query method.

In this embodiment, a first data table and a second data table are established on one physical cluster. The first data table is used for storing relationship information between a first node and a second node online; the second data table is used for storing attribute information of the second node online; the first node and the second node are of different node types. It should be noted that this embodiment may be used to perform online queries on the first data table and the second data table established by any of the above method embodiments.

The method of the embodiment comprises the following steps:

S501: Receive an online query request, wherein the online query request indicates that relationship data of the first node is to be queried online.

For example, when a commodity purchased by a user needs to be queried online, the first node may be a user node, and the relationship data of the first node may be attribute information of the commodity node corresponding to the user node.

S502: Access the first data table and the second data table in the physical cluster online, and query the relationship data of the first node.

For example, suppose the first node is a user node and the second node is a commodity node. When the commodities purchased by user j are queried online, commodity 1 corresponding to user j is queried from the first data table, and the attribute information of commodity 1 is queried from the second data table.

The first data table may include a first index entry for storing the identification of the first node online and a second index entry for storing the identification of the second node corresponding to the first node online. The first data table may further include a second attribute item for storing attribute information of the correspondence between the first node and the second node online. For example, as shown in table 1, the first data table may have a key-key-value structure. When the data in table 1 is queried online, the slave key and/or the value may be queried according to the main key, or the value may be queried according to the main key together with the slave key. For example, a query on table 1 finds that the list of items purchased by user j is item 1 and item 2.

The second data table may include a third index item for storing the identification of the second node online and a first attribute item for storing the attribute information of the second node online. The second data table may be represented as a key-value structure, for example, as shown in table 2. When online query is performed on the data in table 2, value can be queried according to key. For example, the attribute information of the product 1 is searched in table 2.
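The query modes described for the two table structures can be sketched as follows, using the example in which user j purchased commodity 1 and commodity 2. The function names and the dictionary representation are assumptions for illustration, not part of the application.

```python
# Hypothetical stand-ins for the first (key-key-value) and second (key-value) tables.
first_data_table = {
    "User j": {"Commodity 1": "2016.5.3", "Commodity 2": "2016.7.9"},
}
second_data_table = {
    "Commodity 1": "Attribute information of product 1",
    "Commodity 2": "Attribute information of product 2",
}


def slave_keys(main_key):
    """Query the slave keys stored under a main key."""
    return list(first_data_table.get(main_key, {}))


def value_of(main_key, slave_key):
    """Query the value stored under a (main key, slave key) pair."""
    return first_data_table.get(main_key, {}).get(slave_key)


def attributes(key):
    """Query the value stored under a key in the key-value table."""
    return second_data_table.get(key)


def relationship_data(main_key):
    """S502 in miniature: find the second nodes, then fetch their attributes."""
    return {item: attributes(item) for item in slave_keys(main_key)}


purchases = relationship_data("User j")
```

Both lookups run against tables on the same simulated cluster, so the second step needs no access to a separate store.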

In this embodiment of the present application, as shown in fig. 4, the physical cluster may be provided with N partitions, where N is greater than or equal to 2. Therefore, when a data table is queried, the query is carried out in the partition determined from the index entries of the data table. This is explained in detail below.

The method may further comprise: determining a first partition from the N storage partitions according to the first index entry, for example, determining partition 1 by applying a hash algorithm to the values of the main key in table 1; and determining a second partition from the N storage partitions according to the third index entry, for example, determining partition 2 by applying a hash algorithm to the values of the key in table 2. In this case, accessing the first data table and the second data table in the physical cluster online in S502 comprises: accessing online the first data table on the first partition and the second data table on the second partition.

In the distributed physical cluster shown in fig. 4, each partition includes M backup areas, where M is greater than or equal to 2. The data stored in each backup area in the same partition are kept consistent, so that the functions of simultaneous online storage and online query of a plurality of devices can be supported, and the data can be backed up. Thus, accessing the first data table online may be accessing the first data table on one or more backup areas in the first partition, and similarly, accessing the second data table online may be accessing the second data table on one or more backup areas in the second partition.
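A read path over partitions and backup areas might look like the sketch below. It assumes, purely for illustration, SHA-256 as the hash algorithm, a random choice among backup areas, and the hypothetical names `cluster`, `write` and `read`; the application only states that one or more backup areas of the partition may be accessed.

```python
import hashlib
import random

N, M = 2, 2  # assumed sizes; the text requires N >= 2 and M >= 2

# cluster[p][r] is backup area r of partition p.
cluster = [[{} for _ in range(M)] for _ in range(N)]


def partition_for(key: str, n: int = N) -> int:
    """Choose a partition from the value of an index entry."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def write(table: str, key: str, value) -> None:
    """Write to every backup area of the partition selected by the key."""
    for area in cluster[partition_for(key)]:
        area.setdefault(table, {})[key] = value


def read(table: str, key: str):
    """Read from any one backup area of the partition selected by the key."""
    area = random.choice(cluster[partition_for(key)])
    return area.get(table, {}).get(key)


write("second_data_table", "Commodity 1", "Attribute information of product 1")
attrs = read("second_data_table", "Commodity 1")
```

Since all backup areas of a partition hold the same data, the result does not depend on which one is picked.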

In this embodiment of the application, in addition to the above data, the physical cluster may also store more data, and may perform online query on the data. The following are examples.

In addition to the correspondence between the first node and the second node, a correspondence between a third node and a fourth node may be stored on-line in the first data table, where the third node and the first node belong to the same node type, and the fourth node and the second node belong to the same node type. For example, as shown in table 3, the first data table further stores the identification of the product purchased by user i and the time when user i purchases the product. In addition to the attribute information of the second node, attribute information of other nodes, for example, attribute information of a fourth node, may be stored online in the second data table. For example, as shown in table 4, the second data table further stores attribute information of the product 3.

The online query request may further indicate that relationship data of the third node is to be queried online. In this case, the method may further comprise: accessing the first data table and the second data table in the physical cluster online, and querying the relationship data of the third node. Querying the relationship data of the third node comprises: querying the fourth node corresponding to the third node from the first data table, and querying the attribute information of the fourth node from the second data table. For example, item 3 purchased by user i is found in table 3, and the attribute information of item 3 is found in table 4.

In addition to the first data table and the second data table, other data tables, for example a third data table, may be established on the physical cluster. The third data table is used for storing relationship information between a fifth node and the first node online. The third data table may be as shown in table 5.

The online query request may indicate that relationship data of the fifth node is to be queried online. In this case, accessing the first data table and the second data table in the physical cluster online and querying the relationship data of the first node comprises: accessing the first data table, the second data table and the third data table in the physical cluster online to query the relationship data of the fifth node. Querying the relationship data of the fifth node comprises: querying the first node corresponding to the fifth node from the third data table, and querying the relationship data of the first node. For example, when the products purchased by the friends of user i need to be queried, table 5 shows that the friend of user i is user j, table 1 shows that user j purchased product 1, and table 2 provides the attribute information of product 1.

In the embodiment of the application, three different query operators are defined in order to realize the online query function. The three query operators are used independently or in combination, so that the quick query function can be realized. The following are described separately.

The first operator is the simple query operator (also called the AtomicSearch operator), which queries the corresponding index entry or attribute value from a data table according to a given index entry. For example, given the key user j, item 1 and item 2 corresponding to user j can be looked up in table 1; as another example, given the key commodity 1, the attribute information of commodity 1 can be looked up in table 2.
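The simple query operator can be illustrated with the following sketch. The function name `atomic_search` echoes the operator's name, but its signature and the table registry are assumptions; the application does not define a programming interface.

```python
# Hypothetical registry of tables held on the physical cluster.
tables = {
    "table 1": {"User j": {"Commodity 1": "2016.5.3", "Commodity 2": "2016.7.9"}},
    "table 2": {"Commodity 1": "Attribute information of product 1"},
}


def atomic_search(table_name, key):
    """Return whatever the named table stores under the given index entry.

    For a key-key-value table this is a mapping of slave keys to values;
    for a key-value table it is the value itself. None means "not found".
    """
    return tables[table_name].get(key)


items_of_user_j = atomic_search("table 1", "User j")
attrs_of_item_1 = atomic_search("table 2", "Commodity 1")
```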

The second operator is the composite operator, which queries the corresponding index entries or attribute values from a data table according to index entries and then performs an operation on the queried data.

The first data table is further used for storing relationship information between the third node and the fourth node online, for example, table 3 further stores an identification of a commodity purchased by user i: a commodity 3; the second data table is also used for storing the attribute information of the fourth node online, for example, table 4 also stores the attribute information of the commodity 3.

The online query request comprises a first composite operator; the method further comprises the following steps:

Querying the relationship data of the first node comprises: querying the relationship data of the first node based on the first instruction, for example, querying the commodities purchased by user j based on the first instruction: commodity 1 and commodity 2.

Querying the relationship data of the third node comprises: querying the relationship data of the third node based on the second instruction, for example, querying the commodity purchased by user i based on the second instruction: commodity 3.

The method further comprises: comprehensively processing the relationship data of the first node and the relationship data of the third node based on the third instruction. The comprehensive processing may include any one of the following operations: merging (union), intersection, and difference. For example, the commodities purchased by user j and the commodities purchased by user i may be merged, giving commodity 1, commodity 2 and commodity 3. As another example, they may be intersected, giving an empty result. As a further example, a difference operation may be applied, that is, determining the commodities purchased by user j but not by user i, giving commodity 1 and commodity 2.
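The composite operator can be sketched with Python sets, using the purchases recorded in table 3 (user j: commodity 1 and commodity 2; user i: commodity 3). The function `composite`, the operation names and the mapping `COMBINE` are hypothetical; the sketch only shows two lookups followed by one set operation.

```python
# Hypothetical stand-in for the first data table (table 3).
first_data_table = {
    "User j": {"Commodity 1": "2016.5.3", "Commodity 2": "2016.7.9"},
    "User i": {"Commodity 3": "2016.1.11"},
}

# The three kinds of comprehensive processing named in the text.
COMBINE = {
    "merge": set.union,
    "intersection": set.intersection,
    "difference": set.difference,
}


def purchased(user):
    """Look up the commodities related to one user node."""
    return set(first_data_table.get(user, {}))


def composite(first_user, third_user, operation):
    a = purchased(first_user)        # corresponds to the first instruction
    b = purchased(third_user)        # corresponds to the second instruction
    return COMBINE[operation](a, b)  # corresponds to the third instruction


merged = composite("User j", "User i", "merge")
common = composite("User j", "User i", "intersection")
only_j = composite("User j", "User i", "difference")
```

The difference is not symmetric: swapping the two users yields the commodities bought by user i but not by user j.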

The third operator is the composite transfer operator, which uses the result queried from one data table as the index entry for a query on another data table.

A third data table is also established on the physical cluster and is used for storing relationship information between a fifth node and the first node online. For example, as shown in table 5, the fifth node and the first node are both user nodes, and the third data table stores the friend list of user i, namely user j, together with the friend relationship value between user i and user j.

The online query request comprises a second composite operator. The method further comprises:

parsing a fourth instruction and a fifth instruction from the second composite operator, wherein the fourth instruction indicates that the node corresponding to the fifth node is to be queried online, and the fifth instruction indicates that the relationship data of the node corresponding to the fifth node is to be queried online.

Inquiring the first node corresponding to the fifth node from the third data table, including: based on the fourth instruction, the first node corresponding to the fifth node is inquired from the third data table; querying the relationship data of the first node, comprising: and querying the relation data of the first node based on the fifth instruction. For example, based on the fourth instruction, it is found that the friend of the user i is the user j from table 5, the user j is used as the index item queried in table 1, the item 1 purchased by the user j is queried from table 1, and the attribute information of the item 1 is queried from table 2.
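The chaining performed by the composite transfer operator can be sketched as follows: the result of a lookup in table 5 is reused as the index entry for table 1, whose result is in turn reused for table 2. The function name `composite_transfer` and the dictionary tables are assumptions for illustration.

```python
# Hypothetical stand-ins for tables 5, 1 and 2.
third_data_table = {"User i": {"User j": 79}}
first_data_table = {
    "User j": {"Commodity 1": "2016.5.3", "Commodity 2": "2016.7.9"},
}
second_data_table = {
    "Commodity 1": "Attribute information of product 1",
    "Commodity 2": "Attribute information of product 2",
}


def composite_transfer(user):
    """Return attribute information of commodities bought by the user's friends."""
    result = {}
    # Fourth instruction: find the nodes corresponding to the fifth node.
    for friend in third_data_table.get(user, {}):
        # Fifth instruction: query the relationship data of each such node,
        # using the previous result as the index entry of the next lookup.
        for commodity in first_data_table.get(friend, {}):
            result[commodity] = second_data_table.get(commodity)
    return result


friends_purchases = composite_transfer("User i")
```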

In the embodiments of the present application, the storage architecture may consist of two parts: an offline cluster and an online cluster. The offline cluster can be divided into an HDFS (Hadoop Distributed File System) Index Build Cluster and a Real-Time Stream Process Cluster. The HDFS Index Build Cluster is mainly used for converting attribute graphs, in efficient batches, into data tables of key-value and key-key-value structure and synchronizing those data tables to the online cluster. The Real-Time Stream Process Cluster is mainly used for processing real-time update messages and sending them to the online cluster, with a processing delay on the order of seconds.
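The division of work between batch conversion and real-time updates can be sketched as below. The input shapes (a list of node/attribute pairs and a list of edge triples) and the message format are assumptions; the application does not describe how the offline cluster represents the graph or the update messages.

```python
def build_tables(nodes, edges):
    """Batch-convert a small attribute graph into a key-value table
    (node attributes) and a key-key-value table (edges)."""
    key_value = dict(nodes)
    key_key_value = {}
    for source, target, value in edges:
        key_key_value.setdefault(source, {})[target] = value
    return key_value, key_key_value


def apply_update(key_key_value, message):
    """Apply one real-time update message of the assumed form
    (source, target, value) to a key-key-value table."""
    source, target, value = message
    key_key_value.setdefault(source, {})[target] = value


nodes = [("Commodity 1", "Attribute information of product 1"),
         ("Commodity 2", "Attribute information of product 2")]
edges = [("User j", "Commodity 1", "2016.5.3"),
         ("User j", "Commodity 2", "2016.7.9")]

# Offline batch build, followed by one incremental update.
second_data_table, first_data_table = build_tables(nodes, edges)
apply_update(first_data_table, ("User i", "Commodity 3", "2016.1.11"))
```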

The online cluster may be the physical cluster in any of the embodiments described above and includes a Proxy sub-cluster and a Search sub-cluster.

The Proxy sub-cluster is mainly responsible for receiving an online query request input by a user, executing the query, and returning the final query result to the user. While executing the query, the Proxy sub-cluster sends requests to the Search sub-cluster to obtain data from the data tables of key-value and key-key-value structure. Both the composite operator and the composite transfer operator described above are executed in the Proxy sub-cluster. The Search sub-cluster is mainly responsible for loading the data tables of key-value and key-key-value structure and for updating their content according to the update messages sent by the Real-Time Stream Process Cluster. In addition, the Search sub-cluster receives the online query requests sent by the Proxy sub-cluster, performs the data query operations accordingly, and returns the query results to the Proxy sub-cluster.

The Proxy sub-cluster includes at least three layers:

A service access layer, which is mainly used for receiving the online query request, converting it into a format that the execution core layer can recognize, and sending the converted request to the execution core layer. It also converts the result returned by the execution core layer into the return format expected by the user.

An execution core layer, which implements the online query. Specifically, it parses and validates the online query request, generates a query plan and sends it to the data acquisition layer, and finally returns the query result to the service access layer.

A data acquisition layer, which is mainly responsible for communication with the Search sub-cluster. This layer forwards the online query request sent by the execution core layer to the Search sub-cluster and returns the result returned by the Search sub-cluster to the execution core layer.

The Search sub-cluster includes at least two layers:

A storage management layer, which is mainly responsible for loading and updating the data tables of key-value and key-key-value structure.

A query layer, which is mainly responsible for receiving the online query requests issued by the Proxy sub-cluster, querying according to those requests, and returning the query results to the Proxy sub-cluster. This layer can also perform operations such as filtering and sorting on the query results.
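The layering of the two sub-clusters can be sketched with two small classes. Everything here is hypothetical: the class and method names, the single-process setting, and in particular the request format `"purchases:<user>"`, which is invented for the example. The sketch only shows which layer hands work to which.

```python
class SearchSubCluster:
    """Holds the data tables and answers lookups from the Proxy sub-cluster."""

    def __init__(self):
        self.tables = {}

    def load(self, name, table):
        # Storage management layer: load a key-value or key-key-value table.
        self.tables[name] = table

    def query(self, name, key):
        # Query layer: look up one index entry in one table.
        return self.tables.get(name, {}).get(key)


class ProxySubCluster:
    def __init__(self, search):
        self.search = search

    def handle(self, request):
        # Service access layer: parse the assumed format "purchases:<user>".
        kind, _, user = request.partition(":")
        if kind != "purchases" or not user:
            raise ValueError("unsupported request: " + request)
        return self._execute(user)

    def _execute(self, user):
        # Execution core layer: a two-step plan (relationships, then attributes).
        items = self._fetch("first", user) or {}
        return {item: self._fetch("second", item) for item in items}

    def _fetch(self, table, key):
        # Data acquisition layer: forward the lookup to the Search sub-cluster.
        return self.search.query(table, key)


search = SearchSubCluster()
search.load("first", {"User j": {"Commodity 1": "2016.5.3"}})
search.load("second", {"Commodity 1": "Attribute information of product 1"})
proxy = ProxySubCluster(search)
answer = proxy.handle("purchases:User j")
```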

Corresponding to the above method embodiments, the present application further provides corresponding apparatus embodiments, which are specifically described below.

Referring to fig. 6, the present application provides an embodiment of a device for creating a data table. The apparatus of this embodiment includes: a first establishing unit 601 and a second establishing unit 602.

A first establishing unit 601, configured to establish a first data table on a physical cluster, where the first data table is used to store relationship information between a first node and a second node online.

A second establishing unit 602, configured to establish a second data table on the physical cluster, where the second data table is used to store attribute information of a second node online.

Wherein the first node and the second node are of different node types.

Referring to fig. 7, the present application provides an embodiment of an apparatus for online query. In this embodiment, a first data table and a second data table are established on one physical cluster. The first data table is used for storing relationship information between a first node and a second node online; the second data table is used for storing attribute information of the second node online; the first node and the second node are of different node types.

The apparatus of this embodiment includes: a receiving unit 701 and a query unit 702.

A receiving unit 701, configured to receive an online query request, where the online query request indicates that relationship data of the first node is to be queried online.

A query unit 702, configured to access the first data table and the second data table in the physical cluster online and query the relationship data of the first node.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or of another form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims

1. A method for creating a data table, comprising:

wherein the first node and the second node are of different node types,

the first data table comprises a first index entry and a second index entry, the first index entry is used for storing the identification of the first node online, and the second index entry is used for storing the identification of the second node corresponding to the first node online;

the second data table includes a third index item for storing an identification of the second node online and a first attribute item for storing attribute information of the second node online, and

the physical cluster comprises N storage partitions, wherein N is greater than or equal to 2; the establishing method further comprises the following steps:

2. The establishing method according to claim 1, wherein the first data table further comprises a second attribute item, and the second attribute item is used for storing attribute information of the corresponding relationship between the first node and the second node online.

3. The creating method according to claim 2, wherein the first data table has a key-key-value structure, wherein the first index item is primary key information, the second index item is secondary key information, and the second attribute item is value information;

the second data table is of a key-value structure, wherein the third index item is key information, and the first attribute item is value information.

4. The method according to claim 1, wherein the first partition and the second partition respectively comprise M backup areas, M being greater than or equal to 2;

5. The establishing method according to claim 1, wherein the first data table is further used for storing relationship information between a third node and a fourth node online; the second data table is also used for storing attribute information of a fourth node online; wherein the third node and the first node belong to the same node type, and the fourth node and the second node belong to the same node type;

6. An online query method is characterized in that a first data table and a second data table are established on a physical cluster, and the first data table is used for storing the relationship information of a first node and a second node online; the second data table is used for storing the attribute information of the second node on line; wherein the first node and the second node are of different node types; the method comprises the following steps:

wherein the querying of the relationship data to the first node comprises: inquiring a second node corresponding to the first node from the first data table, inquiring attribute information of the second node from the second data table,

the physical cluster comprises N storage partitions, wherein N is greater than or equal to 2; the method further comprises the following steps:

7. The online query method of claim 6, wherein the first data table is further used for storing relationship information between a third node and a fourth node online; the second data table is also used for storing attribute information of a fourth node online; wherein the third node and the first node belong to the same node type, and the fourth node and the second node belong to the same node type; the online query request is also used for indicating the relationship data of a third node to be queried online;

the method further comprises the following steps:

8. The online query method of claim 7, wherein the online query request includes a first composite operator; the method further comprises the following steps:

9. The online query method according to claim 8, wherein the comprehensive processing comprises any one of the following processes: merging, intersection, and difference operations.

10. The online query method of claim 6, wherein a third data table is further established on the physical cluster, and the third data table is used for storing relationship information between a fifth node and the first node online; the online query request is used for indicating the relationship data of the fifth node to be queried online:

11. The online query method of claim 10, wherein the online query request includes a second composite operator; the method further comprises the following steps:

analyzing a fourth instruction and a fifth instruction from the second composite operator, wherein the fourth instruction is used for indicating the node corresponding to the fifth node to be queried online, and the fifth instruction is used for indicating the relationship data of the node corresponding to the fifth node to be queried online;

12. An apparatus for creating a data table, comprising:

wherein the first node and the second node are of different node types,

the physical cluster comprises N storage partitions, wherein N is greater than or equal to 2; the establishing device further comprises:

when the first data table is established on one physical cluster, the first establishing unit is used for establishing the first data table on a first partition of the physical cluster;

when the second data table is established on the physical cluster, the second establishing unit is configured to establish the second data table on a second partition of the physical cluster.

13. The apparatus according to claim 12, wherein the first data table further includes a second attribute item, and the second attribute item is configured to store attribute information of a correspondence relationship between the first node and the second node online.

14. The creating apparatus according to claim 13, wherein the first data table has a key-key-value structure, wherein the first index entry is primary key information, the second index entry is secondary key information, and the second attribute entry is value information;

15. The apparatus according to claim 12, wherein the first partition and the second partition respectively comprise M backup areas, M being greater than or equal to 2;

when a first data table is established on a first partition of the physical cluster, the first establishing unit is configured to respectively establish the first data table on M backup areas of the first partition of the physical cluster;

when a second data table is established on the second partition of the physical cluster, the second establishing unit is configured to respectively establish the second data table on M backup areas of the second partition of the physical cluster.

16. The establishing apparatus according to claim 12, wherein the first data table is further used for storing relationship information between a third node and a fourth node online; the second data table is also used for storing attribute information of a fourth node online; wherein the third node and the first node belong to the same node type, and the fourth node and the second node belong to the same node type;

17. An online inquiry device is characterized in that a first data table and a second data table are established on a physical cluster, and the first data table is used for storing the relationship information of a first node and a second node online; the second data table is used for storing the attribute information of the second node on line; wherein the first node and the second node are of different node types; the device comprises:

the physical cluster comprises N storage partitions, wherein N is greater than or equal to 2; the device further comprises:

the query unit, when accessing the first data table and the second data table in the physical cluster online, is configured to: accessing online the first data table on the first partition and the second data table on the second partition in the physical cluster.

18. The apparatus of claim 17, wherein the first data table is further configured to store relationship information between a third node and a fourth node online; the second data table is also used for storing attribute information of a fourth node online; wherein the third node and the first node belong to the same node type, and the fourth node and the second node belong to the same node type; the online query request is also used for indicating the relationship data of a third node to be queried online;

19. The apparatus of claim 18, wherein the online query request includes a first composite operator; the device further comprises:

when the relationship data of the first node is queried, the query unit is used for querying the relationship data of the first node based on the first instruction;

when the relationship data of the third node is queried, the query unit is configured to query the relationship data of the third node based on the second instruction;

20. The apparatus of claim 19, wherein the integrated process comprises any one of: merging, intersection, and difference operations.

21. The apparatus according to claim 17, wherein a third data table is further established on the physical cluster, and the third data table is used for storing relationship information between a fifth node and the first node online; the online query request is used for indicating the relationship data of the fifth node to be queried online:

the query unit is configured to access the first data table and the second data table in the physical cluster on line, and when the relationship data of the first node is queried, the query unit is configured to access the first data table in the physical cluster on line;

when the relationship data of the fifth node is queried, the query unit is configured to query the first node corresponding to the fifth node from the third data table, and query the relationship data of the first node.

22. The apparatus of claim 21, wherein a second composite operator is included in the online query request; the device further comprises:

the analysis unit is used for analyzing a fourth instruction and a fifth instruction from the second composite operator, wherein the fourth instruction is used for indicating the node corresponding to the fifth node to be queried online, and the fifth instruction is used for indicating the relationship data of the node corresponding to the fifth node to be queried online;

when the first node corresponding to the fifth node is queried from the third data table, the querying unit is configured to query the first node corresponding to the fifth node from the third data table based on the fourth instruction;

when the relationship data of the first node is queried, the query unit is configured to query the relationship data of the first node based on the fifth instruction.