CN114143284A

CN114143284A - Data identifier generation method and device, electronic equipment and storage medium

Info

Publication number: CN114143284A
Application number: CN202111416615.9A
Authority: CN
Inventors: 何春林; 毛军; 丰灵均; 陶立宏
Original assignee: Qax Technology Group Inc; Secworld Information Technology Beijing Co Ltd
Current assignee: Qax Technology Group Inc; Secworld Information Technology Beijing Co Ltd
Priority date: 2021-11-25
Filing date: 2021-11-25
Publication date: 2022-03-04
Anticipated expiration: 2041-11-25
Also published as: CN114143284B

Abstract

The application provides a data identifier generation method and apparatus, an electronic device, and a storage medium. The data identifier generation method is applied to each node in a cascading scenario and includes: determining the node identifier of the node where target data is located, wherein the node identifier of each node is generated based on a random number generation algorithm; performing numerical accumulation on the maximum data sequence number in the node where the target data is located to generate a local identifier of the target data; and splicing the node identifier of the node where the target data is located with the local identifier of the target data to obtain the data identifier of the target data. When the amount of data is small, the data can be uniquely identified with fewer bits instead of a 128-bit UUID. Every piece of data is thus uniquely identified without wasting storage space on oversized identifiers.

Description

Data identifier generation method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a data identifier generation method and apparatus, an electronic device, and a storage medium.

Background

Universally Unique Identifiers (UUIDs) are used so that each piece of data in a cascading scenario has unique identification information.

At present, UUIDs are mainly generated by five algorithms, one of the most common being the Snowflake algorithm. These five generation algorithms, including the Snowflake algorithm, all produce UUIDs that occupy 128 bits in a computer system.

However, when the amount of data is small, many bits of a 128-bit UUID go unused, which wastes storage space.

Disclosure of Invention

An object of the embodiments of the present application is to provide a data identifier generation method and apparatus, an electronic device, and a storage medium, so as to reduce the storage space occupied by UUIDs.

In order to solve the above technical problem, an embodiment of the present application provides the following technical solutions:

A first aspect of the present application provides a data identifier generation method, which is applied to each node in a cascading scenario and includes: determining the node identifier of the node where target data is located, wherein the node identifier of each node is generated based on a random number generation algorithm; performing numerical accumulation on the maximum data sequence number in the node where the target data is located to generate a local identifier of the target data; and splicing the node identifier of the node where the target data is located with the local identifier of the target data to obtain the data identifier of the target data.

A second aspect of the present application provides a data identifier generation apparatus, which is applied to each node in a cascading scenario and includes: a node identifier module, configured to determine the node identifier of the node where target data is located, wherein the node identifier of each node is generated based on a random number generation algorithm; a local identifier module, configured to perform numerical accumulation on the maximum data sequence number in the node where the target data is located to generate a local identifier of the target data; and a splicing module, configured to splice the node identifier of the node where the target data is located with the local identifier of the target data to obtain the data identifier of the target data.

A third aspect of the present application provides an electronic device, including a processor, a memory, and a bus, wherein the processor and the memory communicate with each other through the bus, and the processor is configured to invoke program instructions in the memory to perform the method of the first aspect.

A fourth aspect of the present application provides a computer-readable storage medium, including a stored program, wherein, when the program runs, it controls the device where the storage medium is located to perform the method of the first aspect.

Compared with the prior art, with the data identifier generation method provided by the first aspect of the present application, when a data identifier needs to be generated for data in each node in a cascading scenario, the node identifier of the node where the target data is located is first determined, the node identifier of each node being generated based on a random number generation algorithm; then, numerical accumulation is performed on the maximum data sequence number in that node to generate a local identifier of the target data; finally, the node identifier is spliced with the local identifier to obtain the data identifier of the target data. Thus, when the amount of data is small, the data can be uniquely identified with fewer bits instead of a 128-bit UUID, so that every piece of data is uniquely identified without wasting storage space on oversized identifiers.

The data identifier generating device provided by the second aspect, the electronic device provided by the third aspect, and the computer-readable storage medium provided by the fourth aspect of the present application have the same or similar beneficial effects as the data identifier generating method provided by the first aspect.

Drawings

The above and other objects, features and advantages of exemplary embodiments of the present application will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present application are illustrated by way of example and not by way of limitation in the figures of the accompanying drawings and in which like reference numerals refer to similar or corresponding parts and in which:

Fig. 1 is a schematic flowchart of a data identifier generation method in an embodiment of the present application;

Fig. 2 is a schematic flowchart of generating a node identifier in an embodiment of the present application;

Fig. 3 is a schematic flowchart of generating a local identifier in an embodiment of the present application;

Fig. 4 is a schematic flowchart of generating a symbol identifier in an embodiment of the present application;

Fig. 5 is a schematic flowchart of determining the total number of bits of the data identifier in an embodiment of the present application;

Fig. 6 is a schematic structural diagram of a data identifier in an embodiment of the present application;

Fig. 7 is a first schematic structural diagram of a data identifier generation apparatus in an embodiment of the present application;

Fig. 8 is a second schematic structural diagram of a data identifier generation apparatus in an embodiment of the present application;

Fig. 9 is a schematic structural diagram of an electronic device in an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which this application belongs.

At present, when a UUID needs to be generated for each piece of data in each node in a cascading scenario, five common UUID generation algorithms, including the Snowflake algorithm, are mainly used. However, all of these algorithms generate 128-bit UUIDs. When the amount of data is small, many of the 128 bits are clearly left unused, so the UUIDs occupy more storage space than necessary and storage space is wasted.

Through intensive research, the inventors found that the five commonly used UUID generation algorithms can solve the problem of unique data identifiers to some extent but, owing to the limitations of the algorithms themselves, can only generate 128-bit UUIDs. On the other hand, using an auto-increment sequence reduces the number of bits occupied by each UUID, but when lower-level data is uploaded to the upper level, the UUIDs of the lower-level data may collide with the UUIDs of the upper-level data.

In view of this, for cascading scenarios with a small amount of data, the inventors abandon the commonly used UUID generation algorithms and instead use a random number generation algorithm to generate the identifier of the node where the data is located, and an auto-increment sequence to generate the identifier of the data within that node. The identifier of the node is then combined with the in-node identifier of the data to obtain the final identifier of the data. The number of bits of the node identifier and of the in-node identifier can be chosen according to the actual amount of data. Thus, when the amount of data is small, a 128-bit UUID is unnecessary; for example, data can be uniquely identified with a 64-bit or 32-bit UUID. The method and apparatus therefore avoid excessive storage occupied by UUIDs, save storage space, prevent identifier conflicts between lower-level and upper-level data after lower-level data is uploaded, and ensure the uniqueness of every data identifier in the cascading scenario.
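To make the bit-width trade-off concrete, the following is a minimal sketch that packs a node identifier into the high bits and a local identifier into the low bits of one fixed-width integer. The function names `pack_id` and `unpack_id` and the 12/20 split of a 32-bit identifier are illustrative assumptions, not values taken from the application.

```python
def pack_id(node_id: int, local_id: int, node_bits: int, local_bits: int) -> int:
    """Concatenate node_id (high bits) and local_id (low bits) into one integer."""
    if not 0 <= node_id < (1 << node_bits):
        raise ValueError("node identifier does not fit in node_bits")
    if not 0 <= local_id < (1 << local_bits):
        raise ValueError("local identifier does not fit in local_bits")
    return (node_id << local_bits) | local_id


def unpack_id(data_id: int, local_bits: int) -> tuple[int, int]:
    """Split a packed identifier back into (node_id, local_id)."""
    return data_id >> local_bits, data_id & ((1 << local_bits) - 1)


# Hypothetical budget: 32-bit identifiers, 12 bits for nodes, 20 bits for in-node data.
NODE_BITS, LOCAL_BITS = 12, 20
print(f"up to {1 << NODE_BITS} node IDs, {1 << LOCAL_BITS} items per node")
print(pack_id(5, 7, NODE_BITS, LOCAL_BITS))  # 5242887
```

Widening one field narrows the other, so the split would be chosen from the expected number of nodes and the expected amount of data per node.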

In practical applications, the data identifier generation method of the embodiments of the present application can be applied to each node in a cascading scenario. That is, by using this method, a UUID can be generated for each piece of data in each node in the cascading scenario, and no two pieces of data share a UUID. For example, in a distributed storage system, for each piece of data stored in each node, a data identifier can be generated by splicing the identifier of the node where the data is located with the in-node identifier of the data. Two pieces of data in the same node are distinguished by their different in-node identifiers. Two pieces of data in different nodes are distinguished by their different node identifiers, even if their in-node identifiers are the same. Moreover, the number of bits of the node identifier and of the in-node identifier can be set according to the amount of data, so a 128-bit UUID is not needed when the amount of data is small. Every piece of data is uniquely identified without wasting storage space on oversized identifiers.

Next, the data identifier generation method provided in the embodiments of the present application is described in detail.

Fig. 1 is a schematic flow chart of a data identifier generation method in an embodiment of the present application, and referring to fig. 1, the method may include:

S101: Determine the node identifier of the node where the target data is located.

The node identification of each node is generated based on a random number generation algorithm.

That is, when data identifiers need to be generated, a cluster is first received. The cluster contains a plurality of nodes, which may be peers or may have a superior-subordinate relationship; the specific relationship between the nodes is not limited here. Each node may contain one or more pieces of data, or no data at all; the number of pieces of data in each node is not limited here. A random number generation algorithm may then be used to generate a node identifier for each node in the cluster. The specific type of random number generation algorithm is not limited herein.

It should be noted that node identifiers may be generated only for nodes that contain data. That is, before an identifier is generated for a node in the cluster, it is first determined whether the node contains data; if so, a node identifier is generated for it, and if not, no node identifier is generated. The reason is that a node without data has no need for data identifiers, so generating a node identifier for it would be meaningless: it would lower the efficiency of data identifier generation and also use up a node identifier. Generating node identifiers only for nodes that contain data therefore improves generation efficiency and leaves more node identifiers available for generating data identifiers for data in other nodes.

After node identifiers have been generated for the nodes in the cluster based on the random number generation algorithm, for a given piece of target data in the cluster, once the node where the target data is located is determined, the node identifier of that node serves as the node identifier of the target data.

In a specific implementation, after node identifiers are generated for the nodes in the cluster, each node and its node identifier may be stored in a table. When a data identifier needs to be generated for a piece of target data in the cluster, the node where the target data is located is determined, and the node identifier of that node is looked up in the table and used as the node identifier of the target data. Subsequent steps, such as determining the local identifier of the target data, are then performed.
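The table-based flow of S101 can be sketched as follows, assuming nodes are modelled as a Python dictionary from node name to stored data and using Python's `random.Random` as a stand-in for the unspecified random number generation algorithm. The function names and the redraw-on-collision loop are hypothetical additions for illustration; the application does not describe collision handling.

```python
import random


def assign_node_ids(nodes: dict[str, list[str]], bits: int, rng: random.Random) -> dict[str, int]:
    """Build the node -> node identifier table, skipping nodes that hold no data."""
    table: dict[str, int] = {}
    used: set[int] = set()
    for name, items in nodes.items():
        if not items:  # no data, so no data identifiers are needed for this node
            continue
        node_id = rng.getrandbits(bits)
        while node_id in used:  # assumed safeguard: redraw on a collision
            node_id = rng.getrandbits(bits)
        used.add(node_id)
        table[name] = node_id
    return table


def node_id_of_data(nodes: dict[str, list[str]], table: dict[str, int], item: str) -> int:
    """S101: find the node holding `item` and look up its node identifier."""
    for name, items in nodes.items():
        if item in items:
            return table[name]
    raise KeyError(f"{item!r} is not stored in any node")


nodes = {"node-1": ["order-17"], "node-2": [], "node-3": ["user-4", "user-9"]}
table = assign_node_ids(nodes, bits=16, rng=random.Random(2021))
print(table)  # node-2 is absent because it holds no data
print(node_id_of_data(nodes, table, "user-9"))  # same value as table["node-3"]
```

The redraw loop only terminates if `bits` leaves room for at least as many identifiers as there are non-empty nodes.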

S102: Perform numerical accumulation on the maximum data sequence number in the node where the target data is located to generate the local identifier of the target data.

After the node identifier of the node where the target data is located is determined, the local identifier of the target data in the node needs to be determined, so that the data identifier of the target data can be obtained after the node identifier is combined with the local identifier.

To generate the local identifier of the target data, numerical accumulation is performed on the maximum data sequence number in the node where the target data is located, yielding the local identifier of the target data within that node.

The maximum data sequence number here refers to the largest sequence number assigned to the existing data in the node where the target data is located. For example, assume that the target data is located in node a, which already contains data a and data b, numbered 001 and 002 respectively. The maximum data sequence number is then 002. If node a contains no data, the maximum data sequence number is 000. The numbers 000, 001, and 002 are only examples; data sequence numbers may take other forms, and their specific form is not limited herein.

When generating the local identifier, if the node where the target data is located already contains data, numerical accumulation continues from the maximum data sequence number in the node, and one accumulation step yields the local identifier of the target data. If the node contains no data, accumulation starts from 0, and one accumulation step likewise yields the local identifier. The starting point of the accumulation is therefore determined by the data already present in the node where the target data is located.

The numerical accumulation may be performed in steps of +1, +2, and so on; that is, it may use a preset step size. When a node holds more data and/or the local identifier has fewer bits, the preset step size may be set smaller; when a node holds less data and/or the local identifier has more bits, it may be set larger. The specific value of the preset step size can thus be determined according to the actual situation and is not specifically limited herein.
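A minimal sketch of S102 under the rules above: accumulation starts from the node's maximum data sequence number (0 for an empty node) and advances by a preset step. The function name `next_local_ids` and the list-of-integers representation of existing sequence numbers are assumptions for illustration.

```python
def next_local_ids(existing: list[int], count: int = 1, step: int = 1) -> list[int]:
    """Return `count` new local identifiers by accumulating from the node's
    maximum data sequence number (0 when the node holds no data yet)."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    current = max(existing, default=0)
    ids = []
    for _ in range(count):
        current += step  # one accumulation step per new piece of data
        ids.append(current)
    return ids


# Node a already holds data numbered 001 and 002, so the next local identifier is 003.
print([f"{n:03d}" for n in next_local_ids([1, 2])])  # ['003']
# Empty node, preset step size of 2.
print(next_local_ids([], count=3, step=2))  # [2, 4, 6]
```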

S103: Splice the node identifier of the node where the target data is located with the local identifier of the target data to obtain the data identifier of the target data.

After the node identifier and the local identifier of the target data are obtained, they can be spliced together to obtain the data identifier of the target data.

The splicing may be performed in, but is not limited to, the following two modes.

The first mode: the node identifier and the local identifier are spliced end to end.

This mode covers two cases.

Case 1: the tail of the node identifier is joined to the head of the local identifier.

For example, assume that the node identifier of the target data is 123 and the local identifier of the target data is 456. Then the data identifier of the spliced target data is 123456.

Case 2: the tail of the local identifier is joined to the head of the node identifier.

For example, assume that the node identifier of the target data is 123 and the local identifier of the target data is 456. Then the data identifier of the spliced target data is 456123.
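The two end-to-end cases can be sketched with fixed-width decimal strings. The zero-padding mirrors the padding with 0 that this description allows for identifiers shorter than their required length. Fixed widths are what keep the split unambiguous, because without them "12" + "3456" and "123" + "456" would both give "123456". The function name and width parameters are hypothetical.

```python
def splice_end_to_end(node_id: int, local_id: int, node_width: int, local_width: int,
                      node_first: bool = True) -> str:
    """First splicing mode: join fixed-width decimal identifiers end to end.

    node_first=True  -> Case 1: node identifier followed by local identifier
    node_first=False -> Case 2: local identifier followed by node identifier
    """
    node = str(node_id).zfill(node_width)  # pad with 0s to the configured width
    local = str(local_id).zfill(local_width)
    if len(node) > node_width or len(local) > local_width:
        raise ValueError("identifier is wider than its configured field")
    return node + local if node_first else local + node


print(splice_end_to_end(123, 456, 3, 3))                    # 123456
print(splice_end_to_end(123, 456, 3, 3, node_first=False))  # 456123
print(splice_end_to_end(123, 4, 3, 3))                      # 123004
```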

The second mode: the node identifier and the local identifier are spliced by interleaving them.

For example, assume that the node identification of the target data is 123 and the local identification of the target data is 456. Then the data identification of the spliced target data may be 124356, 142536, etc.
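One possible interleaving pattern, alternating one digit at a time with the node identifier first, reproduces the 142536 example. 124356 would come from a different fixed pattern. This sketch assumes equal-length identifiers and a pattern fixed across the whole system. The second assumption is what lets the two parts be separated again.

```python
def splice_interleaved(node_id: str, local_id: str) -> str:
    """Second splicing mode, one possible pattern: alternate digits, node first."""
    if len(node_id) != len(local_id):
        raise ValueError("this simple pattern assumes equal-length identifiers")
    return "".join(n + l for n, l in zip(node_id, local_id))


def unsplice_interleaved(data_id: str) -> tuple[str, str]:
    """Recover (node identifier, local identifier) from the alternating pattern."""
    return data_id[0::2], data_id[1::2]


print(splice_interleaved("123", "456"))  # 142536
print(unsplice_interleaved("142536"))    # ('123', '456')
```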

No matter which mode is adopted for splicing, the spliced data identification can uniquely identify the target data. Of course, other ways may also be used to splice the node identifier and the local identifier. The specific way of splicing is not limited here.

As can be seen from the above, in the data identifier generation method provided in the embodiments of the present application, when a data identifier needs to be generated for data in each node in a cascading scenario, the node identifier of the node where the target data is located is first determined, the node identifier of each node being generated based on a random number generation algorithm; then, numerical accumulation is performed on the maximum data sequence number in that node to generate the local identifier of the target data; finally, the node identifier is spliced with the local identifier to obtain the data identifier of the target data. Thus, when the amount of data is small, the data can be uniquely identified with fewer bits instead of a 128-bit UUID, so that every piece of data is uniquely identified without wasting storage space on oversized identifiers.

Further, as a refinement and extension of the method shown in Fig. 1, to generate node identifiers more randomly, the Mersenne Twister algorithm may be used as the random number generation algorithm. Fig. 2 is a schematic flowchart of generating a node identifier in an embodiment of the present application. As shown in Fig. 2, before step S101, the method may further include:

S201: Calculate the parameters of the Mersenne Twister algorithm using a linear congruential algorithm.

The linear congruential algorithm (Linear Congruential Method), which can generate uniformly distributed random numbers, was proposed by Lehmer of the United States in 1951.

The linear congruential algorithm is as follows:

$$x_{n+1} = (a\,x_n + c) \bmod m \qquad \text{formula (1)}$$

where $m$ is the modulus, $m > 0$; $a$ is the multiplier, $0 < a < m$; $c$ is the increment, $0 \le c < m$; $n \ge 0$; $x_0$ is the initial value of the recursive sequence, also called the seed; and $x_{n+1}$ is the generated random number. The successive values $x_1, x_2, \ldots$ form a random number sequence, which supplies the parameters to be substituted into the subsequent Mersenne Twister algorithm.
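Formula (1) can be checked with a short generator. The toy parameters $a = 5$, $c = 3$, $m = 16$ and seed $x_0 = 7$ are chosen only so the values can be verified by hand: $x_1 = 38 \bmod 16 = 6$, $x_2 = 33 \bmod 16 = 1$, and so on. The second parameter set, $a = \texttt{0x5DEECE66D}$, $c = \texttt{0xB}$, $m = 2^{48}$, is the one used by Java's `java.util.Random` for its internal state update. It is shown as a plausible choice because $\texttt{0xB}$ and $2^{48}$ also appear in the seed formula of this description, but the application does not state which $a$, $c$, and $m$ it uses.

```python
from itertools import islice
from typing import Iterator


def lcg(seed: int, a: int, c: int, m: int) -> Iterator[int]:
    """Yield x_1, x_2, ... from x_{n+1} = (a*x_n + c) mod m, starting at x_0 = seed."""
    if not (m > 0 and 0 < a < m and 0 <= c < m):
        raise ValueError("parameters violate the constraints of formula (1)")
    x = seed % m
    while True:
        x = (a * x + c) % m
        yield x


# Toy parameters so the values can be checked by hand.
print(list(islice(lcg(7, a=5, c=3, m=16), 4)))  # [6, 1, 8, 11]

# Assumed production-style parameters (those of java.util.Random's 48-bit state).
print(list(islice(lcg(42, a=0x5DEECE66D, c=0xB, m=1 << 48), 3)))
```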

When a node identifier needs to be generated for a node, a random number is first obtained through the linear congruential algorithm and then substituted into the Mersenne Twister algorithm to obtain the node identifier of that node.

When generating the node identifier of the first node, the seed $x_0$ of the linear congruential algorithm is first obtained and substituted into the Mersenne Twister algorithm to calculate the node identifier of the first node. Next, $x_1$ is obtained through the linear congruential algorithm and substituted into the Mersenne Twister algorithm to calculate the node identifier of the next node. By analogy, $x_2$, $x_3$, and so on are obtained through the linear congruential algorithm and substituted into the Mersenne Twister algorithm to calculate the node identifiers of subsequent nodes, until the node identifiers of all nodes have been calculated.
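The chaining described above can be sketched as follows. Node 1 uses the seed $x_0$, node $k+1$ uses $x_k$, and each value seeds a Mersenne Twister whose first $k$-bit draw becomes that node's identifier. Python's `random.Random` is used as the MT19937 implementation here. The LCG parameter defaults, the 16-bit identifier width, and the function name `node_ids_from_seed` are assumptions, not values from the application.

```python
import random
from itertools import islice
from typing import Iterator


def lcg(seed: int, a: int = 0x5DEECE66D, c: int = 0xB, m: int = 1 << 48) -> Iterator[int]:
    """Yield x_1, x_2, ... of formula (1); the parameter defaults are an assumption."""
    x = seed % m
    while True:
        x = (a * x + c) % m
        yield x


def node_ids_from_seed(seed: int, node_count: int, bits: int = 16) -> list[int]:
    """Node 1 uses x_0 (the seed itself); node k+1 uses x_k from the LCG."""
    if node_count <= 0:
        return []
    params = [seed] + list(islice(lcg(seed), node_count - 1))
    # Each parameter seeds a fresh MT19937 (Python's random.Random), and one
    # k-bit draw becomes that node's identifier.
    return [random.Random(p).getrandbits(bits) for p in params]


print(node_ids_from_seed(seed=20211125, node_count=4))
```

Independent $k$-bit draws can still collide, so a deployment would typically check new node identifiers against those already issued.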

When determining the seed, to make the random numbers produced by the linear congruential algorithm better distinguish the current nodes from nodes in other clusters, the seed may be determined based on the timestamp and location information of the current node.

Specifically, step S201 may include:

Step A1: Determine the seed of the linear congruential algorithm according to the timestamp and location information of the current node.

The timestamp may refer to a time when the node is established, a time when the first data is stored in the node, or a time when the last data is stored in the node. The specific content of the time stamp is not limited herein.

The location information may be the identity (ID) of the platform where the node is located, or the specific geographic coordinates of the node. The specific content of the location information is not limited herein.

The Seed (Seed) can be specifically determined by the following formula:

$$\text{Seed} = \big((\text{timestamp}\ \ \text{platform ID}) + \texttt{0xBL}\big) \mathbin{\&} \big((\texttt{1L} \ll 48) - 1\big) \qquad \text{formula (2)}$$

Of course, the seed of the linear congruential algorithm may also be determined from the timestamp alone or the platform ID alone, from the product of the timestamp and the platform ID, or from other parameters (e.g., the total number of nodes in the platform or a digest of the nodes). The specific method of determining the seed is not limited herein.
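A sketch of a formula (2)-style seed follows. The operator that combines the timestamp and the platform ID is not legible in formula (2) as given, so this sketch assumes XOR. The paragraph above lists the product as a separate alternative, which suggests formula (2) itself uses a different operator, but that remains an assumption. `0xBL` and `(1L << 48) - 1` are Java long literals and map to `0xB` and a 48-bit mask in Python. The millisecond timestamp and the numeric platform ID 42 are hypothetical inputs.

```python
import time

MASK_48 = (1 << 48) - 1  # ((1L << 48) - 1) in formula (2)


def derive_seed(timestamp: int, platform_id: int) -> int:
    """Formula (2)-style seed. ASSUMPTION: XOR combines timestamp and platform ID."""
    return ((timestamp ^ platform_id) + 0xB) & MASK_48


# Hypothetical inputs: node creation time in milliseconds and a numeric platform ID.
seed = derive_seed(time.time_ns() // 1_000_000, platform_id=42)
print(derive_seed(1000, 7))  # 1018
```

The mask keeps the seed within 48 bits, matching the modulus $2^{48}$ if the linear congruential algorithm uses that parameter set.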

Step A2: Calculate the parameters of the Mersenne Twister algorithm using the linear congruential algorithm into which the seed has been substituted.

After the seed is determined, it can be substituted into the linear congruential algorithm as the initial value of x. The linear congruential algorithm is then run with this seed, and its results can be used as parameters of the Mersenne Twister algorithm.

S202: Generate the node identifier of each node based on the Mersenne Twister algorithm into which the parameters have been substituted.

After the parameters are obtained through the linear congruential algorithm, they can be substituted into the Mersenne Twister algorithm, which is then run to obtain the node identifier of a node.

The Mersenne Twister is a pseudo-random number generation algorithm developed by Makoto Matsumoto and Takuji Nishimura in 1997. Based on a matrix linear recurrence over the finite binary field $F_2$, it can quickly generate high-quality pseudo-random numbers.

The name Mersenne Twister derives from the fact that its period length is a Mersenne prime. The algorithm is commonly used in two similar variants that differ mainly in word length: the newer and more common MT19937, with a 32-bit word length, and its 64-bit version, MT19937-64. For a word length of $k$ bits, the Mersenne Twister generates discrete, uniformly distributed random numbers in the interval $[0, 2^k - 1]$. The word-length version to use may be determined according to the bit-length requirement of the node identifier and is not specifically limited here.

In essence, the algorithm is divided into three stages: Stage 1, obtain the basic Mersenne Twister state chain (initialization); Stage 2, apply the twist (rotation) operation to the chain; Stage 3, process (temper) the output of the twist operation before returning it.
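The three stages map onto a textbook 32-bit MT19937. The sketch below uses the standard MT19937 constants and the standard `init_genrand`-style seeding, so that each stage is visible. The `node_id` helper, which keeps the top bits of one output as a node identifier, is a hypothetical addition, not part of the algorithm.

```python
class MT19937:
    """Textbook 32-bit Mersenne Twister, written out to expose the three stages."""

    N, M = 624, 397
    MATRIX_A = 0x9908B0DF
    UPPER_MASK, LOWER_MASK = 0x80000000, 0x7FFFFFFF

    def __init__(self, seed: int) -> None:
        # Stage 1: build the basic state chain of 624 32-bit words from the seed.
        self.mt = [0] * self.N
        self.mt[0] = seed & 0xFFFFFFFF
        for i in range(1, self.N):
            prev = self.mt[i - 1]
            self.mt[i] = (1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF
        self.index = self.N  # forces a twist before the first output

    def _twist(self) -> None:
        # Stage 2: "twist" the whole chain in place.
        for i in range(self.N):
            y = (self.mt[i] & self.UPPER_MASK) | (self.mt[(i + 1) % self.N] & self.LOWER_MASK)
            self.mt[i] = self.mt[(i + self.M) % self.N] ^ (y >> 1) ^ (self.MATRIX_A if y & 1 else 0)
        self.index = 0

    def next_u32(self) -> int:
        if self.index >= self.N:
            self._twist()
        y = self.mt[self.index]
        self.index += 1
        # Stage 3: temper the raw word to improve equidistribution of the output.
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y & 0xFFFFFFFF

    def node_id(self, bits: int) -> int:
        """Hypothetical helper: keep the top `bits` bits (bits <= 32) of one output."""
        return self.next_u32() >> (32 - bits)


print(MT19937(5489).next_u32())  # 3499211612, the reference first output for seed 5489
```

Python's standard `random` module already implements MT19937, so `random.Random(seed).getrandbits(k)` would normally be preferred in practice. Its integer seeding routine differs from the one above, however, so the two produce different outputs for the same seed.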

As described above, the parameters of the Mersenne Twister algorithm are determined through the linear congruential algorithm and then substituted into the Mersenne Twister algorithm to obtain the node identifier of each node. The node identifiers are therefore sufficiently random to be well distinguished from one another, which better ensures the uniqueness of the data identifiers. In addition, determining the seed of the linear congruential algorithm from the timestamp and location information of the node allows nodes in the current cluster to be better distinguished from nodes in other clusters, further ensuring the uniqueness of the data identifiers.

Further, as a refinement and extension of the method shown in Fig. 1, when local identifiers need to be assigned to multiple pieces of data in a node, the data and their identifiers may be written to the data table in a single update after the local identifiers are allocated. This reduces the number of reads and writes to the data table and thus improves the efficiency of data identifier generation. Fig. 3 is a schematic flowchart of generating a local identifier in an embodiment of the present application. As shown in Fig. 3, step S102 may specifically include:

S301: When there are multiple pieces of target data, obtain from the data table the corresponding number of local identifiers following the maximum data sequence number in the node where the target data is located, and use them as the local identifiers of the respective pieces of target data.

Multiple pieces of target data means that local identifiers need to be assigned to multiple pieces of data in the same node. The data table stores the correspondence between the data in each node and their local identifiers, as well as the local identifiers in each node that have not yet been allocated. To reduce the number of reads and writes to the data table, after the node identifier of the node where the target data is located is determined, the corresponding number of local identifiers for that node may be pulled directly from the data table and assigned to the pieces of target data respectively.

For example, assume that there are three pieces of target data, namely data a, data b, and data c, all belonging to node a. The data table already stores the data identifier of data d in node a (e.g., 123001), where 123 is the node identifier of node a and 001 is the local identifier of data d in node a. Since there are three pieces of target data, the three sequence numbers following the maximum data sequence number, namely 002, 003, and 004, are pulled from the data table and used as the local identifiers of data a, data b, and data c, respectively.

The values 001, 002, etc. are only examples of local identifiers. In practical applications, the local identifier is not limited to 3 bits and may exceed 36 bits. When a generated local identifier has fewer bits than required, it may be padded with 0s. The same applies to the node identifiers described above. The specific numbers of bits of the local identifier and the node identifier are not limited herein.

In practical applications, the above data table may only store all used and unused local identifiers in a certain node, and the local identifiers in different nodes are stored in different data tables. Of course, the above-mentioned data table may also store node identifiers of all nodes and all local identifiers therein. The specific storage content and form in the data table are not limited herein.

S302: Update the target data and their local identifiers in the data table.

After the multiple pieces of target data and their local identifiers are determined, they can be written to the data table in a single update. The data table is thus updated once per batch of data rather than once per piece of data, which reduces the update frequency of the data table and improves the efficiency of data identifier generation.

Continuing the above example, after the local identifiers 002, 003, and 004 of data a, data b, and data c are determined, since all three pieces of data belong to node a, their node identifier is 123. The data identifiers of data a, data b, and data c are therefore 123002, 123003, and 123004, respectively. Data a, data b, and data c and their data identifiers 123002, 123003, and 123004 are then written to the data table in a single update.
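The batch flow of S301 and S302 can be sketched with an in-memory stand-in for one node's data table. The class `NodeDataTable`, its fields, and the `writes` counter (which counts write round trips) are hypothetical. A real data table would live in a database, where the single `update` would correspond to one batched write.

```python
class NodeDataTable:
    """Hypothetical in-memory stand-in for the data table of one node."""

    def __init__(self, node_id: int, node_width: int = 3, local_width: int = 3):
        self.node_id = node_id
        self.node_width = node_width
        self.local_width = local_width
        self.rows: dict[str, str] = {}  # data -> data identifier
        self.max_seq = 0                # maximum data sequence number in the node
        self.writes = 0                 # number of write round trips to the table

    def allocate_batch(self, items: list[str]) -> dict[str, str]:
        if not items:
            return {}
        # S301: reserve len(items) local identifiers after the maximum sequence number.
        first = self.max_seq + 1
        node = str(self.node_id).zfill(self.node_width)
        new_rows = {item: node + str(seq).zfill(self.local_width)
                    for item, seq in zip(items, range(first, first + len(items)))}
        # S302: write all new rows to the table in a single update.
        self.rows.update(new_rows)
        self.max_seq += len(items)
        self.writes += 1
        return new_rows


table_a = NodeDataTable(node_id=123)
table_a.allocate_batch(["data d"])  # {'data d': '123001'}
print(table_a.allocate_batch(["data a", "data b", "data c"]))
# {'data a': '123002', 'data b': '123003', 'data c': '123004'}
```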

As can be seen from the above, when local identifiers need to be assigned to multiple pieces of data in a node, the corresponding number of local identifiers is pulled from the data table at one time, the data identifiers are formed by combining them with the node identifier, and the data and their data identifiers are written to the data table in a single update. Updating the table once per batch instead of once per piece of data reduces the number of reads and writes to the data table and improves the efficiency of generating data identifiers.
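The batching described above can be illustrated with a hedged sketch. `DataTable` is a hypothetical in-memory stand-in for a node's data table and `bulk_update` represents one write to it; the point is that assigning identifiers to three pieces of data costs one table update rather than three.

```python
from dataclasses import dataclass, field


@dataclass
class DataTable:
    """Hypothetical in-memory stand-in for one node's data table."""
    node_id: str
    local_width: int = 3
    max_seq: int = 0
    rows: dict = field(default_factory=dict)   # data identifier -> data
    writes: int = 0                            # number of table updates performed

    def bulk_update(self, new_rows: dict, new_max_seq: int) -> None:
        # One write covers every row in the batch.
        self.rows.update(new_rows)
        self.max_seq = new_max_seq
        self.writes += 1


def assign_batch(table: DataTable, items: list) -> list:
    """Reserve len(items) local identifiers at once and store them in one update."""
    if not items:
        return []
    start = table.max_seq + 1
    ids = [table.node_id + str(start + i).zfill(table.local_width)
           for i in range(len(items))]
    table.bulk_update(dict(zip(ids, items)), table.max_seq + len(items))
    return ids


if __name__ == "__main__":
    table = DataTable("123", max_seq=1, rows={"123001": "d"})
    print(assign_batch(table, ["a", "b", "c"]))   # ['123002', '123003', '123004']
    print(table.writes)                           # 1
```

In a real database, reserving the sequence range and writing the rows would need to happen in one transaction (or under a lock) so that concurrent batches on the same node cannot receive overlapping local identifiers; the sketch ignores concurrency.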

Further, as a refinement and extension of the method shown in fig. 1, the recognizability of the data identifier can be improved so that the attributes of the corresponding data can be read directly from the data identifier: a sign bit may be added to the data identifier, and a symbol identifier placed in it. Fig. 4 is a schematic flowchart of generating a symbol identifier in the embodiment of the present application. As shown in fig. 4, before step S103, the method may further include:

S401: Determine the attributes of the target data.

Here, the attributes of the target data may include at least one or more of the following: the type of the target data, the meaning of the target data, and the user to which the target data belongs.

The type of the target data refers to the kind of data, such as order data, user data, or browsing data, to which the target data belongs. The target data may also be of other types not mentioned here, determined according to its specific content. The specific type is not limited herein.

The meaning of the target data may refer to a summary of its specific content. For example, an MD5 (Message-Digest Algorithm 5) digest of the target data is computed, and the result is taken as the meaning of the target data and placed in the sign bit. The content of the target data may also be summarized in other ways. The specific manner of obtaining the meaning of the target data is not limited herein.
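One way to turn an MD5 digest into a short numeric value for the symbol identifier is sketched below with Python's standard `hashlib` module. Reducing the 128-bit digest modulo a power of ten is an assumption made for illustration; the text does not say how the digest is fitted into the available bits, and `meaning_code` is a hypothetical name.

```python
import hashlib


def meaning_code(data: bytes, width: int = 2) -> str:
    """Map the MD5 digest of `data` to a zero-padded string of `width` decimal digits."""
    digest = hashlib.md5(data).digest()        # 16-byte MD5 digest
    value = int.from_bytes(digest, "big")      # digest as a non-negative integer
    return str(value % 10 ** width).zfill(width)


if __name__ == "__main__":
    print(meaning_code(b"order #42"))
```

Reducing the digest this far gives up most of its spread: with two digits, two distinct pieces of data have roughly a 1-in-100 chance of sharing a code, so such a value summarizes the data rather than identifying it.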

The user to which the target data belongs may specifically refer to an ID, a name, a network address, and the like of the user. The specific content of the user is not limited herein.

Of course, the attributes of the target data are not limited to type, meaning, and user, and may be any other content capable of indicating the attributes of the target data. The specific content of the target data attributes is not limited herein.

S402: Generate a symbol identifier of the target data based on its attributes.

After the attributes of the target data are determined, they can be further processed into a numerical value, which is used as the symbol identifier of the target data and added to its data identifier. A user who sees the data identifier can then learn these attributes of the target data without looking up the data itself, and the data identifier carries richer information.

Specifically, a mapping between each attribute and a preset value may be established in advance. Once an attribute of the target data is determined, the corresponding preset value is looked up in the mapping and used as the symbol identifier of the target data. The symbol identifier may also be calculated from the attributes of the target data in other ways; the specific calculation method is not limited herein.

When several attributes of the target data are determined, for example the type of the target data is order data and the user to which it belongs is user A, the symbol identifier needs at least 2 bits. That is, the number of bits of the symbol identifier depends on how many kinds of attributes are actually determined for the target data: it must be at least equal to the number of kinds of attributes.

This does not mean that each attribute occupies only one bit of the symbol identifier. When an attribute has many specific categories, a single bit may not be enough to distinguish them, and one or more additional bits must be allocated to that attribute.

For example, assume that there are 12 specific data types: type 1, type 2, ..., type 12. The symbol identifier then needs at least 2 bits to distinguish them. For example, if the target data is of type 2, its symbol identifier may be 02.
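The mapping and width rules above can be combined into a small sketch. The attribute names, the preset codes in `ATTRIBUTE_CODES`, and the fixed attribute order are hypothetical; each attribute is given as many decimal digits as its largest preset code needs, so 12 data types take 2 digits and type 2 becomes 02.

```python
# Hypothetical preset mappings from attribute values to numeric codes.
ATTRIBUTE_CODES = {
    "type": {f"type {i}": i for i in range(1, 13)},   # 12 data types -> codes 1..12
    "user": {"user A": 1, "user B": 2},
}


def field_width(codes: dict) -> int:
    """Decimal digits needed to write the largest preset code of an attribute."""
    return len(str(max(codes.values())))


def symbol_identifier(attrs: dict) -> str:
    """Concatenate the zero-padded codes of the given attributes."""
    parts = []
    # Iterate in the fixed order of ATTRIBUTE_CODES so the result does not
    # depend on the order of keys in `attrs`.
    for name, codes in ATTRIBUTE_CODES.items():
        if name in attrs:
            parts.append(str(codes[attrs[name]]).zfill(field_width(codes)))
    return "".join(parts)


if __name__ == "__main__":
    print(symbol_identifier({"type": "type 2"}))                    # 02
    print(symbol_identifier({"type": "type 2", "user": "user A"}))  # 021
```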

After the symbol identifier of the target data is generated, it should correspondingly appear in the spliced data identifier. That is, step S103 becomes: splicing the symbol identifier of the target data, the node identifier of the node where the target data is located, and the local identifier of the target data to obtain the data identifier of the target data.

In a specific implementation, the symbol identifier of the target data, the node identifier of the node where the target data is located, and the local identifier of the target data may be spliced in that order, yielding a data identifier of the form "symbol identifier + node identifier + local identifier". The splicing order may also be adjusted, for example to form "node identifier + local identifier + symbol identifier" or "local identifier + node identifier + symbol identifier". The specific splicing form of the final data identifier is not limited here, as long as it contains the symbol identifier, the node identifier, and the local identifier.
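The splicing step, with the order left configurable as the text allows, can be sketched as follows. The function name and the `order` tuple are hypothetical; the sketch assumes each part already has a fixed width, since that is what allows an identifier to be split back into its parts.

```python
DEFAULT_ORDER = ("symbol", "node", "local")


def splice_identifier(symbol: str, node: str, local: str,
                      order: tuple = DEFAULT_ORDER) -> str:
    """Join the three fixed-width parts in the requested order."""
    parts = {"symbol": symbol, "node": node, "local": local}
    if sorted(order) != sorted(parts):
        raise ValueError("order must name symbol, node and local exactly once")
    return "".join(parts[name] for name in order)


if __name__ == "__main__":
    print(splice_identifier("02", "123", "002"))                               # 02123002
    print(splice_identifier("02", "123", "002", ("node", "local", "symbol")))  # 12300202
```

Whatever order is chosen, all nodes would need to use the same one; otherwise the same digit string could be parsed differently on different nodes.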

As described above, a sign bit is added to the data identifier, and the symbol identifier of the target data is determined from its attributes and placed in the sign bit. A user can therefore learn the attributes of the corresponding data from the data identifier alone, which improves the recognizability of the data identifier.

Further, as a refinement and extension of the method shown in fig. 1, to make fuller use of each bit of the data identifier, keep the identifier as short as possible when the total amount of data is small, and save as much storage space as possible, the total number of bits of the data identifier may be set according to a preset total storage amount. Fig. 5 is a schematic flowchart of determining the total number of bits of the data identifier in the embodiment of the present application. As shown in fig. 5, before step S101, the method may further include:

S501: Set the total number of bits of the data identifier according to the preset total storage amount.

The total number of bits set for the data identifier is less than 128.

The preset total storage amount may refer to the storage space that can be reserved for data identifiers in the hardware device, or to the total amount of data for which data identifiers are to be generated. The specific content of the preset total storage amount is not limited herein.

Specifically, when little storage space remains in the hardware device or the total amount of data is small, the data identifier may be given fewer bits; when more storage space remains or the total amount of data is large, it may be given more bits.

In practical applications, the total number of bits of the data identifier may be 64 bits, 32 bits, etc., and the specific number of bits may be determined according to practical needs, which is not specifically limited herein.

S502: Determine the number of bits of the node identifier in the data identifier according to the total number of nodes.

After the total number of bits of the data identifier is determined, the bits must be divided between the node identifier and the local identifier. The number of bits of the node identifier may be determined first and subtracted from the total number of bits to obtain the number of bits of the local identifier. Alternatively, the number of bits of the local identifier may be determined first and subtracted from the total number of bits to obtain the number of bits of the node identifier.

The number of bits of the node identifier can be determined according to the total number of nodes. Specifically, when the total number of nodes is large relative to the amount of data in each node, the node identifier may be given more bits than the local identifier; when the total number of nodes is small relative to the amount of data in each node, the node identifier may be given fewer bits than the local identifier.

To determine exactly how many bits the node identifier occupies, the total number of nodes may be divided by the maximum number of values that each bit can take, and the result used as the number of bits of the node identifier. For example, if there are 50 nodes in total and each bit can take 10 values (0 through 9), the maximum number of values per bit is 10; dividing 50 by 10 gives 5, so the node identifier is given 5 bits. It may also be given 6, 7, or more bits. The number of bits of the node identifier may also be derived from the total number of nodes in other ways, which are not specifically limited herein.

S503: Subtract the number of bits of the node identifier from the total number of bits of the data identifier to obtain the number of bits of the local identifier in the data identifier.

After the total number of bits of the data identifier and the number of bits of the node identifier are determined, the latter is subtracted from the former, and the result is the number of bits of the local identifier. With both widths fixed, the node identifier of each node can be generated with a random number generation algorithm, and the local identifiers of the data in each node can be generated by numerical accumulation.

If a symbol identifier is used in addition to the node identifier and the local identifier, a certain number of bits must be reserved for it when bits are allocated to the node identifier and the local identifier.
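Steps S501 to S503, including the reservation for a symbol identifier, amount to simple subtraction. The sketch below assumes the 64-bit layout used in this embodiment (1 symbol bit, 27 node bits); `local_bits` is a hypothetical helper name.

```python
def local_bits(total_bits: int, node_bits: int, symbol_bits: int = 0) -> int:
    """Bits left for the local identifier after the node and symbol fields (S503)."""
    if total_bits >= 128:
        raise ValueError("the total must be less than 128 bits (S501)")
    remaining = total_bits - node_bits - symbol_bits
    if remaining <= 0:
        raise ValueError("no bits left for the local identifier")
    return remaining


if __name__ == "__main__":
    print(local_bits(64, node_bits=27, symbol_bits=1))   # 36
```

With this split, up to 2^27 = 134,217,728 distinct node identifiers and 2^36 = 68,719,476,736 local identifiers per node can be represented.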

In practical applications of the data identifier generation method provided by the embodiment of the present application, the total number of bits of the data identifier may be 64. Extensive practice has shown that a 64-bit UUID is sufficient for actual data identification needs while keeping each data identifier unique.

Fig. 6 is a schematic structural diagram of the data identifier in the embodiment of the present application. Referring to fig. 6, in the 64-bit UUID, counting bits from right to left, bit 64 is the sign bit, which holds the symbol identifier of the target data; bits 37 to 63 are the node ID, which holds the node identifier of the node where the target data is located; and bits 1 to 36 are the increasing sequence, which holds the local identifier of the target data within the node.

Table 1 below details each field of the data identifier.

TABLE 1

Position	Content	Length (bits)	Description
64	Sign bit	1	Positive values
37-63	Node ID	27	Prefix
1-36	Increasing sequence	36	Self-increasing sequence

That is, the data identifier consists of three parts: the sign bit, the node ID, and the increasing sequence. The sign bit typically takes only positive values, the node ID is a 27-bit random value, and the increasing sequence is a 36-bit monotonically increasing sequence.
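Because the fields are binary, the layout in Table 1 maps directly onto shifts and masks. The following sketch assumes bit 1 is the least significant bit, so the sequence occupies the low 36 bits, the node ID the next 27, and the sign bit the top bit; `pack` and `unpack` are hypothetical names.

```python
SEQ_BITS = 36                        # bits 1-36: increasing sequence
NODE_BITS = 27                       # bits 37-63: node ID
SIGN_SHIFT = SEQ_BITS + NODE_BITS    # bit 64 sits at 0-based position 63


def pack(sign: int, node_id: int, seq: int) -> int:
    """Place the three fields into one 64-bit integer."""
    if not 0 <= sign <= 1:
        raise ValueError("sign occupies a single bit")
    if not 0 <= node_id < 1 << NODE_BITS:
        raise ValueError("node ID must fit in 27 bits")
    if not 0 <= seq < 1 << SEQ_BITS:
        raise ValueError("sequence must fit in 36 bits")
    return (sign << SIGN_SHIFT) | (node_id << SEQ_BITS) | seq


def unpack(identifier: int) -> tuple:
    """Recover (sign, node_id, seq) from a packed identifier."""
    seq = identifier & ((1 << SEQ_BITS) - 1)
    node_id = (identifier >> SEQ_BITS) & ((1 << NODE_BITS) - 1)
    sign = identifier >> SIGN_SHIFT
    return sign, node_id, seq


if __name__ == "__main__":
    ident = pack(0, 123, 2)
    print(unpack(ident))   # (0, 123, 2)
```

When the top bit is 0, the packed value also fits the non-negative range of a signed 64-bit integer type.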

As can be seen from the above, when the total amount of data is not large, the total number of bits of the data identifier can be set according to the preset total storage amount, so that every bit of the data identifier is used more fully, the identifier is kept as short as possible, and storage space is saved to the greatest extent.

Based on the same inventive concept, as an implementation of the method, the embodiment of the present application further provides a data identifier generating device, which is applied to each node in a cascading scenario. Fig. 7 is a schematic structural diagram of a first data identifier generating apparatus in an embodiment of the present application, and as shown in fig. 7, the apparatus may include:

a node identification module 701, configured to determine a node identification of a node where the target data is located, where the node identification of each node is generated based on a random number generation algorithm;

a local identification module 702, configured to perform numerical accumulation on the maximum data sequence number in the node where the target data is located, so as to generate a local identification of the target data;

a splicing module 703, configured to splice the node identifier of the node where the target data is located with the local identifier of the target data, so as to obtain the data identifier of the target data.

Further, as a refinement and extension of the apparatus shown in fig. 7, an embodiment of the present application further provides a data identifier generating apparatus. Fig. 8 is a schematic structural diagram of a second data identifier generating apparatus in the embodiment of the present application, and referring to fig. 8, the apparatus may include:

A bit number determining module 801, comprising:

A total bit determining unit 8011, configured to set the total number of bits of the data identifier according to a preset total storage amount, where the total number of bits is less than 128.

A node bit determining unit 8012, configured to determine the number of bits of the node identifier in the data identifier according to the total number of nodes.

A local bit determining unit 8013, configured to subtract the number of bits of the node identifier from the total number of bits of the data identifier to obtain the number of bits of the local identifier in the data identifier.

An identifier generation module 802, comprising:

A parameter calculation unit 8021, configured to calculate the parameters of the Mersenne Twister algorithm using a linear congruential algorithm.

The parameter calculation unit 8021 is specifically configured to: determine a seed of the linear congruential algorithm according to the timestamp and the position information of the current node; and calculate the parameters of the Mersenne Twister algorithm with the linear congruential algorithm into which the seed is substituted.

An identifier generating unit 8022, configured to generate the node identifier of each node with the Mersenne Twister algorithm into which the parameters are substituted.

A node identification module 803, configured to determine the node identifier of the node where the target data is located, where the node identifier of each node is generated based on a random number generation algorithm. The random number generation algorithm includes the Mersenne Twister algorithm.

A local identification module 804, configured to, when there are multiple pieces of target data, obtain from a data table the corresponding number of local identifiers following the maximum data sequence number in the node where the target data is located, use them as the local identifiers of the respective pieces of target data, and update the target data and their local identifiers in the data table.

A symbol identification module 805 comprising:

an attribute determining unit 8051, configured to determine an attribute of the target data.

Wherein the attributes of the target data at least comprise: one or more of a type of the target data, a meaning of the target data, a user to which the target data belongs.

A symbol generating unit 8052, configured to generate a symbol identifier of the target data based on the attribute of the target data.

A splicing unit 806, configured to splice the symbol identifier of the target data, the node identifier of the node where the target data is located, and the local identifier of the target data to obtain the data identifier of the target data.
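The parameter calculation unit 8021 and identifier generating unit 8022 can be sketched as follows. This is an illustration under stated assumptions, not the application's implementation: the linear congruential generator uses the well-known Numerical Recipes constants (a = 1664525, c = 1013904223, m = 2^32), the LCG seed mixes a nanosecond timestamp with a hypothetical integer position code, the LCG output serves as the seed of a standard 32-bit MT19937 Mersenne Twister, and the node ID keeps the top 27 bits of the first output. Python's own `random` module is also MT19937-based but seeds it differently, so the generator is written out here to make the seeding path explicit.

```python
import time


class MT19937:
    """Minimal 32-bit Mersenne Twister (MT19937)."""
    N, M = 624, 397
    MATRIX_A = 0x9908B0DF
    UPPER, LOWER = 0x80000000, 0x7FFFFFFF

    def __init__(self, seed: int) -> None:
        self.mt = [0] * self.N
        self.mt[0] = seed & 0xFFFFFFFF
        for i in range(1, self.N):
            prev = self.mt[i - 1]
            self.mt[i] = (1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF
        self.index = self.N   # force a twist on the first draw

    def _twist(self) -> None:
        for i in range(self.N):
            x = (self.mt[i] & self.UPPER) | (self.mt[(i + 1) % self.N] & self.LOWER)
            x_a = x >> 1
            if x & 1:
                x_a ^= self.MATRIX_A
            self.mt[i] = self.mt[(i + self.M) % self.N] ^ x_a
        self.index = 0

    def next_u32(self) -> int:
        if self.index >= self.N:
            self._twist()
        y = self.mt[self.index]
        self.index += 1
        # Tempering
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y & 0xFFFFFFFF


def lcg(seed: int, steps: int = 4) -> int:
    """x_{k+1} = (1664525 * x_k + 1013904223) mod 2**32, iterated `steps` times."""
    x = seed & 0xFFFFFFFF
    for _ in range(steps):   # a few steps to spread the seed bits (arbitrary choice)
        x = (1664525 * x + 1013904223) & 0xFFFFFFFF
    return x


def make_node_id(timestamp_ns: int, position: int, node_bits: int = 27) -> int:
    # Hypothetical seed derivation: mix the timestamp with a numeric position
    # code using an arbitrary odd multiplier.
    lcg_seed = (timestamp_ns ^ (position * 0x9E3779B1)) & 0xFFFFFFFF
    mt = MT19937(lcg(lcg_seed))                 # LCG output seeds the Mersenne Twister
    return mt.next_u32() >> (32 - node_bits)    # keep the top node_bits bits


if __name__ == "__main__":
    print(make_node_id(time.time_ns(), position=7))
```

Random node identifiers can still coincide: any two given nodes receive the same 27-bit ID with probability 2^-27, so a deployment may want to check a new ID against those already in use.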

It should be noted that the above description of the apparatus embodiments is similar to that of the method embodiments and has similar beneficial effects. For technical details not disclosed in the apparatus embodiments of the present application, refer to the description of the method embodiments of the present application.

Based on the same inventive concept, the embodiment of the application also provides the electronic equipment. Fig. 9 is a schematic structural diagram of an electronic device in an embodiment of the present application, and referring to fig. 9, the electronic device may include: a processor 901, a memory 902, a bus 903; the processor 901 and the memory 902 complete communication with each other through the bus 903; the processor 901 is configured to call program instructions in the memory 902 to perform the method in one or more embodiments described above.

It should be noted that the above description of the electronic device embodiments is similar to that of the method embodiments and has similar beneficial effects. For technical details not disclosed in the electronic device embodiments of the present application, refer to the description of the method embodiments of the present application.

Based on the same inventive concept, the embodiment of the present application further provides a computer-readable storage medium, where the storage medium may include: a stored program; wherein the program controls the device on which the storage medium is located to execute the method in one or more of the above embodiments when the program runs.

It should be noted that the above description of the storage medium embodiments is similar to that of the method embodiments and has similar beneficial effects. For technical details not disclosed in the storage medium embodiments of the present application, refer to the description of the method embodiments of the present application.

The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A data identifier generation method, applied to each node in a cascading scenario, the method comprising the following steps:

determining node identification of a node where target data is located, wherein the node identification of each node is generated based on a random number generation algorithm;

performing numerical accumulation on the maximum data serial number in the node where the target data is located to generate a local identifier of the target data;

and splicing the node identification of the node where the target data is located with the local identification of the target data to obtain the data identification of the target data.

2. The method of claim 1, wherein the random number generation algorithm comprises a Mersenne Twister algorithm, and before the determining of the node identification of the node where the target data is located, the method further comprises:

calculating parameters of the Mersenne Twister algorithm by using a linear congruential algorithm; and

generating the node identification of each node based on the Mersenne Twister algorithm into which the parameters are substituted.

3. The method of claim 2, wherein the calculating of parameters of the Mersenne Twister algorithm by using a linear congruential algorithm comprises:

determining a seed of the linear congruential algorithm according to the timestamp and the position information of the current node; and

calculating the parameters of the Mersenne Twister algorithm based on the linear congruential algorithm into which the seed is substituted.

4. The method of claim 1, wherein the numerically accumulating the maximum data sequence number in the node where the target data is located to generate the local identifier of the target data comprises:

when there are multiple pieces of target data, acquiring from a data table the corresponding number of local identifications obtained by accumulating from the maximum data sequence number in the node where the target data is located, using them respectively as the local identifications of the target data, and updating the target data and the local identifications in the data table.

5. The method of claim 1, wherein prior to said concatenating the node identification of the node where the target data is located with the local identification of the target data, the method further comprises:

determining attributes of the target data;

generating a symbolic identification of the target data based on the attributes of the target data;

the splicing the node identifier of the node where the target data is located and the local identifier of the target data includes:

and splicing the symbol identification of the target data, the node identification of the node where the target data is located and the local identification of the target data.

6. The method of claim 5, wherein the attributes of the target data comprise at least: one or more of a type of the target data, a meaning of the target data, a user to which the target data belongs.

7. The method of any of claims 1 to 6, wherein prior to said determining a node identity of a node at which the target data is located, the method further comprises:

setting the total number of bits of the data identification according to a preset total storage amount, wherein the total number of bits is less than 128;

determining the number of bits of the node identification in the data identification according to the total number of nodes; and

subtracting the number of bits of the node identification from the total number of bits of the data identification to obtain the number of bits of the local identification in the data identification.

8. A data identifier generating apparatus, applied to each node in a cascading scenario, the apparatus comprising:

the node identification module is used for determining the node identification of the node where the target data is located, and the node identification of each node is generated based on a random number generation algorithm;

the local identification module is used for performing numerical accumulation on the maximum data serial number in the node where the target data is located to generate a local identification of the target data;

and the splicing module is used for splicing the node identifier of the node where the target data is located with the local identifier of the target data to obtain the data identifier of the target data.

9. An electronic device, comprising: a processor, a memory, a bus; the processor and the memory complete mutual communication through the bus; the processor is configured to invoke program instructions in the memory to perform the method of any of claims 1 to 7.

10. A computer-readable storage medium, comprising: a stored program; wherein the program, when executed, controls the device on which the storage medium is located to perform the method according to any one of claims 1 to 7.