CN113641872A

CN113641872A - Hashing method, hashing device, hashing equipment and hashing medium

Info

Publication number: CN113641872A
Application number: CN202111208276.5A
Authority: CN
Inventors: 黄缚鹏; 李雨鑫; 曲坛; 郭丽
Original assignee: Tianjin Yifuzhen Internet Hospital Co ltd; Beijing Yibai Technology Co ltd
Current assignee: Tianjin Yifuzhen Internet Hospital Co ltd; Beijing Yibai Technology Co ltd
Priority date: 2021-10-18
Filing date: 2021-10-18
Publication date: 2021-11-12
Anticipated expiration: 2041-10-18
Also published as: CN113641872B

Abstract

The embodiment discloses a hashing method, a hashing device, equipment and a medium, wherein the hashing method comprises the following steps: acquiring a data set, and constructing a hash table for storing data in the data set, wherein the hash table comprises a preset number of storage positions; for any data, determining an index value corresponding to the data, and determining a storage position corresponding to the data according to the index value; if the storage position has a corresponding linked list, comparing the data with the existing nodes in the linked list; if a target node exists in the linked list and the target node is the same as the data, updating the data volume corresponding to the target node; if the target node does not exist in the linked list, storing the data as the node of the linked list; and if the linked list does not exist, constructing the linked list for the storage position, and storing the data as the node of the linked list.

Description

Hashing method, hashing device, hashing equipment and hashing medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a hashing method, apparatus, device, and medium.

Background

In the prior art, the hash table is a common data structure that is widely used in the field of data processing. However, when a hash table is being established and multiple threads write data to the same address of the hash table in parallel, the address is locked while one thread writes to it. Any other thread that needs to write to that address must wait until the current thread has finished writing, so the threads end up writing to the address serially. Thread resources are therefore wasted, and establishing the hash table consumes more time and more computing resources.

In view of this, a more efficient hash table establishment scheme is needed.

Disclosure of Invention

Embodiments of the present specification provide a hashing method, apparatus, device and medium, so as to solve the technical problem of how to more efficiently establish a hash table.

In order to solve the above technical problem, the embodiments of the present specification provide the following technical solutions:

An embodiment of the present specification provides a first hashing method, including:

acquiring a data set, and constructing a hash table for storing data in the data set, wherein the hash table comprises a preset number of storage positions;

for any data, determining an index value corresponding to the data, and determining a storage position corresponding to the data according to the index value;

if the storage position has a corresponding linked list, comparing the data with the existing nodes in the linked list; if a target node exists in the linked list and the target node is the same as the data, updating the data volume corresponding to the target node;

if the target node does not exist in the linked list, storing the data as the node of the linked list;

and if the linked list does not exist, constructing the linked list for the storage position, and storing the data as the node of the linked list.

The embodiment of the present specification provides a second hashing method, including:

hashing any data, including: determining an index value corresponding to any data, determining a storage position corresponding to the data according to the index value, and storing the data serving as a node to a linked list corresponding to the storage position;

after all the data in the data set are hashed, for any storage position, if the linked list corresponding to the storage position has at least two nodes, the storage position is checked for duplication, so that: if a plurality of nodes in the linked list corresponding to the storage position are the same data, reserving one node in the plurality of nodes, and determining the data volume corresponding to the reserved node as the number of the nodes of the plurality of nodes.

An embodiment of the present specification provides a hashing apparatus, including:

the table building module is used for acquiring a data set and building a hash table for storing data in the data set, wherein the hash table comprises a preset number of storage positions;

the index module is used for determining an index value corresponding to any data and determining a storage position corresponding to the data according to the index value;

the storage module is used for comparing the data with the existing nodes in the linked list if the corresponding linked list exists in the storage position; if a target node exists in the linked list and the target node is the same as the data, updating the data volume corresponding to the target node; if the target node does not exist in the linked list, storing the data as the node of the linked list; and if the linked list does not exist, constructing the linked list for the storage position, and storing the data as the node of the linked list.

the hash module is used for hashing any data and comprises: determining an index value corresponding to any data, determining a storage position corresponding to the data according to the index value, and storing the data serving as a node to a linked list corresponding to the storage position;

and the duplication checking module is used for, after all the data in the data set have been hashed, checking any storage position for duplicates if the linked list corresponding to the storage position has at least two nodes, so that: if a plurality of nodes in the linked list corresponding to the storage position are the same data, one node in the plurality of nodes is reserved, and the data volume corresponding to the reserved node is determined as the number of the nodes of the plurality of nodes.

An embodiment of the present specification provides a hashing device, including:

at least one processor;

and,

a memory communicatively coupled to the at least one processor;

wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the first or second hashing methods described above.

Embodiments of the present specification provide a computer-readable storage medium storing computer-executable instructions, which when executed by a processor implement the first or second hashing method described above.

The embodiments of the present specification adopt at least one technical solution that can achieve the following beneficial effects: for any data, a preferred storage location corresponding to the data can be determined. Even while a thread is storing the data as a linked list node in the hash table, any other thread can read that node and, depending on what it reads, either store its own data as a linked list node in the hash table or update the data amount corresponding to the identical node. Thus, a thread that is storing data as a linked list node does not lock the storage location of the hash table, does not prevent other threads from reading the linked list nodes, and does not prevent other threads from deciding, based on what they read, whether to store their data as a linked list node or to update the data amount of the identical node. Multiple threads can therefore hash their corresponding data in parallel, which improves hashing efficiency.

Drawings

In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings used in the description of the embodiments of the present specification or the prior art will be briefly described below. It should be apparent that the drawings described below are only some of the drawings to which the embodiments described in the present specification may relate, and that other drawings may be derived from those drawings by those of ordinary skill in the art without inventive effort.

Fig. 1 is a schematic diagram of an execution body of a hashing method in a first embodiment of the present specification.

Fig. 2 is a flowchart illustrating a hashing method in the first embodiment of the present specification.

Fig. 3 is a schematic diagram of a newly-created linked list and nodes in the first embodiment of the present specification.

Fig. 4 is a schematic diagram of updating the data amount in the preliminary hash process in the first embodiment of the present specification.

Fig. 5 is a schematic diagram of a new node in an existing linked list in the first embodiment of the present specification.

Fig. 6 is a schematic diagram of duplication checking in the first embodiment of the present specification.

Fig. 7 is a flowchart illustrating a hashing method in a second embodiment of the present specification.

Fig. 8 is a schematic structural diagram of a hashing apparatus in a third embodiment of this specification.

Fig. 9 is a schematic structural diagram of a hashing apparatus in a fourth embodiment of this specification.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings of the embodiments of the present specification. It is to be understood that the embodiments described herein are only some embodiments of the application and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any inventive step based on the embodiments of the present disclosure, shall fall within the scope of protection of the present application.

A first embodiment (hereinafter referred to as "embodiment one") of this specification provides a hashing method, and an execution subject of embodiment one may be a terminal (including but not limited to a mobile phone, a computer, a pad, a television) or a server or an operating system or an application program or a hashing platform or hashing system, and the like, that is, the execution subject may be various and may be set, used, or transformed as needed. In addition, a third party application may assist the execution principal in executing embodiment one. For example, as shown in fig. 1, the hashing method in the first embodiment may be performed by a server, and an application program (corresponding to the server) may be installed on a terminal (held by a user), and data transmission may be performed between the terminal or the application program and the server, and data collection or input or output or page or information processing may be performed by the terminal or the application program, so as to assist the server in performing the hashing method in the first embodiment.

As shown in fig. 2, a hashing method according to a first embodiment includes:

S101: (The execution subject) acquires a data set and constructs a hash table for storing the data in the data set, wherein the hash table comprises a preset number of storage positions;

The execution subject of embodiment one may acquire a data set, where the data in the data set is the data that needs to be stored in a hash table. The execution subject of embodiment one may construct a hash table for storing the data in the data set, where the hash table includes a preset number of storage locations (the storage locations are equivalent to storage addresses or arrays), and each storage location is used for storing data in the data set.

In one embodiment, the data in the data set may have a basic unit, for example one row (entry) or one group. This embodiment does not limit how the data units are divided or what each unit of data contains. For example, the data set may include point cloud data characterizing a three-dimensional model (in an embodiment, the three-dimensional model includes, but is not limited to, a three-dimensional model of an industrial product, such as a three-dimensional model of a vehicle or an aircraft, a three-dimensional model of an industrial machine, or a three-dimensional model of another product obtained through industrial production), as follows:

0 0 0

0 0 1

0 1 1

0 1 0

1 0 0

1 0 1

1 1 1

1 1 0

3 0 1 2

3 0 1 4

3 0 2 3

3 1 4 5

3 0 3 4

3 2 3 6

3 1 5 2

3 4 5 6

3 3 4 7

3 3 6 7

3 2 5 6

3 4 6 7

In the above point cloud data, each of the first 8 rows represents the coordinates of one point, so the first 8 rows represent the coordinates of the 8 points numbered 0 to 7 (for example only). Each of rows 9 to 20 represents a face composed of points: the number 3 at the head of the row indicates that the face is composed of three points, i.e., a triangular patch, and the last three numbers of the row are the indices of the three points that make up the face. For example, "3 0 1 2" represents a face made up of points 0, 1 and 2, "3 0 1 4" represents a face made up of points 0, 1 and 4, and so on. Each row of data may be treated as one unit of data.
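The row layout described above can be illustrated with a short Python sketch. It is only an illustration: the helper name `parse_rows` and the constant `NUM_POINTS` are hypothetical, and the split at row 8 simply follows the statement that the first 8 rows are points.

```python
# Rows copied from the listing above.
RAW = """\
0 0 0
0 0 1
0 1 1
0 1 0
1 0 0
1 0 1
1 1 1
1 1 0
3 0 1 2
3 0 1 4
3 0 2 3
3 1 4 5
3 0 3 4
3 2 3 6
3 1 5 2
3 4 5 6
3 3 4 7
3 3 6 7
3 2 5 6
3 4 6 7
"""

NUM_POINTS = 8  # assumption: the first 8 rows are points, as the text states


def parse_rows(raw, num_points):
    """Split the listing into point rows and face rows (hypothetical helper)."""
    rows = [tuple(int(tok) for tok in line.split())
            for line in raw.splitlines() if line.strip()]
    points = rows[:num_points]                 # one (x, y, z) tuple per point
    faces = []
    for row in rows[num_points:]:
        declared, indices = row[0], row[1:]    # leading number = points per face
        if declared != len(indices):
            raise ValueError(f"face row {row} declares {declared} points")
        faces.append(indices)
    return points, faces


points, faces = parse_rows(RAW, NUM_POINTS)
print(len(points), len(faces))   # 8 12
print(faces[0])                  # (0, 1, 2)
```

Each tuple in `points` or `faces` is then one unit of data in the sense used below.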

Hereinafter, any unit of data is simply referred to as any data or one data.

S103: for any data, determining an index value corresponding to the data, and determining a storage position corresponding to the data according to the index value;

For any data, hereinafter referred to as data A, the execution subject of embodiment one may determine the index value of data A. Determining the index value corresponding to data A may include determining it according to a hash function. For example, for any row of point cloud data, the coordinates of the points in the row may be used as parameters or keys, and a hash function is used to obtain a unique integer corresponding to the row, where the integer is used as the index value corresponding to the row.
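As an illustration of mapping a row to an integer, the following Python sketch folds the numbers of a row into a single key. The specification does not name a particular hash function, so the polynomial fold, the function name `row_key`, and the parameters `base` and `modulus` are all assumptions; identical rows always yield the same key, but the sketch does not guarantee distinct keys for arbitrary distinct rows.

```python
def row_key(row, base=31, modulus=2**61 - 1):
    """Fold a row of non-negative integers into one integer (illustrative only)."""
    key = 0
    for value in row:
        # The +1 keeps leading zeros from being ignored by the fold.
        key = (key * base + value + 1) % modulus
    return key


print(row_key((0, 0, 1)))   # 994
print(row_key((0, 1, 0)))   # 1024
print(row_key((0, 0, 1)) == row_key((0, 0, 1)))   # True
```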

The storage location corresponding to data A is then determined according to the index value corresponding to data A. Specifically, it may be the storage location whose subscript (or pointer; the same applies hereinafter) is the index value corresponding to data A. This storage location is hereinafter referred to as the first storage location, or preferred storage location, of data A. That is, data A, the index value corresponding to data A, and the first storage location of data A correspond to one another, so the first storage location of data A in the hash table can be determined from the index value corresponding to data A.

For any storage location, if there is data corresponding to the storage location, the execution subject of the first embodiment may construct a linked list corresponding to the storage location, where the linked list may include one or more nodes, and the node may be data corresponding to the storage location.

In general, the data in a data set may have an order, so that the first storage location of each data in the data set may be determined in order. In practice, multiple data may correspond to the same index value, i.e., the index value or the first storage location may "collide". Thus, after the first storage location of data A is determined, there may already be one or more nodes on the linked list corresponding to that storage location, each node being data that corresponds to the first storage location of data A.

After the first storage location of data A is determined, the execution subject of the first embodiment determines whether the first storage location of data A has a corresponding linked list.

S105: (execution body) if the storage position has a corresponding linked list, comparing the data with the existing nodes in the linked list; if a target node exists in the linked list and the target node is the same as the data, updating the data volume corresponding to the target node; if the target node does not exist in the linked list, storing the data as the node of the linked list; and if the linked list does not exist, constructing the linked list for the storage position, and storing the data as the node of the linked list.

The following is illustrated in sections 1.1 and 1.2:

1.1 If the first storage location of data A has no corresponding linked list, data A is the first data corresponding to that storage location. The execution subject of the first embodiment may construct a linked list for the first storage location of data A and store data A as a node of that linked list (i.e., a newly created linked list and a newly created node, as shown in fig. 3), which is equivalent to storing data A at its first storage location.

1.2 If the first storage location of data A has a corresponding linked list, data A is not the first data corresponding to that storage location. The execution subject of the first embodiment may compare data A with the nodes in the linked list corresponding to the first storage location of data A, and determine whether those nodes include a target node of data A, i.e., a node that is the same as data A (hereinafter referred to as the "target node").

If no target node exists among the nodes in the linked list corresponding to the first storage location of data A, data A is stored as a node of that linked list.

If a target node exists among the nodes in the linked list corresponding to the first storage location of data A, data A does not need to be stored as a node of that linked list; instead, the data amount corresponding to the target node is updated. For any node, the data amount corresponding to the node represents the number of data that correspond to the storage location of the linked list containing the node and that are the same as the node. Specifically, the node may have a corresponding parameter for recording the data amount corresponding to the node.

The data amount may be counted from 1. If a target node exists among the nodes in the linked list corresponding to the first storage location of data A, updating the data amount corresponding to the target node may include increasing it by 1, as shown in fig. 4.

If data A is stored as a node of the linked list corresponding to its first storage location, it may be stored as the first node of that linked list. That is, for any storage location, whenever data is to be stored as a node of the corresponding linked list, the newly stored node is placed at the head of the linked list. Thus, for two identical data, once the earlier data has been stored as a linked list node, the later data can be compared with that node more quickly. For example, as shown in fig. 5, if the linked list corresponding to a certain storage location already contains node 1 and data A is stored as a node of that linked list, node 2 formed by data A is placed before node 1.
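The storing logic of cases 1.1 and 1.2 can be sketched in Python as follows. This is a single-threaded illustration only, not the multi-threaded procedure of the embodiment; the names `Node`, `store_or_count`, `HASH_SIZE` and the toy index function `index_of` are assumptions made for the example.

```python
class Node:
    """One linked-list node: the data, its data amount, and the next node."""

    def __init__(self, data, next_node=None):
        self.data = data
        self.count = 1          # the data amount is counted from 1
        self.next = next_node


def store_or_count(table, data, index_of):
    """Store `data` as a new head node, or bump the count of an identical node."""
    slot = index_of(data)
    node = table[slot]
    while node is not None:             # a linked list exists: compare with its nodes
        if node.data == data:           # target node found
            node.count += 1
            return node
        node = node.next
    # No linked list yet, or no target node: the new node becomes the first node.
    table[slot] = Node(data, table[slot])
    return table[slot]


HASH_SIZE = 4                           # assumed number of storage locations


def index_of(row):
    """Toy index function used only for this demo (assumption)."""
    return sum(row) % HASH_SIZE


table = [None] * HASH_SIZE
for row in [(0, 0, 1), (0, 1, 0), (0, 0, 1), (1, 1, 1)]:
    store_or_count(table, row, index_of)

head = table[1]
print(head.data, head.count)            # (0, 1, 0) 1
print(head.next.data, head.next.count)  # (0, 0, 1) 2
```

In the demo, `(0, 1, 0)` collides with `(0, 0, 1)` and is placed at the head of the list, while the second `(0, 0, 1)` only raises the data amount of the existing node to 2.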

The above process is the process of performing a preliminary hash on data A. The execution subject of embodiment one may generate a hash thread (i.e., a thread for performing the preliminary hash operation) corresponding to data A, and execute that hash thread to perform the preliminary hash on data A.

The preliminary hash process includes a preliminary duplicate checking process, namely the comparison between data A and the nodes. Through the preliminary hash process, most identical data can be represented by the data amount corresponding to a single node. For any storage location, however, different nodes in its corresponding linked list may still be the same data. The following example illustrates how different nodes may be identical:

For example, for a point cloud data set, assume that the f-th row and the g-th row of data are both the coordinates of point 1 and point 2, and that the f-th row precedes the g-th row. While the hash thread corresponding to the f-th row is storing the f-th row as a node at the first storage location of the f-th row (hereinafter referred to as storage location h), the hash thread corresponding to the g-th row determines that the first storage location of the g-th row is also storage location h. At that moment, the hash thread corresponding to the f-th row has stored only part of the f-th row as a node (hereinafter referred to as node i) in the linked list corresponding to storage location h; for example, only the coordinates of point 1 have been stored in node i. Because the f-th row and the g-th row have the same storage location and storage location h has a corresponding linked list, the hash thread corresponding to the g-th row compares the g-th row with the nodes of that linked list. When the g-th row is compared with node i, node i holds only the coordinates of point 1 and therefore differs from the g-th row (the coordinates of point 1 and point 2). If the other nodes of the linked list corresponding to storage location h also differ from the g-th row, the hash thread corresponding to the g-th row stores the g-th row as a node (hereinafter referred to as node j) of that linked list. Once the hash thread corresponding to the f-th row has finished storing the f-th row in node i, the linked list corresponding to storage location h contains at least two different nodes, i and j, that are the same data.
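The interleaving in this example can be replayed step by step in Python. The sketch below uses no real threads; it runs the two hash threads' steps sequentially in the order described, and the coordinate values and dictionary-based node layout are assumptions chosen only for illustration.

```python
bucket = []                      # stands in for the linked list at storage location h

row_f = ((0, 0, 1), (0, 1, 1))   # coordinates of point 1 and point 2 (illustrative values)
row_g = ((0, 0, 1), (0, 1, 1))   # identical data, later in the data set

# Step 1: thread f has published node i, but only point 1 is written so far.
node_i = {"data": [row_f[0]], "count": 1}
bucket.insert(0, node_i)

# Step 2: thread g scans the list before thread f has finished writing.
match = next((n for n in bucket if tuple(n["data"]) == row_g), None)
if match is None:
    node_j = {"data": list(row_g), "count": 1}
    bucket.insert(0, node_j)     # thread g stores its own node j at the head
else:
    match["count"] += 1

# Step 3: thread f finishes writing node i.
node_i["data"].append(row_f[1])

duplicates = [n for n in bucket if tuple(n["data"]) == row_g]
print(len(duplicates))           # 2: nodes i and j now hold the same data
```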

The above are examples only. As can be seen from the above, after the above process is performed on each data in the data set, not only may a plurality of nodes in the linked list corresponding to the same storage location be the same data, but also the data amount corresponding to the plurality of nodes may be 1 or greater than 1.

In one embodiment, after the preliminary hash process has been performed on all the data in the data set, the execution subject of embodiment one may check the hash table for duplicates (relative to the preliminary duplicate checking, this is a secondary duplicate checking). How the hash table is checked for duplicates is explained as follows:

For any storage location:

If the storage location has no corresponding linked list, no data corresponds to the storage location, and the storage location does not need to be checked for duplicates.

If the storage location has a corresponding linked list but the linked list has only one node, the storage location does not need to be checked for duplicates.

If the storage location has a corresponding linked list and the linked list has at least two nodes, the storage location is checked for duplicates. Checking the storage location may include: traversing the nodes in the linked list corresponding to the storage location and, if a plurality of nodes in the linked list are the same data, reserving one of those nodes and setting the data amount corresponding to the reserved node to the sum of the data amounts corresponding to those nodes, for example as shown in fig. 6. As another example, if k nodes in the linked list corresponding to the first storage location of data A are all data A, and the data amounts corresponding to the k nodes are m1 to mk, one of the k nodes is reserved, and the data amount corresponding to the reserved node is the sum of m1 to mk.
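The merge performed when a storage location is checked for duplicates can be sketched in Python as follows. For brevity the linked list is represented as a list of `(data, data amount)` pairs, which is an assumption of this sketch, as is the function name `dedupe_bucket`.

```python
def dedupe_bucket(nodes):
    """Keep one node per distinct data; its data amount is the sum m1 + ... + mk."""
    kept = {}                              # data -> summed data amount
    for data, count in nodes:              # traverse the nodes in list order
        kept[data] = kept.get(data, 0) + count
    return list(kept.items())


bucket = [((0, 0, 1), 2), ((0, 1, 0), 1), ((0, 0, 1), 3), ((0, 0, 1), 1)]
print(dedupe_bucket(bucket))
# [((0, 0, 1), 6), ((0, 1, 0), 1)]
```

Here three nodes hold `(0, 0, 1)` with data amounts 2, 3 and 1, so the reserved node ends up with a data amount of 6.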

The execution subject of embodiment one may generate a corresponding duplicate checking thread (i.e., a thread for duplicate checking; the same applies below) for each storage location that needs duplicate checking, and execute the duplicate checking thread corresponding to each such storage location to check it for duplicates. Alternatively, the execution subject of embodiment one may generate a corresponding duplicate checking thread for every storage location and release any duplicate checking thread whose storage location does not need duplicate checking; each retained duplicate checking thread is then executed to check the storage locations that need duplicate checking.
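The first variant, one duplicate checking task per storage location that needs checking, might look like the following Python sketch. It uses the standard-library `ThreadPoolExecutor` as a stand-in for the duplicate checking threads; the table layout (a list of `(data, data amount)` pairs per storage location) and the helper name `merge` are assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def merge(nodes):
    """Sum the data amounts of nodes that hold the same data."""
    totals = Counter()
    for data, count in nodes:
        totals[data] += count
    return list(totals.items())


# Assumed layout: each storage location holds a list of (data, data amount) nodes.
table = [
    [],                                   # no linked list: no check needed
    [("a", 1)],                           # one node: no check needed
    [("b", 2), ("c", 1), ("b", 1)],       # at least two nodes: check
    [("d", 1), ("d", 1)],                 # at least two nodes: check
]

needs_check = [i for i, nodes in enumerate(table) if len(nodes) >= 2]

with ThreadPoolExecutor() as pool:
    # One task per storage location that needs checking; each task touches
    # only its own storage location, so the tasks do not share state.
    merged_lists = pool.map(merge, [table[i] for i in needs_check])
    for i, merged in zip(needs_check, merged_lists):
        table[i] = merged

print(needs_check)   # [2, 3]
print(table[2])      # [('b', 3), ('c', 1)]
print(table[3])      # [('d', 2)]
```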

Through the preliminary hash process and the duplication checking process, a needed hash table is established, and the hash of the data in the data set is realized.

In one embodiment, for any data, a preferred storage location corresponding to the data can be determined. Even while a thread is storing the data as a linked list node in the hash table, any other thread can read that node and, depending on what it reads, either store its own data as a linked list node in the hash table or update the data amount corresponding to the identical node. Thus, a thread that is storing data as a linked list node at any storage location does not lock that storage location, does not prevent other threads from reading the linked list node, and does not prevent other threads from deciding, based on what they read, whether to store their data as a linked list node or to update the data amount of the identical node. Lock-free hashing is thereby achieved: the hash threads hash their corresponding data in parallel, which improves hashing efficiency and hash table establishment efficiency, reduces the hash table establishment time, and improves machine performance and thread resource utilization.

In the first embodiment, the duplicate checking process ensures, on the one hand, that different nodes of the linked list corresponding to the same storage location in the duplicate-checked hash table are different data. On the other hand, for any node of the linked list corresponding to any storage location in the duplicate-checked hash table, the data amount corresponding to the node represents the amount of data in the data set that is identical to the node. If identical data in the data set is regarded as one class or one layer, then any node of the linked list corresponding to any storage location in the duplicate-checked hash table is one class or one layer of data, and the data amount corresponding to that node is the amount of data contained in that class or layer. The first embodiment thus accomplishes hash table establishment together with data classification or data layering. After the data has been classified or layered according to the first embodiment, subsequent operations may be performed on that basis, for example downsampling the classified or layered data, or performing mesh encoding based on the classified or layered point cloud data.

In the first embodiment, after the data in the data set has undergone the preliminary hash, most identical data is already represented by the data amount corresponding to a single node, that is, data classification or layering has been preliminarily achieved. Even if different nodes in a linked list may be the same data after the preliminary hash, the number of such nodes is small. In the duplicate checking process, therefore, a duplicate checking thread does not need to be executed for every storage location: duplicate checking threads are retained only for storage locations whose linked lists have at least two nodes, and the other duplicate checking threads are released. The whole hash table can thus be checked for duplicates with only a small number of duplicate checking threads, which saves thread resources, improves machine performance and duplicate checking efficiency, reduces the hash table establishment time, and improves hash table establishment efficiency.

In the first embodiment, since identical data can be represented by the data amount corresponding to a single node in the linked list corresponding to the storage location, the number of storage locations of the hash table can be much smaller than the amount of data in the data set. This also helps improve the efficiency of establishing the hash table, and is particularly suitable for hashing data sets with a large amount of repeated data (the more the data repeats, the fewer storage locations the data needs). It should be noted that, although the number of storage locations of the hash table in the first embodiment is small, a suitable hash function can ensure that the index value of each data is smaller than the number of storage locations of the hash table. For example, if the number of storage locations of the hash table is x, then after a unique integer (i.e., key) corresponding to data A is obtained through a hash function, the remainder of dividing the integer by x may be computed, i.e., key % hash_size is executed, where hash_size is x, and the remainder is taken as the index value corresponding to data A.
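The remainder step can be illustrated with a few lines of Python. The value of `HASH_SIZE` and the sample keys are assumptions chosen for the example; only the operation key % hash_size comes from the text.

```python
HASH_SIZE = 5                      # x in the text; the value is an assumption


def index_value(key, hash_size=HASH_SIZE):
    """Reduce a key to a subscript in the range 0 .. hash_size - 1."""
    return key % hash_size


keys = [994, 1024, 7, 12, 994]     # illustrative keys, not taken from the patent
indices = [index_value(k) for k in keys]
print(indices)                                    # [4, 4, 2, 2, 4]
print(all(0 <= i < HASH_SIZE for i in indices))   # True
```

Five keys land in only two of the five storage locations here, which shows both that every index value stays below the table size and that collisions between different keys are to be expected.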

Due to the characteristics of the first embodiment, the primary hash process and the duplicate checking process of the first embodiment are particularly suitable for being performed in the GPU, that is, parallel hashing of data corresponding to each hash thread in the GPU is achieved.

A second embodiment (hereinafter, referred to as "embodiment two") of the present specification provides a hashing method, and the execution subject of embodiment two refers to embodiment one.

As shown in fig. 7, the hashing method provided in the second embodiment includes:

S202: (The execution subject) obtains a data set and constructs a hash table for storing data in the data set, wherein the hash table comprises a preset number of storage positions;

Refer to S101.

S204: (The execution subject) hashes any data, which includes: determining an index value corresponding to the data, determining a storage position corresponding to the data according to the index value, and storing the data as a node in the linked list corresponding to the storage position;

For any data in the data set, the execution subject of the second embodiment may hash the data, that is, perform a hash operation on the data. Hashing any data may include: generating a hash thread (that is, a thread for performing a hash operation) corresponding to the data, and executing the hash thread to hash the data.

For any data, which may be referred to as data A, how data A is hashed is further described below:

S2042: Determining an index value corresponding to data A, and determining a storage position corresponding to data A according to the index value;

For the specific content of this step, refer to the first embodiment.

S2044: storing the data A as a node to a linked list corresponding to the storage position;

Different from the first embodiment, in the second embodiment, after the storage position corresponding to data A is determined, data A is stored directly as a node in the linked list corresponding to that storage position, and the preliminary duplicate checking process of the first embodiment does not need to be performed.
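Steps S2042 and S2044 can be sketched as follows. This is a minimal CPU-side illustration under stated assumptions, not the GPU implementation: the names `build_table`, `hash_item` and `hash_all` are hypothetical, a `collections.deque` stands in for each linked list (its `appendleft` is documented as thread-safe), and a thread pool task stands in for each hash thread.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def build_table(hash_size):
    # One initially empty list per storage position; a deque stands in for the linked list.
    return [deque() for _ in range(hash_size)]

def hash_item(table, data):
    index = hash(data) % len(table)  # S2042: index value -> storage position
    table[index].appendleft(data)    # S2044: store as a node, with no comparison

def hash_all(data_set, hash_size, workers=4):
    table = build_table(hash_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One task per data item, standing in for one hash thread per data item.
        list(pool.map(lambda d: hash_item(table, d), data_set))
    return table

table = hash_all([3, 11, 3, 4, 3], hash_size=8)
```

With these inputs, 3 and 11 share a storage position, so that list holds four nodes (three of them identical) until duplicate checking is performed.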

S206: (The execution subject) after hashing all data in the data set, for any storage position, if the linked list corresponding to the storage position has at least two nodes, performs duplicate checking on the storage position, so that: if a plurality of nodes in the linked list corresponding to the storage position are the same data, one of the plurality of nodes is retained, and the data volume corresponding to the retained node is determined as the number of the plurality of nodes.

In the second embodiment, after all the data in the data set has been hashed, duplicate checking is performed on the hash table. How duplicate checking is performed on the hash table is explained as follows:

For any storage position:

If the storage position has a corresponding linked list and that linked list has at least two nodes, duplicate checking is performed on the storage position. Performing duplicate checking on the storage position may include: traversing the nodes in the linked list corresponding to the storage position; if a plurality of nodes in the linked list are the same data, retaining one of the plurality of nodes and setting the data volume corresponding to the retained node (for the meaning of data volume, refer to the first embodiment) to the number of those nodes. For example, if n nodes in the linked list corresponding to the storage position of data A are all data A, then, because there is no preliminary duplicate checking, the data volume corresponding to each of the n nodes is 1; one of the n nodes is retained, and the data volume corresponding to the retained node is n.
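The traversal described above can be sketched for a single storage position. The function name `check_duplicates` is hypothetical, and a plain Python list stands in for the linked list; each retained node is returned as a (data, data volume) pair.

```python
def check_duplicates(nodes):
    """Collapse identical nodes; a retained node's data volume is the number of copies."""
    kept = []  # retained nodes as [data, data_volume]
    for data in nodes:
        for entry in kept:
            if entry[0] == data:  # same data as an already retained node
                entry[1] += 1
                break
        else:
            kept.append([data, 1])  # first occurrence: retain with data volume 1
    return [tuple(entry) for entry in kept]

bucket = ["A", "B", "A", "A"]
result = check_duplicates(bucket)
print(result)  # [('A', 3), ('B', 1)]
```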

The execution subject of the second embodiment may generate a corresponding duplicate checking thread (that is, a thread for performing duplicate checking; the same applies below) for each storage position that needs duplicate checking, and execute the duplicate checking thread corresponding to each such storage position to perform duplicate checking on it. Alternatively, the execution subject may generate a corresponding duplicate checking thread for every storage position and, for any duplicate checking thread, release the thread if its storage position does not need duplicate checking; each retained duplicate checking thread is then executed to perform duplicate checking on the storage positions that need it.
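The first option above, creating duplicate checking threads only for storage positions whose linked lists have at least two nodes, might look like the sketch below. The names `dedup` and `check_table` are hypothetical, thread pool tasks stand in for duplicate checking threads, and `collections.Counter` is used as a shortcut for the node-by-node comparison.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def dedup(nodes):
    # Counter keeps first-seen order; each pair is (data, data volume).
    return list(Counter(nodes).items())

def check_table(table, workers=4):
    # Only storage positions with at least two nodes need duplicate checking.
    needs_check = [i for i, nodes in enumerate(table) if len(nodes) >= 2]
    # Positions with zero or one node keep their content, each node with data volume 1.
    result = [[(data, 1) for data in nodes] for nodes in table]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        merged = pool.map(dedup, [table[i] for i in needs_check])
        for i, nodes in zip(needs_check, merged):
            result[i] = nodes
    return result, needs_check

table = [[], ["A"], ["B", "B", "C"], ["D", "D"]]
checked, positions = check_table(table)
```

Here only positions 2 and 3 receive a duplicate checking task; positions 0 and 1 consume no thread at all.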

Through the hashing process and the duplication checking process, a needed hash table is established, and the hash of the data in the data set is realized.

In the second embodiment, when a plurality of hash threads need to store their corresponding data as linked list nodes of the same storage position in the hash table, the hash threads do not affect one another. That is, when any thread stores its data as a linked list node of a certain storage position, the storage position is not locked, so no other thread is prevented from storing its data as a linked list node of the same storage position. Therefore, the second embodiment realizes lock-free hashing, in which the plurality of hash threads hash their corresponding data in parallel. This improves hashing efficiency and hash table establishment efficiency, shortens the hash table establishment time, and improves machine performance and thread resource utilization.

In the second embodiment, the hashing and duplicate checking processes ensure, on the one hand, that different nodes of the linked list corresponding to the same storage position in the duplicate-checked hash table are different data; on the other hand, for any node of the linked list corresponding to any storage position in the duplicate-checked hash table, the data volume corresponding to the node represents the amount of data in the data set that is the same as the node. If identical data in the data set is regarded as one class or one layer, any node of the linked list corresponding to any storage position in the duplicate-checked hash table is one class or one layer of data, and the data volume corresponding to that node is the amount of data contained in that class or layer. The second embodiment thus accomplishes hash table establishment and data classification or data layering together. After the data is classified or layered according to the second embodiment, subsequent operations may be performed based on the classification or layering, for example, downsampling the classified or layered data, or performing mesh encoding based on classified or layered point cloud data.

In the second embodiment, the duplicate checking process does not need to execute one duplicate checking thread for every storage position: duplicate checking threads are kept only for storage positions whose linked lists have at least two nodes, and the other duplicate checking threads are released. Duplicate checking of the whole hash table is thus achieved with a small number of duplicate checking threads, which saves thread resources, improves machine performance and duplicate checking efficiency, shortens the hash table establishment time, and improves the hash table establishment efficiency.

Due to these characteristics, the hash process and the duplicate checking process of the second embodiment are particularly suitable for being performed in a GPU, that is, the hash threads in the GPU hash their corresponding data in parallel.

As shown in fig. 8, a third embodiment of the present specification provides a hashing apparatus corresponding to the method according to the first embodiment, including:

a table building module 301, configured to obtain a data set, and build a hash table for storing data in the data set, where the hash table includes a preset number of storage locations;

the index module 303 is configured to determine, for any data, an index value corresponding to the data, and determine a storage location corresponding to the data according to the index value;

a storage module 305, configured to compare the data with existing nodes in the linked list if the storage location has a corresponding linked list; if a target node exists in the linked list and the target node is the same as the data, updating the data volume corresponding to the target node; if the target node does not exist in the linked list, storing the data as the node of the linked list; and if the linked list does not exist, constructing the linked list for the storage position, and storing the data as the node of the linked list.
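The behaviour of the index module 303 and the storage module 305 can be sketched sequentially as follows. The function name `store` is hypothetical, `None` marks a storage position that has no linked list yet, a Python list of `[data, data_volume]` pairs stands in for the linked list, and the built-in `hash` replaces the hash function. Inserting a new node at the head follows the optional "first node" variant.

```python
def store(table, data):
    index = hash(data) % len(table)  # index module: index value -> storage position
    chain = table[index]
    if chain is None:                # no linked list yet: construct it with this node
        table[index] = [[data, 1]]
        return
    for node in chain:               # compare the data with the existing nodes
        if node[0] == data:          # target node found: update its data volume
            node[1] += 1
            return
    chain.insert(0, [data, 1])       # no target node: store as the first node

table = [None] * 8
for item in [3, 11, 3, 4]:
    store(table, item)
```

After these four calls, the linked list at position 3 holds 11 with data volume 1 followed by 3 with data volume 2, and position 4 holds 4 with data volume 1. When several threads run this step in parallel without locking, two threads may both miss a target node and each add one, which is the case the later duplicate check handles.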

Optionally, the apparatus further comprises:

a duplicate checking module, configured to: after, for any data, the index value corresponding to the data is determined and the storage position corresponding to the data is determined according to the index value; the data is compared with the existing nodes in the linked list if the storage position has a corresponding linked list; the data volume corresponding to a target node is updated if the target node exists in the linked list and is the same as the data; the data is stored as a node of the linked list if the target node does not exist in the linked list; and the linked list is constructed for the storage position and the data is stored as a node of the linked list if the linked list does not exist, perform, for any storage position, duplicate checking on the storage position if the linked list corresponding to the storage position has at least two nodes, so that: if a plurality of nodes in the linked list corresponding to the storage position are the same data, one of the plurality of nodes is retained, and the data volume corresponding to the retained node is determined as the sum of the data volumes corresponding to the plurality of nodes.
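The merge performed by this duplicate check differs from that of the second embodiment in that the data volumes of identical nodes are summed rather than counted. A minimal sketch, with the hypothetical name `merge_nodes` and a list of (data, data volume) pairs standing in for the linked list:

```python
def merge_nodes(chain):
    """Retain one node per distinct data item; its data volume is the sum over identical nodes."""
    merged = []  # retained nodes as [data, data_volume]
    for data, volume in chain:
        for node in merged:
            if node[0] == data:
                node[1] += volume  # add the data volume of the duplicate node
                break
        else:
            merged.append([data, volume])
    return merged

chain = [("A", 2), ("B", 1), ("A", 3)]
merged = merge_nodes(chain)
print(merged)  # [['A', 5], ['B', 1]]
```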

Optionally, the duplicate checking module is configured to, for any storage position, generate a duplicate checking thread corresponding to the storage position if the linked list corresponding to the storage position has at least two nodes;

and execute the duplicate checking thread to perform duplicate checking on the storage position.

Optionally, storing the data as the node of the linked list includes:

storing the data as the first node of the linked list.

Optionally, determining the index value corresponding to the data includes:

determining the index value corresponding to the data according to a hash function.

As shown in fig. 9, a fourth embodiment of the present specification provides a hashing apparatus corresponding to the method described in the second embodiment, including:

a table building module 402, configured to obtain a data set, and build a hash table for storing data in the data set, where the hash table includes a preset number of storage locations;

the hash module 404 is configured to hash any data, including: determining an index value corresponding to any data, determining a storage position corresponding to the data according to the index value, and storing the data serving as a node to a linked list corresponding to the storage position;

a duplicate checking module 406, configured to, after hashing all data in the data set, check a duplicate of any storage location if a linked list corresponding to the storage location has at least two nodes, so that: if a plurality of nodes in the linked list corresponding to the storage position are the same data, reserving one node in the plurality of nodes, and determining the data volume corresponding to the reserved node as the number of the nodes of the plurality of nodes.

Optionally, the hash module 404 is configured to generate a hash thread corresponding to any data, and execute the hash thread to hash the data;

and/or,

the duplicate checking module 406 is configured to, for any storage position, generate a duplicate checking thread corresponding to the storage position if the linked list corresponding to the storage position has at least two nodes, and execute the duplicate checking thread to perform duplicate checking on the storage position.

A fifth embodiment of the present specification provides a hash apparatus including:

at least one processor;

and,

a memory communicatively coupled to the at least one processor;

wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of embodiment one or embodiment two.

A sixth embodiment of the present specification provides a computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, perform the method of the first or second embodiment.

The above embodiments may be used in combination, and the modules having the same name between different embodiments or within the same embodiment may be the same or different modules.

While certain embodiments of the present disclosure have been described above, other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily have to be in the particular order shown or in sequential order to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, device, and non-volatile computer-readable storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and in relation to the description, reference may be made to some portions of the description of the method embodiments.

The apparatus, the device, the nonvolatile computer readable storage medium, and the method provided in the embodiments of the present specification correspond to each other, and therefore, the apparatus, the device, and the nonvolatile computer storage medium also have similar advantageous technical effects to the corresponding method.

In the 1990s, an improvement to a technology could be clearly distinguished as either an improvement in hardware (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). However, as technology has advanced, many of today's method flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware physical module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer "integrates" a digital system onto a single PLD by programming it, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, nowadays, instead of manually fabricating integrated circuit chips, this programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must likewise be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained simply by logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (for example, software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be regarded as a hardware component, and the means included in it for performing the various functions may also be regarded as structures within the hardware component. The means for performing the various functions may even be regarded both as software modules for performing the method and as structures within the hardware component.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being divided into various units by function, and the units are described separately. Of course, when implementing the present specification, the functions of the units may be implemented in the same one or more pieces of software and/or hardware.

As will be appreciated by one skilled in the art, the present specification embodiments may be provided as a method, system, or computer program product. Accordingly, embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.

The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

The above description is only an example of the present specification, and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A hashing method, comprising:

2. The method of claim 1, wherein for any data, an index value corresponding to the data is determined, and a storage location corresponding to the data is determined according to the index value; if the storage position has a corresponding linked list, comparing the data with the existing nodes in the linked list; if a target node exists in the linked list and the target node is the same as the data, updating the data volume corresponding to the target node; if the target node does not exist in the linked list, storing the data as the node of the linked list; if the linked list does not exist, the linked list is constructed for the storage position, and after the data is stored as the node of the linked list, the method further comprises the following steps:

for any storage position, if the linked list corresponding to the storage position has at least two nodes, the storage position is checked for duplication so as to ensure that: if a plurality of nodes in the linked list corresponding to the storage position are the same data, one node in the plurality of nodes is reserved, and the data volume corresponding to the reserved node is determined as the sum of the data volumes corresponding to the plurality of nodes.

3. The method of claim 2, further comprising:

for any storage position, if the linked list corresponding to the storage position has at least two nodes, generating a duplicate checking thread corresponding to the storage position;

and executing the duplication checking thread to check the storage position.

4. The method of claim 1, wherein storing the data as a node of the linked list comprises: storing the data as a first node of the linked list;

and/or,

determining the index value corresponding to the data comprises: and determining the index value corresponding to the data according to the hash function.

5. A hashing method, comprising:

6. The method of claim 5, further comprising:

generating a hash thread corresponding to any data, and executing the hash thread to hash the data;

and/or,

the method further comprises the following steps:

and executing the duplication checking thread to check the storage position.

7. A hashing apparatus, comprising:

8. A hashing apparatus, comprising:

9. A hashing apparatus, comprising:

at least one processor;

and,

a memory communicatively coupled to the at least one processor;

wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 6.

10. A computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement the method of any one of claims 1 to 6.