CN110929103B - Method for constructing index for data set, data query method and computing equipment - Google Patents

Method for constructing index for data set, data query method and computing equipment

Info

Publication number
CN110929103B
CN110929103B CN201911144368.4A CN201911144368A CN110929103B CN 110929103 B CN110929103 B CN 110929103B CN 201911144368 A CN201911144368 A CN 201911144368A CN 110929103 B CN110929103 B CN 110929103B
Authority
CN
China
Prior art keywords
data
node
linked list
value
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911144368.4A
Other languages
Chinese (zh)
Other versions
CN110929103A (en)
Inventor
杨明哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chezhi Interconnection Beijing Technology Co ltd
Original Assignee
Chezhi Interconnection Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chezhi Interconnection Beijing Technology Co ltd
Priority to CN201911144368.4A
Publication of CN110929103A
Application granted
Publication of CN110929103B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9014 Indexing; Data structures therefor; Storage structures hash tables
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for constructing an index for a data set, a data query method, and a computing device. The data set is stored in the computing device as an ordered singly linked list, and each element of the data set corresponds to a different data node of the singly linked list. The method for constructing the index comprises the following steps: inserting a plurality of virtual nodes into the singly linked list to divide it into a plurality of sub-linked lists, wherein each virtual node is the starting node of a sub-linked list; establishing an array index for the virtual nodes so that a sub-linked list can be located through the array index, wherein the keys of the array index are the node values of the virtual nodes and the values are the addresses of the virtual nodes; and establishing a hash index for the data nodes so that a data node can be located through the hash index, wherein the keys of the hash index are the node values of the data nodes and the values are the addresses of the data nodes.

Description

Method for constructing index for data set, data query method and computing equipment
Technical Field
The invention relates to the field of databases, in particular to a method for constructing indexes for data sets, a data query method and computing equipment.
Background
With the rapid development of the internet and the rapid growth of data volume, the information contained in data is increasingly rich, and data is applied ever more deeply in daily life and work. Retrieving data to obtain a desired result has therefore become an important aspect of using data.
At present, data retrieval is mainly based on a singly linked list that stores an ordered data set, and the index over the ordered data set is realized through multiple singly linked lists that become sparser layer by layer. On one hand, a single-point query must descend from the top-level singly linked list to the bottom-level storage linked list, so the average lookup time is long and the indexing efficiency is low. On the other hand, because the choice and number of nodes in each layer of the index linked list determine the indexing efficiency, adding nodes to each layer and establishing the associations between layers are complex, which limits how far the indexing efficiency can be improved. Enabling the server to find qualifying data as quickly as possible is often key both for developers seeking to improve program efficiency and for servers seeking to improve user experience.
To this end, a new method of indexing a data set is needed.
Disclosure of Invention
To this end, the present invention provides a method of indexing a data set in an attempt to solve, or at least alleviate, the problems identified above.
According to one aspect of the present invention, there is provided a method for constructing an index for a data set, executed in a computing device, the data set being stored in the computing device in the form of an ordered singly linked list, the elements of the data set respectively corresponding to different data nodes of the singly linked list, the method comprising: inserting a plurality of virtual nodes into the singly linked list to divide the singly linked list into a plurality of sub-linked lists, wherein the virtual nodes are the starting nodes of the sub-linked lists; establishing an array index for the plurality of virtual nodes so as to locate a sub-linked list according to the array index, wherein the keys of the array index are the node values of the virtual nodes and the values are the addresses of the virtual nodes; and establishing a hash index for the data nodes so as to locate a data node according to the hash index, wherein the keys of the hash index are the node values of the data nodes and the values are the addresses of the data nodes.
Optionally, in the method for constructing an index for a data set according to the present invention, inserting a plurality of virtual nodes into the singly linked list to divide the singly linked list into a plurality of sub-linked lists includes: acquiring the value interval of the data in the data set; dividing the value interval into a plurality of subintervals of equal length; and inserting the starting value of each subinterval into the singly linked list as the node value of a virtual node.
Optionally, in the method for constructing an index for a data set according to the present invention, if the starting value of a subinterval is the node value of a data node in the singly linked list, that data node is used as the virtual node, and the state of the virtual node is marked as valid.
Optionally, in the method for constructing an index for a data set according to the present invention, the method further includes: inserting a plurality of virtual nodes into a predetermined sub-linked list to divide the sub-linked list into a plurality of lower-level sub-linked lists; establishing a lower-level array index for the plurality of virtual nodes in the predetermined sub-linked list; and updating the address corresponding to the sub-linked list in the array index to the address of the lower-level array index.
Optionally, in the method for constructing an index for a data set according to the present invention, the predetermined sub-linked list is a sub-linked list, among the plurality of sub-linked lists, whose node density is greater than a global node density threshold; the node density is the number of valid nodes in the sub-linked list divided by the length of the sub-linked list.
Optionally, in the method for constructing an index for a data set according to the present invention, the method further includes inserting data into the singly linked list according to the following steps: acquiring target data to be inserted; determining through the hash index whether the target data is in the singly linked list; if the target data is not in the singly linked list, determining the sub-linked list corresponding to the target data through the array index; and traversing the singly linked list from the starting node of that sub-linked list to find the first data node greater than the target data, and inserting the target data before that data node.
Optionally, in the method for constructing an index for a data set according to the present invention, the method further includes performing data maintenance on the singly linked list, where the data maintenance includes deleting data, and deleting data includes: determining through the hash index whether the value to be deleted is in the singly linked list; and if the value to be deleted is in the singly linked list, changing the state of the data node corresponding to the value to be deleted to invalid.
Optionally, in the method for constructing an index for a data set according to the present invention, the data maintenance further includes data updating, and the data updating includes: deleting the data to be updated from the singly linked list; and inserting the updated data into the singly linked list.
Optionally, in the method for constructing an index for a data set according to the present invention, the data maintenance further includes data cleaning, and the data cleaning includes: traversing all data nodes in the singly linked list; determining whether each traversed data node is valid; and if a data node is invalid, deleting the data node from the singly linked list.
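The deletion and cleaning steps above amount to a lazy-deletion scheme, which can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Node`, `delete_value` and `clean` are hypothetical, a Python dict stands in for the hash index, object references stand in for node addresses, and virtual nodes are left out for brevity. Removing the hash entry during cleaning is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    value: int
    valid: bool = True               # state attribute: valid or invalid
    next: Optional["Node"] = None


def delete_value(hash_index: Dict[int, Node], value: int) -> bool:
    """Lazy delete: find the node through the hash index and mark it invalid."""
    node = hash_index.get(value)
    if node is None or not node.valid:
        return False                 # value absent or already deleted
    node.valid = False
    return True


def clean(head: Optional[Node], hash_index: Dict[int, Node]) -> Optional[Node]:
    """Unlink every invalid node and return the (possibly new) head."""
    sentinel = Node(0, next=head)    # temporary pre-head simplifies unlinking
    prev = sentinel
    while prev.next is not None:
        if not prev.next.valid:
            # Assumption: the hash entry is dropped together with the node.
            hash_index.pop(prev.next.value, None)
            prev.next = prev.next.next
        else:
            prev = prev.next
    return sentinel.next
```

Under this reading, a data update is simply `delete_value` on the old value followed by an insertion of the new value.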
According to another aspect of the present invention, a data query method is provided, which is executed in a computing device, wherein a data set is stored in the computing device as an ordered singly linked list, each element of the data set corresponds to a different data node of the singly linked list, and an array index and a hash index have been established for the data set according to the above method. The data query method includes: acquiring target data to be queried; determining through the hash index whether the target data is in the singly linked list; and if the target data is in the singly linked list, returning the node address of the target data.
Optionally, the data query method according to the present invention further includes: acquiring a target data interval to be queried; determining through the hash index whether the starting value of the target data interval is in the singly linked list; if the starting value is in the singly linked list, traversing the singly linked list from the data node corresponding to the starting value to acquire the query result corresponding to the target data interval; and if the starting value is not in the singly linked list, determining the sub-linked list corresponding to the starting value through the array index, traversing the singly linked list from the starting node of that sub-linked list, and acquiring the query result corresponding to the target data interval.
Optionally, in the data query method according to the present invention, determining the sub-linked list corresponding to the starting value through the array index includes: dividing the starting value by the length of the value interval corresponding to the sub-linked lists, and rounding down; multiplying the rounded-down result by the length of the value interval to obtain the node value of the target virtual node; and determining the sub-linked list corresponding to the starting value from the array index according to the node value of the target virtual node.
Optionally, in the data query method according to the present invention, when what is determined from the array index according to the node value of the target virtual node is the address of a lower-level array index, the data query method further includes: determining the lower-level sub-linked list corresponding to the starting value through the lower-level array index; and traversing the singly linked list from the starting node of the lower-level sub-linked list to obtain the query result corresponding to the target data interval.
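The interval query described above can be sketched for a single index level as follows. This is an illustrative sketch only: `Node` and `range_query` are hypothetical names, the query interval is assumed to be closed on both ends, the value interval is assumed to start at 0 so that slot `i` of the array index holds the virtual node with value `i * distance`, and virtual nodes that carry no data are assumed to be marked invalid.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    value: int
    valid: bool = True               # pure virtual nodes are assumed invalid
    next: Optional["Node"] = None


def range_query(hash_index: Dict[int, Node], array_index: List[Node],
                distance: int, low: int, high: int) -> List[int]:
    """Return the valid values v with low <= v <= high, in list order."""
    start = hash_index.get(low)              # fast path: start value is a node
    if start is None:
        # Fall back to the array index: floor(low / distance) selects the
        # virtual node that starts the sub-linked list containing low.
        start = array_index[low // distance]
    result = []
    node = start
    while node is not None and node.value <= high:
        if node.valid and node.value >= low:
            result.append(node.value)
        node = node.next
    return result
```

When the starting value is itself a node, the traversal begins exactly there; otherwise at most one sub-linked list prefix is skipped over before results start to accumulate.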
According to yet another aspect of the present invention, there is provided a computing device comprising: one or more processors; a memory; and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs, when executed by the processors, implementing the steps of the method of indexing a data set and the data query method as described above.
According to a further aspect of the present invention there is provided a readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, implement the steps of the method of indexing a data set and the data query method as described above.
According to the scheme for constructing an index for a data set of the present invention, an ordered singly linked list is used to store the data, a plurality of virtual nodes are inserted into the singly linked list to divide it into a plurality of sub-linked lists, an array index is established for the virtual nodes, and a hash index is established for the data nodes in the singly linked list.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
FIG. 1 shows a block diagram of a computing device 100, according to an embodiment of the invention;
FIG. 2 illustrates a flow diagram of a method 200 of indexing a data set according to one embodiment of the invention;
FIG. 3 illustrates a flow diagram of a method 300 of establishing a lower array index according to one embodiment of the invention;
FIG. 4 shows a flow diagram of a method 400 of inserting data according to one embodiment of the invention;
FIG. 5 is a diagram illustrating a singly linked list after nested segmentation and multi-level data indexing according to an embodiment of the invention;
FIG. 6 illustrates a flow diagram of a method 600 of data maintenance by deleting data, according to one embodiment of the invention;
FIG. 7 shows a flow diagram of a method 700 of data updating according to an embodiment of the invention;
FIG. 8 shows a flow diagram of a method 800 of data scrubbing in accordance with one embodiment of the present invention;
FIG. 9 shows a flow diagram of a data query method 900 according to one embodiment of the invention;
FIG. 10 shows a flow diagram of a method 1000 of interval acceleration according to one embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
FIG. 1 is a block diagram of a computing device 100 according to an exemplary embodiment of the present invention. The method 200 of indexing a data set and the data query method 900 according to the present invention may be performed in the computing device 100. In a basic configuration 102, computing device 100 typically includes system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a Digital Signal Processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level one cache 110 and a level two cache 112, a processor core 114, and registers 116. The example processor core 114 may include an Arithmetic Logic Unit (ALU), a Floating Point Unit (FPU), a digital signal processing core (DSP core), or any combination thereof. The example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, system memory 106 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. System memory 106 may include an operating system 120, one or more programs 122, and program data 124. In some implementations, the program 122 can be arranged to execute instructions on an operating system by one or more processors 104 using program data 124.
Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via the bus/interface controller 130. The example output device 142 includes a graphics processing unit 148 and an audio processing unit 150. They may be configured to facilitate communication with various external devices, such as a display or speakers, via one or more A/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 158. An example communication device 146 may include a network controller 160, which may be arranged to facilitate communications with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
The network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, or program modules in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or private-wired network, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media. The term computer readable media as used herein may include both storage media and communication media.
Computing device 100 may be implemented as a server, e.g., a file server, database server, or application server, or as a device such as a Personal Digital Assistant (PDA), a wireless web-browsing device, an application-specific device, or a hybrid device that includes any of the above functions. Computing device 100 may also be implemented as a personal computer, including both desktop and notebook computer configurations. In some embodiments, the computing device 100 is configured to perform the method 200 of indexing a data set. In addition, the computing device 100 stores a data set in the form of an ordered singly linked list, and each element of the data set corresponds to a different data node of the singly linked list.
FIG. 2 illustrates a flow diagram of a method 200 of indexing a data set, according to one embodiment of the invention. The method 200 is performed in a computing device, such as the computing device 100. The data set is stored in the computing device in the form of an ordered singly linked list, and each element of the data set corresponds to a different data node of the singly linked list.
As shown in FIG. 2, the method begins at step S210, in which a plurality of virtual nodes are inserted into the singly linked list. In one implementation, the process of inserting the virtual nodes specifically includes:
First, the maximum value and the minimum value of the data in the data set are obtained, and the range formed by the minimum and maximum values is taken as the value interval of the data set. Alternatively, an upper limit and a lower limit that the user manually specifies according to the characteristics of the service data are acquired and taken as the value interval of the data in the data set. The service data characteristics are the characteristics of the data set to be operated on and of the service, and are inseparable from the service nature of the data set; for example, the values of the data may be greater than 0 or less than some specific value. The user-specified lower limit and upper limit of the data set are the first and last virtual node values of the singly linked list. A virtual node is a special node that serves to divide the whole singly linked list. Data is stored in both the virtual nodes and the data nodes, and the data has a state attribute that is either valid or invalid.
Then, the virtual node distance of the first-level data index set by the user is acquired, where the virtual node distance is the absolute value of the difference between two adjacent virtual node values in the same-level array index. The value interval of the data set is then divided into a plurality of subintervals of equal length according to the virtual node distance set by the user.
After the division is finished, the starting value of each subinterval is inserted into the singly linked list as the node value of a virtual node. If the starting value of a subinterval is the node value of a data node already in the singly linked list, that data node is used as the virtual node, and the state of the virtual node is marked as valid. The virtual nodes divide the singly linked list into a plurality of sub-linked lists; each virtual node is the starting node of a sub-linked list, and the address of the virtual node is the starting address of that sub-linked list within the divided singly linked list. Each sub-linked list includes the smaller of its two endpoint values and excludes the larger; that is, of the two virtual nodes that bound a sub-linked list, the smaller virtual node value is stored as an element in the value range of the sub-linked list.
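Step S210 can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the names `Node`, `build_list` and `insert_virtual_nodes` are hypothetical, values are integers, and a virtual node that does not coincide with a data node is assumed to start in the invalid state (the text only states that a coinciding data node is marked valid).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Node:
    value: int
    is_virtual: bool = False         # True for nodes that start a sub-linked list
    valid: bool = True               # state attribute of the stored data
    next: Optional["Node"] = None


def build_list(values: Iterable[int]) -> Optional[Node]:
    """Build an ordered singly linked list and return its head."""
    head = None
    for v in sorted(values, reverse=True):
        head = Node(v, next=head)
    return head


def insert_virtual_nodes(head: Optional[Node], low: int, high: int,
                         distance: int) -> Node:
    """Insert a virtual node at low, low + distance, ... up to high."""
    sentinel = Node(low - 1, next=head)      # temporary pre-head
    prev = sentinel
    for start in range(low, high + 1, distance):
        while prev.next is not None and prev.next.value < start:
            prev = prev.next
        if prev.next is not None and prev.next.value == start:
            # An existing data node doubles as the virtual node and stays valid.
            prev.next.is_virtual = True
        else:
            # Assumption: a pure virtual node carries no valid data.
            prev.next = Node(start, is_virtual=True, valid=False, next=prev.next)
    return sentinel.next
```

For example, `insert_virtual_nodes(build_list([10, 83, 100, 101]), 0, 200, 100)` yields the node sequence 0, 10, 83, 100, 101, 200, in which 0, 100 and 200 are virtual and 100 remains valid because it is also a data value.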
After inserting the virtual nodes, the method 200 proceeds to step S220. In step S220, a mapping between the virtual node values in the singly linked list and the addresses of the virtual nodes is established in the form of an array; that is, an array index is established for the virtual nodes so that a sub-linked list can be located according to the array index, where the key of the array index is the node value of a virtual node and the value is the address of that virtual node. When data in the singly linked list is queried, a calculation on the target data to be queried identifies the corresponding virtual node, whose address is looked up in the array index; this locates the target data, that is, it gives the starting address of the sub-linked list to which the target data belongs.
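A possible shape for the array index of step S220 is sketched below. The names `Node`, `build_array_index` and `locate_sublist` are hypothetical; Python object references stand in for node addresses, and the virtual nodes are assumed to be evenly spaced by `distance` with values that are multiples of `distance`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    value: int
    is_virtual: bool = False
    next: Optional["Node"] = None


def build_array_index(head: Optional[Node]) -> List[Node]:
    """Collect the virtual nodes in list order; slot i starts sub-linked list i."""
    index = []
    node = head
    while node is not None:
        if node.is_virtual:
            index.append(node)
        node = node.next
    return index


def locate_sublist(index: List[Node], distance: int, target: int) -> Node:
    """Return the virtual node that starts the sub-linked list holding target."""
    key = target // distance * distance          # node value of the virtual node
    slot = (key - index[0].value) // distance    # position of that key in the array
    return index[slot]
```

Because the keys are evenly spaced, the lookup is a constant-time arithmetic step followed by one array access, with no traversal of the list itself.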
Subsequently, the method 200 proceeds to step S230. In step S230, a hash index represented by a hash table is established for the data nodes, so as to locate a data node according to the hash index, where the key of the hash index is the node value of a data node and the value is the address of that data node. The established hash table covers every node in the singly linked list, and each element in the hash table stores a key and a value in one-to-one correspondence: the key is the value of a node in the singly linked list, and the value is the address of that node. Through the hash index, the address of the node corresponding to the target data to be searched can be obtained. Elements of the hash table are created as new nodes are inserted into the singly linked list and removed as old nodes are deleted.
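The hash index of step S230 might look like the following sketch, in which a Python dict plays the role of the hash table and object references play the role of node addresses. The names `Node`, `build_hash_index` and `find` are hypothetical, and treating an invalid node as "not found" in a point query is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    value: int
    valid: bool = True
    next: Optional["Node"] = None


def build_hash_index(head: Optional[Node]) -> Dict[int, Node]:
    """Map every node value in the list to the node holding it."""
    index: Dict[int, Node] = {}
    node = head
    while node is not None:
        index[node.value] = node
        node = node.next
    return index


def find(index: Dict[int, Node], target: int) -> Optional[Node]:
    """Point query: return the node for target, or None if absent or invalid."""
    node = index.get(target)
    if node is None or not node.valid:
        return None
    return node
```

A point query therefore never touches the linked list: it is a single hash lookup followed by a state check.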
If a sub-linked list of the singly linked list holds a large amount of data, so that traversing from the virtual node of that sub-linked list to search for data takes a long time, the sub-linked list can be segmented again into a plurality of lower-level sub-linked lists, and traversal then starts from the virtual node of a lower-level sub-linked list. According to some embodiments of the invention, method 200 may further include building a lower-level array index on the basis of the array index built for the singly linked list. FIG. 3 illustrates a flow diagram of a method 300 of establishing a lower-level array index according to one embodiment of the invention. As shown in FIG. 3, in step S310 the first and last virtual node values of a predetermined sub-linked list are acquired, together with a hierarchy coefficient specified by the user. The hierarchy coefficient is the number num of newly divided sub-linked lists contained in the new level, and applies only to array indexes at levels other than the first-level array index of the singly linked list. The difference between the first and last virtual node values of the sub-linked list, divided by the hierarchy coefficient, is the virtual node distance dis of the lower-level sub-linked lists.
Subsequently, step S320 is executed to allocate memory space for the lower-level array sub_array, the size of which is the number num of lower-level sub-linked lists. Subsequently, step S330 is executed to insert a plurality of virtual nodes into the predetermined sub-linked list, so as to divide it into a plurality of lower-level sub-linked lists. Subsequently, step S340 is executed to establish a lower-level array index for the plurality of virtual nodes in the predetermined sub-linked list. Subsequently, step S350 is executed to establish, in the hash table, a mapping between the node values in the sub-linked list and the node addresses. Subsequently, step S360 is executed to update the address corresponding to the sub-linked list in the array index to the address of the lower-level array index; that is, the value of the corresponding element of the current-level array is changed to the address of the lower-level array sub_array, and the attribute recording the number of contained elements is set to the number num of lower-level sub-linked lists.
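The arithmetic of step S310 and the virtual node values needed for step S330 can be illustrated with a short sketch. The function name `lower_level_layout` is hypothetical, and the sketch assumes integer values whose span divides evenly by the hierarchy coefficient.

```python
from typing import List, Tuple


def lower_level_layout(first: int, last: int, num: int) -> Tuple[int, List[int]]:
    """Return (dis, virtual node values) for splitting [first, last) into num parts.

    first and last are the virtual node values bounding the sub-linked list;
    num is the hierarchy coefficient.
    """
    span = last - first
    if num <= 0 or span % num != 0:
        # Assumption of this sketch: the span divides evenly.
        raise ValueError("span must be a positive multiple of num")
    dis = span // num
    return dis, [first + i * dis for i in range(num)]
```

With the figures used later in the description, `lower_level_layout(100, 200, 5)` gives a distance of 20 and the virtual node values 100, 120, 140, 160 and 180.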
According to some embodiments of the present invention, the method 300 may further include comparing the node density of a predetermined sub-linked list in the singly linked list with a set global node density threshold and, if the node density calculated for the predetermined sub-linked list is greater than the threshold, automatically creating a new level of data index for that sub-linked list. The node density threshold is set by the user, and the node density is the number of valid nodes in the sub-linked list divided by the length of the sub-linked list. If the sub-linked list itself contains lower-level sub-linked lists, each contained lower-level sub-linked list is counted as a single node in the calculation.
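The density check can be sketched as below. The names are hypothetical, and two points are interpretations rather than statements from the text: the "length of the sub-linked list" is taken to be the length of its value interval, and the rule that a nested lower-level sub-linked list counts as a single node is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    is_virtual: bool = False
    valid: bool = True
    next: Optional["Node"] = None


def count_valid_nodes(start: Node) -> int:
    """Count valid nodes from the starting virtual node up to the next virtual node."""
    count = 1 if start.valid else 0
    node = start.next
    while node is not None and not node.is_virtual:
        if node.valid:
            count += 1
        node = node.next
    return count


def node_density(start: Node, interval_length: int) -> float:
    """Valid nodes divided by the (assumed) value-interval length."""
    return count_valid_nodes(start) / interval_length


def should_split(start: Node, interval_length: int, threshold: float) -> bool:
    """True when the density exceeds the global node density threshold."""
    return node_density(start, interval_length) > threshold
```

Under this interpretation, a sub-linked list covering an interval of length 100 with 10 valid nodes has density 0.1 and would be split for any threshold below that value.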
According to some embodiments of the invention, method 200 may further include inserting data into the singly linked list after the array index is established. FIG. 4 shows a flow diagram of a method 400 of inserting data according to one embodiment of the invention. As shown in FIG. 4, in step S410 the target data to be inserted is acquired. Whether the target data is in the singly linked list is determined through the hash index; specifically, the key corresponding to the target data is used to query whether a corresponding value exists in the hash table. If the node exists in the singly linked list, whether the node is valid is checked; if it is valid, processing moves on to the next target data. If the node state is invalid, step S415 is executed to change the node state to valid.
If the node does not exist in the singly linked list, that is, the target data is not in the singly linked list, step S420 is executed: the target data is divided by the value interval length corresponding to the sub-linked lists, and the quotient is rounded down. Subsequently, step S430 is executed: the rounded-down result is multiplied by the value interval length to obtain the node value of the target virtual node, which can be written as: ⌊data / value interval length⌋ × value interval length. The sub-linked list corresponding to the target data is then determined from the array index according to the node value of the target virtual node.
If the address of the target virtual node cannot be determined in the current-level array index, steps S420 and S430 are executed again to determine the address of the target virtual node from the lower-level array index. In this repetition of steps S420 and S430, the target data is divided by the value interval length corresponding to the lower-level sub-linked lists and rounded down, and the rounded-down result is multiplied by the value interval length of the lower-level sub-linked lists to obtain the node value of the target virtual node. After execution, whether the address of the target virtual node can be determined is checked again; if it cannot, steps S420 and S430 are repeated until the address of the target virtual node is determined.
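The repeated application of steps S420 and S430 across index levels can be sketched as a simple loop. The names `Node`, `Level` and `locate` are hypothetical, and the sketch assumes that each level's first virtual node value is a multiple of that level's virtual node distance, so that the floor formula lands on an existing key.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    value: int                       # a virtual node that starts a sub-linked list


@dataclass
class Level:
    start: int                       # value of the first virtual node at this level
    distance: int                    # virtual node distance at this level
    slots: List[Union["Level", Node]] = field(default_factory=list)


def locate(level: Level, data: int) -> Node:
    """Descend through the array indexes until a virtual node is reached."""
    while True:
        key = data // level.distance * level.distance     # floor(data / len) * len
        entry = level.slots[(key - level.start) // level.distance]
        if isinstance(entry, Node):
            return entry             # address of the target virtual node found
        level = entry                # slot refers to a lower-level array index
```

For a two-level layout with distances 100 and 20, locating 131 first yields key 100 at the top level, whose slot refers to the lower level, and then key 120 at the lower level, which is the virtual node starting the sub-linked list that contains 131.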
If the target data is an element in the value interval of the sub-linked list, step S440 is executed and the node count of that sub-linked list is incremented by 1. Step S450 then searches from the virtual node address of the sub-linked list toward the tail for the first node A whose value is larger than data. Once node A is found, step S460 allocates a new singly linked list node N for data and inserts it before node A, and step S470 establishes, in the hash table, a mapping between the key corresponding to the data value and the address of node N. After the mapping is established, it is determined whether the node density of the sub-linked list has reached the global node density threshold. If it has not, the next data is acquired from the data set. If the threshold has been reached, step S480 is executed to divide the sub-linked list and establish the lower-level sub-linked lists. The next target data is then acquired and inserted, until all target data has been inserted.
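The insertion flow of method 400 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the embodiments: the names `Node`, `IndexedList`, `insert` and `values` are hypothetical, only a single-level array index is modeled, Python object references stand in for node addresses, virtual nodes are assumed to be registered in the hash index, and the density check and splitting of step S480 are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    value: int
    valid: bool = True              # False: placeholder virtual node holding no data
    next: Optional["Node"] = None


class IndexedList:
    """Sorted singly linked list with a one-level array index and a hash index."""

    def __init__(self, lower: int, upper: int, interval: int) -> None:
        self.lower, self.interval = lower, interval
        self.hash_index: Dict[int, Node] = {}   # node value -> node "address"
        self.array_index: List[Node] = []       # slot -> virtual node "address"
        self.counts: List[int] = []             # valid data nodes per sub-linked list
        prev: Optional[Node] = None
        for start in range(lower, upper + interval, interval):
            node = Node(start, valid=False)     # virtual node, initially invalid
            if prev is not None:
                prev.next = node
            prev = node
            self.array_index.append(node)
            self.counts.append(0)
            self.hash_index[start] = node       # assumption: virtual nodes are hashed too
        self.head = self.array_index[0]

    def _slot(self, data: int) -> int:
        slot = (data - self.lower) // self.interval
        if data < self.lower or slot >= len(self.array_index):
            raise ValueError(f"{data} is outside the indexed range")
        return slot

    def insert(self, data: int) -> Node:
        node = self.hash_index.get(data)        # S410: look up via the hash index
        if node is not None:
            if not node.valid:                  # S415: revive an invalid node
                node.valid = True
                self.counts[self._slot(data)] += 1
            return node
        slot = self._slot(data)                 # S420/S430: locate the sub-linked list
        self.counts[slot] += 1                  # S440
        prev = self.array_index[slot]
        # S450: stop at the predecessor of the first node larger than data
        while prev.next is not None and prev.next.value < data:
            prev = prev.next
        new = Node(data, next=prev.next)        # S460: insert before that node
        prev.next = new
        self.hash_index[data] = new             # S470
        return new

    def values(self) -> List[int]:
        out, node = [], self.head
        while node is not None:
            if node.valid:
                out.append(node.value)
            node = node.next
        return out


lst = IndexedList(lower=0, upper=280, interval=100)
for x in [143, 10, 280, 100, 83, 195, 101, 120, 131, 140, 160, 167, 180]:
    lst.insert(x)
print(lst.values())
```

Because a singly linked list has no back pointers, the sketch tracks the predecessor of node A during the search, so that the new node can be linked in before A without a second traversal.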
FIG. 5 is a diagram illustrating a singly linked list after nested segmentation and construction of a multi-level array index, according to some embodiments of the invention. The data set to be inserted contains the following data: 10, 83, 100, 101, 120, 131, 140, 143, 160, 167, 180, 195, 280. The lower and upper limits of the singly linked list, manually specified by the user, are 0 and 280; the hierarchy coefficient is 5, and the virtual node distance is 100. In FIG. 5, the first-level array index virtual node distance is 100, and the second-level array index virtual node distance is 20. First, the singly linked list is divided by a length of 100 into 3 first-level sub-linked lists: [0, 100), [100, 200), and [200, 300). Then the second-level sub-linked lists are segmented and an array index is established: the segment [100, 200) is divided by a distance of 20 into 5 segments: [100, 120), [120, 140), [140, 160), [160, 180), and [180, 200). Nodes 10 and 83 belong to segment [0, 100); node 101 belongs to segment [100, 120), which in turn belongs to segment [100, 200).
In FIG. 5, the values of the virtual nodes of the first-level sub-linked lists are 0, 100, 200 and 300; the values of the virtual nodes of the second-level sub-linked lists are 100, 120, 140, 160 and 180. The virtual nodes of the second-level sub-linked lists are all valid, whereas the first-level virtual nodes 0, 200 and 300 are invalid. The two arrays a and b in FIG. 5 are the first-level segment array and the second-level segment array, respectively. In array a, the 1st element stores the address of the 1st first-level virtual node 0, the 3rd element stores the address of the 3rd virtual node 200, the 4th element stores the address of the 4th virtual node 300, and the 2nd element stores the first address of the second-level sub-linked list. In array b, the 1st element stores the address of the 1st second-level virtual node 100, and the 2nd element stores the address of the 2nd virtual node 120. In the singly linked list, the start address of the segment [200, 300) in which data 280 is located, i.e., the address of virtual node 200, can be obtained from the array index. The node densities of the 3 first-level sub-linked lists are 0.02, 0.02 and 0.01, respectively.
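The two-level lookup through arrays a and b can be sketched as follows. The sketch is illustrative only: the function name `locate` and the dictionary layout are hypothetical, an integer stands in for the address of the virtual node with that value, and the 2nd element of a is assumed to refer to the second-level array b.

```python
# An int stands in for "the address of the virtual node with that value";
# a nested dict stands in for "the address of a lower-level array index".
b = {"base": 100, "interval": 20, "slots": [100, 120, 140, 160, 180]}
a = {"base": 0, "interval": 100, "slots": [0, b, 200, 300]}


def locate(index: dict, data: int) -> int:
    """Descend through the array indexes until a virtual node is reached."""
    while True:
        # Round down to the enclosing interval, expressed as a slot number.
        slot = (data - index["base"]) // index["interval"]
        entry = index["slots"][slot]
        if isinstance(entry, dict):
            index = entry        # a lower-level array index: repeat one level down
        else:
            return entry         # the virtual node that starts the sub-linked list


print(locate(a, 280))  # 200
print(locate(a, 143))  # 140
print(locate(a, 83))   # 0
```

For data 143, the first-level slot is 1, which leads to array b; the second-level slot is 2, which yields virtual node 140.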
According to some embodiments of the invention, the method 200 may further include performing data maintenance on the singly linked list, including deleting data, after the array index and the hash index of the singly linked list have been established. FIG. 6 illustrates a flow diagram of a method 600 for data maintenance by deleting data, according to an embodiment of the invention. As shown in FIG. 6, step S610 is first executed to acquire the value data to be deleted. The hash index is used to determine whether data is in the singly linked list; if it is not, the operation ends. If data is in the singly linked list, step S620 is executed to obtain, through the hash table, the address of the node corresponding to data in the singly linked list, and step S630 then changes the state of that data node to invalid. After the modification, step S640 is executed: the node count of the sub-linked list containing the data is decremented by 1, and the hash index entry corresponding to the data node is deleted.
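The deletion flow of method 600 is a lazy deletion: the node is only marked invalid and stays linked in the list. A minimal sketch, assuming a single-level index with equal-length intervals; the names `Node` and `delete` and the parameters `lower` and `interval` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    value: int
    valid: bool = True
    next: Optional["Node"] = None


def delete(value: int, hash_index: Dict[int, Node], counts: List[int],
           lower: int, interval: int) -> bool:
    """Mark the node for `value` invalid; return False if it is not indexed."""
    node = hash_index.get(value)                # S610: look up via the hash index
    if node is None:
        return False                            # not in the list: nothing to do
    node.valid = False                          # S620/S630: mark invalid, keep it linked
    counts[(value - lower) // interval] -= 1    # S640: one fewer node in the sub-list
    del hash_index[value]                       # S640: drop the hash index entry
    return True


# A tiny list 10 -> 83 -> 100 with one node count per interval of length 100.
n100 = Node(100)
n83 = Node(83, next=n100)
n10 = Node(10, next=n83)
hash_index = {n.value: n for n in (n10, n83, n100)}
counts = [2, 1]

first = delete(83, hash_index, counts, lower=0, interval=100)
second = delete(83, hash_index, counts, lower=0, interval=100)
print(first, second, counts, n10.next is n83)
```

The second call returns False because the hash index entry was removed by the first call, while the invalid node itself remains in the list until a later cleaning pass.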
According to some embodiments of the invention, the data maintenance of method 600 may also include data updates. FIG. 7 shows a flow diagram of a method 700 of data updating according to one embodiment of the invention. Updating data is a compound operation that consists of deleting the data to be updated and inserting the updated data. As shown in FIG. 7, step S710 is first executed to acquire the data to be updated and the updated data. Step S720 is then executed to delete the data to be updated from the singly linked list; for the specific steps, see the data deletion method 600 according to an embodiment of the present invention. Step S730 is then executed to insert the updated data into the singly linked list; for the specific steps, see the method 400 for inserting data according to an embodiment of the present invention.
According to some embodiments of the invention, the data maintenance of method 600 may also include data cleaning. FIG. 8 shows a flow diagram of a method 800 of data cleaning in accordance with one embodiment of the present invention. As shown in FIG. 8, step S810 is executed first: all data nodes in the singly linked list are traversed from the head node toward the tail, and for each data node traversed it is determined whether the node state is invalid. If the node state is invalid, step S820 is executed to delete the current data node from the singly linked list. After the deletion, the method returns and checks the state of the next node, until all data nodes have been traversed.
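The cleaning pass of method 800 can be sketched as a single traversal that unlinks invalid data nodes. The sketch is illustrative only: the names `Node` and `scrub` and the `virtual` flag are hypothetical, and it assumes that the head node is a virtual node and that invalid virtual nodes are kept because the array index still points at them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    value: int
    valid: bool = True
    virtual: bool = False           # assumption: virtual nodes carry an explicit flag
    next: Optional["Node"] = None


def scrub(head: Node) -> int:
    """Unlink every invalid data node after `head`; return how many were removed."""
    removed = 0
    prev, cur = head, head.next
    while cur is not None:                      # S810: traverse toward the tail
        if not cur.valid and not cur.virtual:
            prev.next = cur.next                # S820: unlink the invalid data node
            removed += 1
        else:
            prev = cur
        cur = cur.next
    return removed


def to_values(head: Node) -> List[int]:
    out, node = [], head
    while node is not None:
        out.append(node.value)
        node = node.next
    return out


# 0 (virtual, invalid) -> 10 -> 83 (deleted) -> 100 (virtual, valid) -> 101 (deleted)
n101 = Node(101, valid=False)
n100 = Node(100, virtual=True, next=n101)
n83 = Node(83, valid=False, next=n100)
n10 = Node(10, next=n83)
head = Node(0, valid=False, virtual=True, next=n10)

removed = scrub(head)
print(removed, to_values(head))
```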
After the index is built for the data set, data queries can be performed based on the built index. FIG. 9 shows a flow diagram of a method 900 of data querying, according to an embodiment of the invention. Method 900 is performed in a computing device, such as computing device 100. The data query method 900 queries data through the array index and the hash index that were established for the data set by the index construction method 200. The data query includes a single-point query and an interval query.
As shown in FIG. 9, it is first determined whether the user is performing a single-point query or an interval query. For a single-point query, step S910 is executed: the target data to be queried is acquired, and the hash index is used to determine whether the target data is in the singly linked list. If the target data is not in the singly linked list, the query ends. If the target data is in the singly linked list, step S920 is executed: the address of the corresponding node in the singly linked list is obtained through the hash table, and it is determined whether the node state is valid. If the node state is invalid, the query ends; if it is valid, step S930 is executed and the node address of the target data is returned.
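The single-point query reduces to one hash lookup followed by a validity check. A minimal sketch, in which the names `Node` and `point_query` are hypothetical and a Python object reference stands in for the node address:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    value: int
    valid: bool = True
    next: Optional["Node"] = None


def point_query(target: int, hash_index: Dict[int, Node]) -> Optional[Node]:
    """Return the node for `target`, or None if it is absent or invalid."""
    node = hash_index.get(target)       # S910: is the target in the list?
    if node is None or not node.valid:  # S920: absent, or present but invalid
        return None
    return node                         # S930: return the node "address"


n83 = Node(83)
n0 = Node(0, valid=False, next=n83)     # an invalid placeholder node
hash_index = {0: n0, 83: n83}

print(point_query(83, hash_index) is n83)
print(point_query(0, hash_index))
print(point_query(42, hash_index))
```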
For an interval query, step S940 is executed: the target data interval to be queried is acquired, its start value a and end value b are obtained, and the hash index is used to determine whether the start value a is in the singly linked list. If the start value a is in the singly linked list, step S965 is executed directly: the address of the node corresponding to the start value a in the singly linked list is obtained through the hash table, and the method continues with step S970. If the start value a is not in the singly linked list, the sub-linked list corresponding to the start value a is determined through the array index. Step S950 is executed to divide the start value a by the value interval length corresponding to the sub-linked list and round the quotient down. Step S960 is then executed to multiply the rounded-down result by the value interval length to obtain the node value of the target virtual node, which can be written as ⌊a / value interval length⌋ × value interval length.
The sub-linked list corresponding to the start value a is then determined from the array index according to the node value of the target virtual node. If the address of a lower-level array index is obtained from the array index, the method returns to steps S950 and S960, and the lower-level sub-linked list corresponding to the start value a is determined from the lower-level array index: the start value a is divided by the value interval length of the lower-level sub-linked list and rounded down, and the result is multiplied by that value interval length to obtain the node value of the target virtual node. After each pass, it is determined whether the address of the target virtual node can be determined; if not, steps S950 and S960 are repeated until it can. Step S970 is then executed to obtain the query result corresponding to the target data interval: the singly linked list is traversed from the data node corresponding to the start value a, or from the starting node of the determined sub-linked list; nodes whose values are greater than or equal to a are examined, and the addresses of those in the valid state are added to the result set, until the value of the current node is no longer smaller than the end value b.
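The interval query can be sketched as follows, using the data of FIG. 5 with a single-level array index. The sketch is illustrative only: the names `Node`, `build` and `range_query` are hypothetical, virtual nodes are assumed to be registered in the hash index, the start value a is assumed to lie within the indexed range, and the end value b is treated as exclusive because the traversal stops at the first node that is not smaller than b.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Node:
    value: int
    valid: bool = True
    next: Optional["Node"] = None


def build(values: Iterable[int], lower: int, upper: int,
          interval: int) -> Tuple[List[Node], Dict[int, Node]]:
    """Build the sorted list plus a one-level array index and a hash index."""
    starts = set(range(lower, upper + interval, interval))
    data = set(values)
    hash_index: Dict[int, Node] = {}
    array_index: List[Node] = []
    prev: Optional[Node] = None
    for v in sorted(data | starts):
        node = Node(v, valid=v in data)     # pure virtual nodes start out invalid
        if prev is not None:
            prev.next = node
        prev = node
        hash_index[v] = node
        if v in starts:
            array_index.append(node)
    return array_index, hash_index


def range_query(a: int, b: int, lower: int, interval: int,
                array_index: List[Node], hash_index: Dict[int, Node]) -> List[Node]:
    """Return the valid nodes with a <= value < b."""
    start = hash_index.get(a)                           # S940/S965: a is in the list
    if start is None:
        start = array_index[(a - lower) // interval]    # S950/S960: use the array index
    result, node = [], start
    while node is not None and node.value < b:          # S970: stop once value >= b
        if node.value >= a and node.valid:
            result.append(node)
        node = node.next
    return result


data = [10, 83, 100, 101, 120, 131, 140, 143, 160, 167, 180, 195, 280]
array_index, hash_index = build(data, lower=0, upper=280, interval=100)
hits = range_query(130, 170, 0, 100, array_index, hash_index)
print([n.value for n in hits])  # [131, 140, 143, 160, 167]
```

Here 130 is not in the hash index, so the traversal starts at virtual node 100 and skips the nodes below 130 before collecting results.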
According to some embodiments of the present invention, the method 900 for data query may further include performing interval acceleration on a segment when the segment contains many elements and interval traversal is therefore slow. FIG. 10 shows a flow diagram of a method 1000 of interval acceleration according to one embodiment of the invention. As shown in FIG. 10, step S1010 is executed first to obtain the interval to be accelerated and the acceleration factor N. The acceleration factor is specified by the user and is greater than 1; when the user does not specify it, it defaults to 1. Step S1020 is then executed to determine the local density threshold in the acceleration interval, which is the user-specified global density threshold divided by the acceleration factor N. Step S1030 is then executed to perform nested segmentation of the singly linked list within the interval: all data nodes in the acceleration interval are traversed, and it is determined whether the node density is greater than the local node density threshold. If the node density is greater than the local node density threshold, the sub-linked list in which the data node is located is divided into a plurality of lower-level sub-linked lists. After the division is completed, step S1040 is executed to establish, in the hash table, mappings between the newly generated lower-level virtual nodes and their node addresses.
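The effect of the acceleration factor on the splitting decision can be illustrated with a short sketch. The function names, the node counts and the threshold value 0.05 are hypothetical and chosen only for illustration; node density is taken as the number of valid nodes in a sub-linked list divided by its length.

```python
from typing import List


def local_density_threshold(global_threshold: float, n: float = 1.0) -> float:
    """Global density threshold divided by the acceleration factor N (step S1020)."""
    if n < 1:
        raise ValueError("the acceleration factor must be at least 1")
    return global_threshold / n


def sublists_to_split(counts: List[int], interval: int, threshold: float) -> List[int]:
    """Indexes of the sub-linked lists whose node density exceeds the threshold."""
    # Density = valid node count / value interval length of the sub-linked list.
    return [i for i, count in enumerate(counts) if count / interval > threshold]


counts = [3, 12, 1]          # hypothetical valid-node counts, interval length 100
global_threshold = 0.05      # hypothetical user-specified value

print(sublists_to_split(counts, 100, local_density_threshold(global_threshold)))
print(sublists_to_split(counts, 100, local_density_threshold(global_threshold, n=4)))
```

With N = 1 only the sub-linked list with density 0.12 exceeds the threshold; with N = 4 the threshold drops to 0.0125, so the sub-linked list with density 0.03 is split as well, which shortens the traversal inside the accelerated interval.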
It should be understood that the various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Wherein the memory is configured to store program code; the processor is configured to perform the various methods of the present invention in accordance with instructions in the program code stored in the memory.
By way of example, and not limitation, computer readable media includes both computer storage media and communication media. Computer storage media store information such as computer readable instructions, data structures, program modules or other data. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. Combinations of any of the above are also included within the scope of computer readable media.
It should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the devices in an embodiment may be adaptively changed and arranged in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
The present invention may further comprise: A5. The method according to A4, wherein the predetermined sub-linked list is a sub-linked list, among the plurality of sub-linked lists, whose node density is greater than a global node density threshold, the node density being the number of valid nodes in the sub-linked list divided by the length of the sub-linked list. A6. The method according to A1, further comprising inserting data into the singly linked list by: acquiring target data to be inserted; determining whether the target data is in the singly linked list through the hash index; if the target data is not in the singly linked list, determining the sub-linked list corresponding to the target data through the array index; and traversing the singly linked list from the starting node of the sub-linked list, searching for the first data node that is larger than the target data, and inserting the target data before that data node. A7. The method according to A1, further comprising performing data maintenance on the singly linked list, wherein the data maintenance comprises deleting data, and the deleting data comprises: determining whether a value to be deleted is in the singly linked list through the hash index; and if the value to be deleted is in the singly linked list, changing the state of the data node corresponding to the value to be deleted to invalid. A8. The method according to A7, wherein the data maintenance further comprises data updating, the data updating comprising: deleting the data to be updated from the singly linked list; and inserting the updated data into the singly linked list. A9. The method according to A7, wherein the data maintenance further comprises data cleaning, the data cleaning comprising: traversing all data nodes in the singly linked list; determining whether each traversed data node is valid; and if a data node is invalid, deleting the data node from the singly linked list.
Moreover, those skilled in the art will appreciate that although some embodiments described herein include some features included in other embodiments, not others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Additionally, some of the embodiments are described herein as a method or combination of method elements that can be implemented by a processor of a computer system or by other means of performing the described functions. A processor with the necessary instructions for carrying out the method or the method elements thus forms a device for carrying out the method or the method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purpose of carrying out the invention.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.

Claims (14)

1. A method of indexing a data set, performed in a computing device in which the data set is stored in the form of an ordered singly linked list, elements of the data set corresponding to different data nodes of the singly linked list, respectively, the method comprising:
inserting a plurality of virtual nodes into the singly linked list to divide the singly linked list into a plurality of sub-linked lists, wherein the virtual nodes are the starting nodes of the sub-linked lists;
establishing an array index for the virtual nodes so as to locate a sub-linked list according to the array index, wherein the keys of the array index are the node values of the virtual nodes, and the values are the addresses of the virtual nodes;
establishing a hash index for the data nodes so as to locate a data node according to the hash index, wherein a key of the hash index is the node value of a data node, and the value is the address of the data node;
the method further comprising performing data maintenance on the singly linked list, wherein the data maintenance comprises data cleaning, and the data cleaning comprises: traversing all data nodes in the singly linked list, determining for each traversed data node whether the data node is valid, and if the data node is invalid, deleting the data node from the singly linked list.
2. The method of claim 1, wherein the inserting a plurality of virtual nodes into the singly linked list to divide the singly linked list into a plurality of sub-linked lists comprises:
acquiring a value interval of data in the data set;
dividing the value interval into a plurality of subintervals with equal length;
and taking the starting value of each subinterval as the node value of a virtual node, and inserting the virtual node into the singly linked list.
3. The method of claim 2, wherein if the starting value of a subinterval is the node value of a data node in the singly linked list, the data node is treated as a virtual node, and the state of the virtual node is marked as valid.
4. The method of any of claims 1 to 3, further comprising:
inserting a plurality of virtual nodes into a predetermined sub-linked list to divide the sub-linked list into a plurality of lower-level sub-linked lists;
establishing a lower-level array index for the plurality of virtual nodes in the predetermined sub-linked list;
and updating the address corresponding to the sub-linked list in the array index to the address of the lower-level array index.
5. The method of claim 4, wherein the predetermined sub-linked list is a sub-linked list, among the plurality of sub-linked lists, having a node density greater than a global node density threshold;
and the node density is the number of valid nodes in the sub-linked list divided by the length of the sub-linked list.
6. The method of claim 1, further comprising inserting data in the singly linked list by:
acquiring target data to be inserted;
determining whether the target data is in the singly linked list through the hash index;
if the target data is not in the singly linked list, determining the sub-linked list corresponding to the target data through the array index;
and traversing the singly linked list from the starting node of the sub-linked list, searching for the first data node that is larger than the target data, and inserting the target data before that data node.
7. The method of claim 1, wherein the data maintenance further comprises deleting data, the deleting data comprising:
determining whether a value to be deleted is in the singly linked list through the hash index;
and if the value to be deleted is in the singly linked list, changing the state of the data node corresponding to the value to be deleted to invalid.
8. The method of claim 7, wherein the data maintenance further comprises a data update, the data update comprising:
deleting the data to be updated from the singly linked list;
and inserting the updated data into the singly linked list.
9. A data query method with array index and hash index established by the method of any one of claims 1-8, executed in a computing device, wherein data sets are stored in the computing device in the form of an ordered singly-linked list, each element of the data sets respectively corresponds to a different data node of the singly-linked list, and the data sets are established with the array index and the hash index, the data query method comprising:
acquiring target data to be queried;
determining whether the target data is in the singly linked list through the hash index;
and if the target data is in the singly linked list, returning the node address of the target data.
10. The data query method of claim 9, further comprising:
acquiring a target data interval to be queried;
determining whether the start value of the target data interval is in the singly linked list through the hash index;
if the start value is in the singly linked list, traversing the singly linked list from the data node corresponding to the start value to acquire a query result corresponding to the target data interval;
and if the start value is not in the singly linked list, determining the sub-linked list corresponding to the start value through the array index, traversing the singly linked list from the starting node of the sub-linked list, and acquiring a query result corresponding to the target data interval.
11. The data query method of claim 10, wherein the determining the sub-linked list corresponding to the start value through the array index comprises:
dividing the start value by the value interval length corresponding to the sub-linked list, and rounding down;
multiplying the rounded-down result by the value interval length to obtain the node value of a target virtual node;
and determining the sub-linked list corresponding to the start value from the array index according to the node value of the target virtual node.
12. The data query method of claim 11, wherein, when an address of a lower-level array index is determined from the array index according to the node value of the target virtual node, the data query method further comprises:
determining a lower-level sub-linked list corresponding to the start value through the lower-level array index;
and traversing the singly linked list from the starting node of the lower-level sub-linked list to obtain a query result corresponding to the target data interval.
13. A computing device, comprising:
at least one processor; and
at least one memory including computer program instructions;
the at least one memory and the computer program instructions are configured to, with the at least one processor, cause the computing device to perform the method of any of claims 1-12.
14. A readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform the method of any of claims 1-12.
CN201911144368.4A 2019-11-20 2019-11-20 Method for constructing index for data set, data query method and computing equipment Active CN110929103B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911144368.4A CN110929103B (en) 2019-11-20 2019-11-20 Method for constructing index for data set, data query method and computing equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911144368.4A CN110929103B (en) 2019-11-20 2019-11-20 Method for constructing index for data set, data query method and computing equipment

Publications (2)

Publication Number Publication Date
CN110929103A CN110929103A (en) 2020-03-27
CN110929103B true CN110929103B (en) 2023-04-11

Family

ID=69851400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911144368.4A Active CN110929103B (en) 2019-11-20 2019-11-20 Method for constructing index for data set, data query method and computing equipment

Country Status (1)

Country Link
CN (1) CN110929103B (en)

Also Published As

Publication number Publication date
CN110929103A (en) 2020-03-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant