CN110334094B - Data query method, system, device and equipment based on inverted index - Google Patents

Data query method, system, device and equipment based on inverted index

Info

Publication number
CN110334094B
Authority
CN
China
Prior art keywords
data
block
node
position information
record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910537024.3A
Other languages
Chinese (zh)
Other versions
CN110334094A (en
Inventor
杨新颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd
Priority to CN201910537024.3A
Publication of CN110334094A
Application granted
Publication of CN110334094B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data query method, system, device and equipment based on an inverted index are disclosed. An inverted index containing the correspondence between service attributes and position information is established in advance. Based on the service attribute contained in a query instruction, the block height of each data block holding a matching data record can be determined, and from it the data node that stores that data block. The query instruction is then forwarded to each of those data nodes, the nodes query in parallel, and the results are returned and aggregated.

Description

Data query method, system, device and equipment based on inverted index
Technical Field
Embodiments of the present disclosure relate to the field of information technologies, and in particular to a data query method, system, device and equipment based on an inverted index.
Background
In a blockchain-style ledger, the number of data blocks grows rapidly, while the data is typically stored centrally on a single node device. This places high demands on the storage capacity of that node device. In addition, when the number of stored data blocks becomes too large, querying them with a single device is time-consuming.
Based on this, a more efficient query scheme for blockchain-style ledgers is needed.
Disclosure of Invention
An object of the embodiments of the present application is to provide a more efficient query scheme for a blockchain-style ledger.
To solve the above technical problem, the embodiments of the present application are implemented as follows:
A data query method based on an inverted index is applied to a database system that centrally stores a blockchain-style ledger, wherein the database system comprises a coordination node and a plurality of data nodes, and the method comprises the following steps:
the coordination node receives a query instruction containing a service attribute;
the coordination node queries a pre-established inverted index to obtain the position information of the data records corresponding to the service attribute, wherein the inverted index contains the correspondence between service attributes and the position information of data records, and the position information comprises the block height of the data block where a data record is located and the offset of the data record within that data block;
the coordination node determines the corresponding data node according to the block height and forwards the position information and the query instruction to the determined data node;
each data node that receives the query instruction retrieves the corresponding data records according to the position information and returns them to the coordination node;
the coordination node aggregates the data records returned by all the data nodes, generates a set of data records corresponding to the service attribute, and sends the set to the initiator of the query instruction.
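The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the class names `DataNode` and `CoordinationNode`, the in-memory dictionaries, and the use of a list index as the offset are all assumptions made for the example. The per-node queries run sequentially here for brevity, whereas the method describes them as running in parallel on the data nodes.

```python
from collections import defaultdict


class DataNode:
    """Hypothetical data node holding some of the ledger's data blocks."""

    def __init__(self):
        self.blocks = {}  # block height -> list of data records

    def query(self, locations):
        # Fetch each record by its (block height, offset) position.
        return [self.blocks[height][offset] for height, offset in locations]


class CoordinationNode:
    """Hypothetical coordination node holding the index and routing table."""

    def __init__(self, data_nodes, inverted_index, routing_table):
        self.data_nodes = data_nodes          # node number -> DataNode
        self.inverted_index = inverted_index  # attribute -> [(height, offset)]
        self.routing_table = routing_table    # block height -> node number

    def query(self, service_attribute):
        locations = self.inverted_index.get(service_attribute, [])
        # Group positions by the data node that stores each block.
        per_node = defaultdict(list)
        for height, offset in locations:
            per_node[self.routing_table[height]].append((height, offset))
        # Forward to each node and aggregate what comes back.
        results = []
        for node_id, node_locations in per_node.items():
            results.extend(self.data_nodes[node_id].query(node_locations))
        return results


node1, node2 = DataNode(), DataNode()
node1.blocks[2] = ["record-a", "record-b"]
node2.blocks[5] = ["record-c"]
coordinator = CoordinationNode(
    data_nodes={1: node1, 2: node2},
    inverted_index={"user-1": [(2, 1), (5, 0)]},
    routing_table={2: 1, 5: 2},
)
records = coordinator.query("user-1")
```

In a real deployment the forwarding step would be a network call per data node, which is where the parallelism comes from.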
A data query system based on an inverted index is applied to a database system that centrally stores a blockchain-style ledger, wherein the database system comprises a coordination node and a plurality of data nodes. In the system,
the coordination node receives a query instruction containing a service attribute and queries a pre-established inverted index to obtain the position information of the data records corresponding to the service attribute, wherein the inverted index contains the correspondence between service attributes and the position information of data records, and the position information comprises the block height of the data block where a data record is located and the offset of the data record within that data block; the coordination node determines the corresponding data node according to the block height and forwards the position information and the query instruction to the determined data node;
each data node that receives the query instruction retrieves the corresponding data records according to the position information and returns them to the coordination node;
the coordination node aggregates the data records returned by all the data nodes, generates a set of data records corresponding to the service attribute, and sends the set to the initiator of the query instruction.
In another aspect, the embodiments of this specification also provide a data query method based on an inverted index, applied to a coordination node in a database system that centrally stores a blockchain-style ledger, the method comprising the following steps:
receiving a query instruction containing a service attribute;
querying a pre-established inverted index to obtain the position information of the data records corresponding to the service attribute, wherein the inverted index contains the correspondence between service attributes and the position information of data records, and the position information comprises the block height of the data block where a data record is located and the offset of the data record within that data block;
determining the corresponding data node according to the block height, and forwarding the position information and the query instruction to the determined data node;
receiving the data records returned by each data node, and aggregating them to generate a set of data records corresponding to the service attribute;
and sending the set of data records to the initiator of the query instruction.
Correspondingly, the embodiments of this specification further provide a data query device based on an inverted index, applied to a coordination node in a database system that centrally stores a blockchain-style ledger, the device comprising:
a receiving module, which receives a query instruction containing a service attribute;
a position query module, which queries a pre-established inverted index to obtain the position information of the data records corresponding to the service attribute, wherein the inverted index contains the correspondence between service attributes and the position information of data records, and the position information comprises the block height of the data block where a data record is located and the offset of the data record within that data block;
a data node determining module, which determines the corresponding data node according to the block height and forwards the position information and the query instruction to the determined data node;
an aggregation module, which receives the data records returned by each data node and aggregates them to generate a set of data records corresponding to the service attribute;
and a sending module, which sends the set of data records to the initiator of the query instruction.
In the scheme provided by the embodiments of the present application, an inverted index containing the correspondence between service attributes and position information is established in advance. Based on the service attribute contained in a query instruction, the block height of each data block holding a matching data record can be determined, and from it the data node that stores that data block. The query instruction is forwarded to each of those data nodes, the nodes query in parallel, and the results are returned and aggregated, which yields a more efficient query scheme.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the embodiments of the disclosure.
Further, not all of the effects described above need be achieved in any of the embodiments of the present specification.
Drawings
In order to more clearly illustrate the embodiments of the present description or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the embodiments of the present description, and other drawings may be obtained according to these drawings for a person having ordinary skill in the art.
FIG. 1 is a schematic flow chart of a block chain ledger generation provided in an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a block header according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of a method for storing data applied to a database system according to an embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a method for creating an inverted index of a data record according to an embodiment of the present disclosure;
FIG. 5 is a flowchart of a data query method based on inverted index according to an embodiment of the present disclosure;
FIG. 6 is a flowchart of a data query method based on inverted index applied to a coordination node according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a data query device based on inverted index according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural view of an apparatus for configuring the method of the embodiment of the present specification.
Detailed Description
In order for those skilled in the art to better understand the technical solutions in the embodiments of the present specification, the technical solutions in the embodiments of the present specification will be described in detail below with reference to the drawings in the embodiments of the present specification, and it is apparent that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification shall fall within the scope of protection.
First, the centralized blockchain-style ledger used in the embodiments of this specification is described. At a centralized database service provider, a blockchain-style ledger is generated as shown in FIG. 1, which is a schematic flow chart of generating a blockchain-style ledger according to an embodiment of this specification. The flow includes:
S101, receiving data records to be stored, each containing a specified identification field, and determining the hash value of each data record, wherein the specified identification field identifies the service attribute of the data record.
The data record to be stored can be various consumption records of individual users of the client, or can be business results, intermediate states, operation records and the like generated by the application server when executing business logic based on instructions of the users. Specific business scenarios may include consumption records, audit logs, supply chains, government regulatory records, medical records, and the like.
Within each institution that interfaces with the database server, a service attribute is generally unique. Depending on the business scenario, the service attribute may be a user name, a user identification number, a driver's license number, a mobile phone number, a unique project number, and so on.
For example, for a third-party payment institution, the data record is a consumption record of a user, and the service attribute is a user identifier (such as a mobile phone number, an identity card number or a user name) or a hash value obtained by applying a hash algorithm to the user identifier. Alternatively, for a government agency whose data records are the expenditure records of multiple public projects, the service attribute may be the unique number of each project.
The specific location of the specified identification field and the way it is obtained may be negotiated in advance between the database server and the interfacing institution. For example, when the data records provided by the interfacing institution are standard structured records, the specified identification field may be read from a specified offset in the data record, or its start and end positions may be marked by specific characters. When the data records provided by the interfacing institution are unstructured, the institution may prepend a header containing the service attribute to each data record when uploading it, and the database server can then read the specified identification field of each data record directly from the header.
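As a minimal sketch of the two acquisition modes just described, the following Python functions read the identification field either from an agreed byte offset or from a prepended header. The function names, the `|` delimiter, and the sample records are hypothetical; the actual layout is whatever the database server and the institution agree on.

```python
def field_from_offset(record: bytes, start: int, length: int) -> bytes:
    """Structured record: the field sits at an agreed byte offset."""
    return record[start:start + length]


def field_from_header(record: bytes, delimiter: bytes = b"|") -> bytes:
    """Unstructured record: a header carrying the attribute is prepended.

    The delimiter separating header and body is an assumption for this sketch.
    """
    header, _, _body = record.partition(delimiter)
    return header


structured = b"0X123456paid 12.50"
unstructured = b"0X123456|free-form text of the record"
attr_a = field_from_offset(structured, start=0, length=8)
attr_b = field_from_header(unstructured)
```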
S103, when a preset block-forming condition is met, determining the data records to be written into the data block, and generating the Nth data block containing the hash value of the data block and the data records.
The preset block-forming conditions include: the number of data records to be stored reaches a count threshold, for example, every time one thousand data records have been received, a new data block is generated and the thousand data records are written into it; or the time elapsed since the last block was formed reaches a time threshold, for example, every 5 minutes a new data block is generated and the data records received within those 5 minutes are written into it.
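The two example conditions can be expressed as a single predicate. This is only a sketch: the function name and parameter names are made up, the default thresholds mirror the examples in the text (one thousand records, 5 minutes), and combining both conditions with `or` is an assumption, since the text presents them as alternatives.

```python
def should_form_block(pending_count: int,
                      seconds_since_last_block: float,
                      count_threshold: int = 1000,
                      time_threshold: float = 300.0) -> bool:
    """Return True when either example block-forming condition is met."""
    return (pending_count >= count_threshold
            or seconds_since_last_block >= time_threshold)
```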
Here, N is the sequence number of the data block. In other words, in the embodiments of this specification the data blocks form a chain and are arranged in order of their block-forming time, which gives the ledger a strong temporal ordering. The block heights of the data blocks increase monotonically in order of block-forming time. The block height may simply be the sequence number, in which case the block height of the Nth data block is N. The block height may also be generated in other ways, for example as a large integer (typically a monotonically increasing integer of 12 to 15 digits, e.g. 13 digits) obtained by symmetric encryption of the block-forming timestamp of the data block. Because the large integer is obtained by symmetric encryption of the time, the block-forming time can be recovered by the corresponding symmetric decryption whenever it is needed.
For example, the block-forming time "20xx-01-19 03:14:07.938576" may be converted by symmetric encryption into the large integer "1547838847938", which may then be used as the block height identifying the data block, since block heights generated this way increase monotonically over time.
In this specification, block heights increase monotonically with block-forming time, so even when large integers are used, their numerical order reflects the order of the data blocks. For example, if the block-forming time of the next data block is "20xx-01-19 03:16:07.235125", it can be converted into another, larger integer "1547838848125" using the preset symmetric encryption algorithm.
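To illustrate the two properties the text relies on (monotonic growth and reversibility), here is a minimal Python sketch. It does not implement the symmetric encryption mentioned above, whose algorithm the text does not specify; as a stand-in it uses a plain conversion to milliseconds since the Unix epoch, which is an assumption of this example. The function names and sample times are likewise hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MS = timedelta(milliseconds=1)


def block_height_from_time(block_time: datetime) -> int:
    """Map a timezone-aware block-forming time to a large integer (ms precision)."""
    return (block_time - EPOCH) // ONE_MS


def block_time_from_height(height: int) -> datetime:
    """Invert the mapping; sub-millisecond detail is not recoverable."""
    return EPOCH + height * ONE_MS


t1 = datetime(2021, 3, 4, 5, 6, 7, 938576, tzinfo=timezone.utc)
t2 = t1 + timedelta(minutes=2)
h1 = block_height_from_time(t1)
h2 = block_height_from_time(t2)
```

With present-day timestamps this stand-in mapping yields 13-digit integers, and a later block always receives a larger height.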
When N = 1, the data block is the initial data block. The hash value and block height of the initial data block are given in a preset manner. For example, the initial data block contains no data records, its hash value is any given hash value, and its block height blknum = 0. As another example, the initial data block is triggered by the same condition as the other data blocks, but its hash value is determined by hashing all of the content in the initial data block.
When N > 1, the content and hash value of the previous data block are already determined, so the hash value of the current (Nth) data block can be generated from the hash value of the previous ((N-1)th) data block. In one feasible approach, the hash value of each data record to be written into the Nth block is determined, a Merkle tree is built from these hashes in their order within the block, the root hash of the Merkle tree is concatenated with the hash value of the previous data block, and the hash algorithm is applied again to produce the hash value of the current block. Alternatively, the data records may be concatenated in their order within the block and hashed to obtain a hash value for the records as a whole; the hash value of the previous data block is then concatenated with this hash value, and the resulting string is hashed to generate the hash value of the data block.
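The Merkle-tree variant can be sketched as follows. This is an illustration under stated assumptions rather than the ledger's actual format: SHA-256 is chosen arbitrarily, an odd node at any level is paired with a copy of itself (one common convention, not specified in the text), and the function names are made up.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(records) -> bytes:
    """Root hash over the record hashes, in their order within the block."""
    if not records:
        return sha256(b"")
    level = [sha256(record) for record in records]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the odd node
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


def block_hash(prev_block_hash: bytes, records) -> bytes:
    """Hash of (Merkle root || previous block hash), as described above."""
    return sha256(merkle_root(records) + prev_block_hash)


prev = sha256(b"initial block")
h_ab = block_hash(prev, [b"record-a", b"record-b"])
h_ba = block_hash(prev, [b"record-b", b"record-a"])
```

Because the tree is built in block order, swapping two records changes the root and therefore the block hash.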
Each data block contains a block header for storing metadata and a block body for storing data records. The block header may store, for example, the parent hash, the block's own hash value, a version number, the root hash of the data records, a timestamp, and so on. FIG. 2 is a schematic diagram of a block header according to an embodiment of this specification. The format of the block header may of course be customized based on service requirements and may include other information, for example a state array describing the state of the data records. The block body stores the plaintext of the data records or the hash values of the data records.
By the foregoing generation method of the data blocks, each data block is determined by a hash value, and the hash value of the data block is determined by the content and sequence of the data records in the data block and the hash value of the previous data block. The user can initiate verification based on the hash value of the data block at any time, and the modification of any content in the data block (including modification of the content or sequence of the data record in the data block) can cause inconsistency between the hash value of the data block calculated during verification and the hash value generated during data block generation, so that verification failure is caused, and therefore, the tamper-proof effect under centralization is realized.
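The verification described above amounts to recomputing each block hash and comparing it with the stored one. The sketch below uses the simpler concatenation variant of the block hash; the dictionary layout, the all-zero initial hash, SHA-256, and the function names are assumptions for illustration only.

```python
import hashlib

GENESIS_HASH = "0" * 64  # hypothetical preset hash for the initial block


def block_hash(prev_hash: str, records) -> str:
    """Hash of (previous block hash || hash of the concatenated records)."""
    records_hash = hashlib.sha256("".join(records).encode()).hexdigest()
    return hashlib.sha256((prev_hash + records_hash).encode()).hexdigest()


def build_ledger(batches):
    ledger, prev = [], GENESIS_HASH
    for records in batches:
        current = block_hash(prev, records)
        ledger.append({"prev_hash": prev, "hash": current,
                       "records": list(records)})
        prev = current
    return ledger


def verify_ledger(ledger) -> bool:
    """Recompute every block hash; any edit or reordering breaks the chain."""
    prev = GENESIS_HASH
    for block in ledger:
        if block["prev_hash"] != prev:
            return False
        if block_hash(prev, block["records"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True


ledger = build_ledger([["r1", "r2"], ["r3"], ["r4", "r5"]])
intact = verify_ledger(ledger)
ledger[0]["records"].reverse()  # tamper with the record order
tampered = verify_ledger(ledger)
```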
It should be noted that data blocks may be generated by the coordination node in the database system, but need not be. For example, the database system may also include other service nodes dedicated to generating data blocks, so that block generation is decoupled from storage; in that case each generated data block is sent by the service node to the coordination node for storage.
After the coordination node in the database system obtains a data block, the data block needs to be stored. In the embodiments of this specification, the data blocks of a single ledger are stored in a distributed manner across a plurality of data nodes, which accommodates the rapid growth of the blockchain-style ledger and reduces the storage pressure on any single storage device. FIG. 3 is a schematic flow chart of a data storage method applied to a database system according to an embodiment of this specification, and the flow specifically includes the following steps:
S301, the coordination node obtains a generated data block, determines the data node corresponding to the data block according to the block hash value of the data block, distributes the data block to that data node, establishes routing information between the data block and the data node, and stores the routing information and the block header information of the data block.
In a database system, there are typically a plurality of data nodes. For this purpose, the coordinator node first needs to determine to which data node a data block should be allocated. Specifically, the allocation may be based on hash values of the data blocks.
As previously described, the hash value of a data block may be calculated from the combination of the parent hash and the hash of the block's own data records, and is stored in the block header. A hash value is a value computed with a hash function; supported algorithms include MACTripleDES, MD5, RIPEMD160, SHA1, SHA256, SHA384, SHA512, and so on. In short, the block hash value of a data block is a short string that uniquely identifies the block, and even a slight modification to any content in the block causes a large change in the block hash value.
The number of data nodes is generally fixed, and each data node may have a corresponding number. The hash value can therefore be converted into a numerical value and taken modulo the number of data nodes, and the data node corresponding to the data block is determined from the result.
For example, suppose the block hash value of a data block converts to the number 100110120, and there are 10 data nodes numbered 0 to 9. The block hash value modulo 10 is 0, so data node 0 is determined to be the data node corresponding to the block hash value, and the data block is sent to data node 0 for storage.
Since the block hash value of a data block generally has several hundred bits (the exact length depends on the hash algorithm), a specified number of bits (for example, the last 3 bits) may be selected from the block hash value for numerical conversion, and the modulo operation is then performed on that shorter value to determine the data node corresponding to the data block, thereby reducing the amount of calculation.
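The modulo assignment can be written in a few lines. In this sketch the block hash is assumed to be a hexadecimal string, and the "specified number of bits" is interpreted as a number of trailing hex digits; both choices, along with the function name, are assumptions for illustration.

```python
from typing import Optional


def node_for_block(block_hash_hex: str, node_count: int,
                   suffix_digits: Optional[int] = None) -> int:
    """Pick a data node number in [0, node_count) from a block hash.

    If suffix_digits is given, only that many trailing hex digits are
    converted, which keeps the arithmetic small.
    """
    digits = block_hash_hex if suffix_digits is None else block_hash_hex[-suffix_digits:]
    return int(digits, 16) % node_count


# The worked example from the text: numeric value 100110120, 10 nodes.
example_node = 100110120 % 10
# A shortened conversion using the last 3 hex digits of a made-up hash.
short_node = node_for_block("abc123", node_count=10, suffix_digits=3)
```

One consequence of plain modulo assignment is that changing the number of data nodes changes the target node of most blocks, which is one reason to consider a hash ring instead.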
As another example, all data nodes may be arranged on a hash ring joined end to end, for example a ring covering the range 0 to 2^32. Each data node is mapped to a point on the ring according to the hash value of its address or device identifier. Each block hash value is mapped to a position on the ring by the same principle, and the first data node encountered when moving clockwise (or counterclockwise) from that position is the data node corresponding to the block hash value.
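A hash ring of this kind can be sketched with a sorted list and a binary search. The class name, the node identifiers, the use of SHA-256 to place keys on the ring, and the clockwise search direction are assumptions of this example.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32


def ring_position(key: str) -> int:
    """Map a node identifier or block hash to a point on the ring."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class HashRing:
    def __init__(self, node_ids):
        self._points = sorted((ring_position(n), n) for n in node_ids)
        self._positions = [position for position, _ in self._points]

    def node_for(self, block_hash: str) -> str:
        # First node at or after the block's position, wrapping around.
        index = bisect.bisect_left(self._positions, ring_position(block_hash))
        return self._points[index % len(self._points)][1]


ring = HashRing(["node-0", "node-1", "node-2"])
smaller_ring = HashRing(["node-0", "node-2"])
block_hashes = ["Hash%d" % i for i in range(200)]
```

Removing a node from the ring only remaps the blocks that were assigned to that node; blocks on the remaining nodes keep their assignment.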
After the data node corresponding to a data block is determined, a piece of routing information for the data block can be created and written into the routing table in the coordination node. Specifically, the routing table may include information such as the block height of the data block, the block hash of the data block, and the number of the data node corresponding to the data block, and is stored locally. Table 1 is an exemplary routing table provided by the embodiments of this specification.
TABLE 1
Data block height    Block hash    Data node number
1                    Hash1         1
2                    Hash2         2
300                  Hash300       1
……                   ……            ……
In addition to the routing information, the coordination node should also store the block header information of each data block.
S303, the data node receives and stores the data block sent by the coordination node.
With this scheme, the blockchain-style ledger is stored in a distributed manner at the granularity of data blocks, while metadata such as block header information is stored in the coordination node, which reduces the storage pressure on any single node device.
Meanwhile, because each data record includes a service attribute, the embodiments of this specification also provide a method for creating an index of data records, applied to the coordination node. FIG. 4 is a schematic flow chart of a method for creating an inverted index of data records, and the flow specifically includes the following steps:
S401, acquiring a specified identification field in a data record, wherein the specified identification field is used for identifying the service attribute of the data record.
The specific location and the acquisition manner of the specified identification field are already described above, and will not be described here again.
S403, determining the position information of the data record in the ledger, wherein the position information comprises the block height of the data block where the data record is located and the offset of the data record within that data block.
As previously described, a blockchain-style ledger is composed of a plurality of data blocks, and a data block typically contains a plurality of data records. Thus, in the embodiments of this specification, the position information specifies which data block in the ledger a data record is stored in, and where within that data block it is located.
In the data blocks provided in the embodiments of the present description, there are various ways to identify different data blocks, including hash values or block heights of the data blocks.
The hash value of a data block is obtained by hashing the hash value of the previous block together with the data records of the data block, and can be used to uniquely and unambiguously identify a data block. In a blockchain-style ledger, the block height of the first data block is usually 0, and the block height increases by 1 with each subsequent data block; alternatively, the block-forming time of a data block may be converted into a monotonically increasing large integer (typically 12 to 15 digits) used as the block height. Thus, each data block typically has a distinct block height.
As another example, once a data block to be written into the database is determined, the order of the data records in it is fixed, so the sequence number of each data record within the data block is also well defined. When data records have a fixed unit length, the sequence number alone is enough to locate a data record within its data block. That is, the sequence number may also be used to indicate the offset.
Meanwhile, in one data block, since a plurality of data records are generally included, the address offset of each data record in the data block may also be used to identify the data records in the data block, respectively. It is apparent that the address offsets of the data records are not the same in the same data block.
Of course, since the specific format of the data block is customizable (e.g., metadata information and remark information contained in the block header of the data block, the form taken by the block height of the data block, etc.) in the manner provided in the embodiments of the present disclosure, the content of the location information may also be different in different formats, which does not constitute a limitation of the present disclosure.
S405, establishing a corresponding relation between the specified identification field and the position information, and writing an index taking the specified identification field as a main key.
That is, the index is an inverted index. In the index, the primary key is a business attribute contained in the data record. The specific writing mode is that when the main key in the index does not contain the appointed identification field, an index record taking the appointed identification field as the main key is created in an index table.
And when the primary key in the index contains the appointed identification field, writing the position information into an index record where the appointed identification field is located. The writing here is not an overlay writing, but a writing in which position information is added to the value of the index record and is juxtaposed with other position information in the index record.
Table 2 is an exemplary index table provided in the embodiments of the present specification. Key is a specific value of the service attribute (for example, a user name), and each tuple in the Value part is one piece of position information: the first element is the block height, and the second element is the sequence number of the data record within that data block. A data record can be uniquely determined by its block height and sequence number. It is easily understood that one key may correspond to multiple pieces of position information in the index table.
TABLE 2
Key Value
0X123456 (2,08),(2,10),(300,89),(300,999)
344X0001 (5,01),(8,22)
…… ……
The inverted index table is also stored in the coordination node. With the above scheme, the coordination node holds both a routing table relating block heights to data nodes and an inverted index table relating service attributes to the position information (including block height) of data records.
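The write rule of S405 (create an index record for a new key, otherwise append to the existing one) can be sketched in Python as follows. The class name `InvertedIndex` and its methods are hypothetical, and an in-memory dictionary stands in for whatever index storage the coordination node actually uses. The sequence numbers of table 2 are written as plain integers, so (2,08) becomes (2, 8).

```python
class InvertedIndex:
    """Maps a service attribute value to a list of (block_height, sequence_number)."""

    def __init__(self):
        self._table = {}

    def add(self, key, block_height, sequence_number):
        # Key not yet present: create an index record with this key as primary key.
        if key not in self._table:
            self._table[key] = []
        # Key present: append, never overwrite, so earlier positions are kept.
        self._table[key].append((block_height, sequence_number))

    def lookup(self, key):
        # Return a copy so callers cannot mutate the index by accident.
        return list(self._table.get(key, []))


# Rebuild the example rows of table 2.
index = InvertedIndex()
for height, seq in [(2, 8), (2, 10), (300, 89), (300, 999)]:
    index.add("0X123456", height, seq)
for height, seq in [(5, 1), (8, 22)]:
    index.add("344X0001", height, seq)
```

A lookup for a key that was never written returns an empty list rather than raising, which keeps the query path simple for attributes with no records.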
Based on the foregoing solutions, the embodiments of the present disclosure further provide a data query method based on an inverted index, which is applied to a database system of a centralized storage block chain ledger, where the database system includes a coordination node and a plurality of data nodes. As shown in fig. 5, which is a flow chart of the data query method based on inverted index provided by the embodiments of the present disclosure, the method includes:
S501, the coordination node receives a query instruction containing service attributes.
The query instruction may come from an institution that interfaces with the database system, or from a service user of that institution. The database system can then match against the index table according to the specific value of the service attribute. For example, the user inputs the query instruction Retrieve(0X123456, &v, FULL), which indicates that all data records corresponding to the service attribute "0X123456" are to be queried.
S503, the coordination node obtains the position information of the data record corresponding to the service attribute based on the pre-established inverted index query.
At this point, the coordination node can obtain from index table 2 the position information (2,08), (2,10), (300,89), (300,999) of the data records corresponding to the user "0X123456", and then query the corresponding data records according to the position information.
In addition, the query instruction may specify a range of data blocks by two block heights. For example, the user inputs a query instruction that queries the data records of the specified user 0X123456 between block heights 200 and 1000 in the ledger, thereby obtaining the data records corresponding to the position information (300,89), (300,999).
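Restricting the looked-up position information to a block height range can be sketched in Python as below. The function name `positions_in_range` is hypothetical, and treating both bounds as inclusive is an assumption; the text only says "between" the two heights.

```python
def positions_in_range(positions, low, high):
    """Keep the positions whose block height lies in [low, high] (inclusive, assumed)."""
    return [(height, seq) for height, seq in positions if low <= height <= high]


# Position information of user 0X123456 taken from table 2.
positions = [(2, 8), (2, 10), (300, 89), (300, 999)]
selected = positions_in_range(positions, 200, 1000)
```

With the values from the example, only the two positions in block 300 remain, so only the data node storing block 300 needs to be contacted.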
S505, the coordination node determines the corresponding data node according to the block height, and forwards the position information and the query instruction to the determined data node.
After determining the position information corresponding to the service attribute, the coordination node can determine the data node corresponding to each block height according to routing table 1. For example, in routing table 1, block height 2 corresponds to data node 2.
When the coordination node forwards the query instruction, one way is to forward all the position information to the data nodes and let each data node filter it. For example, the coordination node forwards the position information (2,08), (2,10), (300,89), (300,999) of the data records corresponding to "0X123456" to data node 2. Since the data blocks stored in a data node include a block header and a block body, the data node can perform a local lookup by the block height in each piece of position information and discard the position information whose data blocks are not stored locally.
In another forwarding mode, the coordination node first classifies the position information before forwarding, to determine which part of it to forward to each data node. For example, as can be seen from table 1, data block 2 is stored by data node 2 and data block 300 is stored by data node 1, so the coordination node sends position information (2,08), (2,10) to data node 2 and position information (300,89), (300,999) to data node 1. Each data node can then directly query the data records corresponding to the position information it receives.
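The second forwarding mode, in which the coordination node classifies position information by data node before forwarding, can be sketched in Python as follows. The routing entries reproduce only the two mappings mentioned in the example (block 2 on data node 2, block 300 on data node 1); the node names and the function name `group_by_node` are hypothetical.

```python
from collections import defaultdict

# Routing table: block height -> data node (only the entries used in the example).
ROUTING_TABLE = {2: "data_node_2", 300: "data_node_1"}


def group_by_node(positions, routing):
    """Split position information into one list per responsible data node."""
    groups = defaultdict(list)
    for height, seq in positions:
        groups[routing[height]].append((height, seq))
    return dict(groups)


positions = [(2, 8), (2, 10), (300, 89), (300, 999)]
per_node = group_by_node(positions, ROUTING_TABLE)
```

Compared with forwarding everything to every node, this keeps each message small and spares the data nodes the filtering step, at the cost of one routing lookup per position on the coordination node.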
S507, any data node receiving the query instruction queries according to the position information to obtain a corresponding data record, and returns the data record obtained by query to the coordination node.
Based on the foregoing example, data node 1 and data node 2 each query locally, which improves query efficiency.
In practical applications, the larger the number of data blocks and the more dispersed the storage of the data records, the more the query efficiency can be improved by querying across multiple data nodes.
S509, the coordination node gathers the data records returned by each data node, generates a set of data records corresponding to the service attribute, and sends the set of data records to the query instruction initiator.
In the scheme provided by the embodiment of the application, an inverted index containing the correspondence between service attributes and position information is established in advance. Based on the service attribute contained in the query instruction, the block height of the data block in which each corresponding data record is located can be determined, and from that the data node storing the data block. The query instruction is then forwarded to each such data node, the queries are executed as multiple processes, and the results are returned and aggregated, yielding a more efficient query scheme.
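Steps S501 to S509 can be put together in a small in-process Python sketch. The classes `DataNode` and `Coordinator`, the record strings, and the use of a thread pool in place of real network calls to separate data nodes are all assumptions made for illustration; they are not the implementation described in the specification.

```python
from concurrent.futures import ThreadPoolExecutor


class DataNode:
    def __init__(self, blocks):
        # blocks: {block_height: {sequence_number: record}}
        self.blocks = blocks

    def query(self, positions):
        # S507: look up each record locally by (block height, sequence number).
        return [((h, s), self.blocks[h][s]) for h, s in positions if h in self.blocks]


class Coordinator:
    def __init__(self, index, routing, nodes):
        self.index = index      # service attribute -> [(height, seq), ...]
        self.routing = routing  # block height -> node name
        self.nodes = nodes      # node name -> DataNode

    def retrieve(self, attribute):
        # S503: inverted index lookup.
        positions = self.index.get(attribute, [])
        # S505: classify positions by the data node that stores each block.
        groups = {}
        for pos in positions:
            groups.setdefault(self.routing[pos[0]], []).append(pos)
        if not groups:
            return []
        # Query all involved data nodes concurrently.
        with ThreadPoolExecutor(max_workers=len(groups)) as pool:
            futures = [pool.submit(self.nodes[name].query, plist)
                       for name, plist in groups.items()]
            found = [item for future in futures for item in future.result()]
        # S509: aggregate, ordered by position for a stable result.
        return [record for _, record in sorted(found, key=lambda item: item[0])]


nodes = {
    "data_node_1": DataNode({300: {89: "r-300-89", 999: "r-300-999"}}),
    "data_node_2": DataNode({2: {8: "r-2-8", 10: "r-2-10"}}),
}
coordinator = Coordinator(
    index={"0X123456": [(2, 8), (2, 10), (300, 89), (300, 999)]},
    routing={2: "data_node_2", 300: "data_node_1"},
    nodes=nodes,
)
records = coordinator.retrieve("0X123456")
```

Sorting the aggregated results by position is a design choice of this sketch, made so that the output does not depend on which data node answers first.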
In one embodiment, the query instruction may further include a time parameter, which indicates that the block forming time of the data blocks to be queried should be before the time characterized by the time parameter.
One form of the time parameter is a direct time value. For example, the query instruction takes the form Retrieve(0X123456, &v, 20181001), meaning that the data records of user "0X123456" are to be obtained and that the block time (typically a timestamp) of the data block containing each data record must be before October 1, 2018. Assuming that the block height of the most recent data block before October 1, 2018 is X, the target block height interval is [1, X].
Another form of the time parameter is a large integer, used when the block height of a data block is itself a large integer converted from time. For example, the query instruction takes the form Retrieve(0X123456, &v, 1547838848300), meaning that the data records of user "0X123456" are to be obtained and that the block height of the data block containing each data record must not exceed "1547838848300". The target block height interval is then [1, 1547838848300].
In this way, the user can query within a chosen time period. For example, based on the user's own ID (i.e., the service attribute) and the current time, the user can query the data records generated within one month or one day, or the data records generated within a specified time period, so that irrelevant data blocks are filtered out and efficiency is improved.
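Turning a direct time parameter into a target block height interval can be sketched in Python as below. The sketch assumes that block heights start at 1 and increase by one per block, that the coordination node keeps the block times in ascending order (for instance from the stored block header information), and that the parameter uses the YYYYMMDD form of the example; the function name `target_block_interval` is hypothetical.

```python
from bisect import bisect_left
from datetime import datetime


def target_block_interval(block_times, time_parameter):
    """Return (1, X), where X is the height of the latest block formed before the cutoff.

    block_times[i] is the block time of the block with height i + 1 (ascending).
    Returns None when no block was formed before the cutoff.
    """
    cutoff = datetime.strptime(time_parameter, "%Y%m%d")
    # bisect_left counts the blocks whose block time is strictly before the cutoff.
    x = bisect_left(block_times, cutoff)
    return (1, x) if x >= 1 else None


block_times = [
    datetime(2018, 9, 1),   # block height 1
    datetime(2018, 9, 20),  # block height 2
    datetime(2018, 10, 3),  # block height 3
]
interval = target_block_interval(block_times, "20181001")
```

Because block height increases monotonically with block forming time, a binary search over the block times is enough to find X; the positions from the inverted index can then be filtered to that interval before anything is forwarded.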
Correspondingly, the embodiment of the specification also provides a data query system based on inverted index, which is applied to a database system of a centralized storage block chain ledger, wherein the database system comprises a coordination node and a plurality of data nodes, and in the system,
the coordination node receives a query instruction containing service attributes, and obtains position information of data records corresponding to the service attributes based on pre-established inverted index query, wherein the inverted index contains the corresponding relation of the service attributes and the position information of the data records, and the position information comprises the block height of the data block where the data records are located and the offset in the data block where the data records are located; determining a corresponding data node according to the block height, and forwarding the position information and the query instruction to the determined data node;
any data node receiving the query instruction queries to obtain a corresponding data record according to the position information, and returns the data record obtained by query to the coordination node;
the coordination node gathers the data records returned by all the data nodes, generates a set of data records corresponding to the service attribute, and sends the set of data records to the query instruction initiator.
Further, in the system, the coordination node pre-establishes the inverted index by: acquiring a specified identification field in a data record, wherein the specified identification field is used for identifying the service attribute of the data record; determining position information of the data record in the ledger, wherein the position information comprises the block height of the data block where the data record is located and the offset in the data block where the data record is located; and establishing a correspondence between the specified identification field and the position information, and writing an index that uses the specified identification field as the primary key.
Further, in the system, the query instruction further includes a time parameter; correspondingly, the coordination node determines a target block height interval according to the time parameter and acquires the position information of the data records that correspond to the service attribute and are located within the target block height interval; correspondingly, the coordination node forwards the position information within the target block height interval and the query instruction to each data node.
Further, in the system, the coordination node acquires a generated data block, determines the data node corresponding to the data block according to the block hash value of the data block, distributes the data block to the corresponding data node, establishes routing information between the data block and the data node, and stores the routing information and the block header information of the data block; correspondingly, the data node receives and stores the data block sent by the coordination node.
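The distribution of a generated data block to a data node can be sketched in Python as follows. The specification only says that the data node is determined according to the block hash value; taking the hash modulo the number of data nodes is one possible mapping, chosen here purely as an assumption. The class `CoordinationNode`, the block fields, and the `send` callback are likewise hypothetical.

```python
class CoordinationNode:
    def __init__(self, node_names):
        self.node_names = list(node_names)
        self.routing = {}  # block height -> data node name
        self.headers = {}  # block height -> block header information

    def distribute(self, block, send):
        # Assumed mapping: block hash (hex string) modulo the number of data nodes.
        node = self.node_names[int(block["hash"], 16) % len(self.node_names)]
        send(node, block)  # hand the block to the chosen data node
        # Record where the block went and keep its header locally.
        self.routing[block["height"]] = node
        self.headers[block["height"]] = {
            "hash": block["hash"],
            "block_time": block["block_time"],
        }
        return node


# In-memory stand-in for the data nodes' storage.
stores = {}


def send(node, block):
    stores.setdefault(node, []).append(block)


coordination = CoordinationNode(["data_node_1", "data_node_2"])
block = {"height": 2, "hash": "0b", "block_time": "2019-06-20T00:00:00", "records": []}
chosen = coordination.distribute(block, send)
```

Keeping the routing entry and the block header on the coordination node is what later allows a block height from the inverted index to be resolved to a data node without contacting any data node first.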
Further, in the system, the data blocks in the block chain ledger are pre-generated by:
determining hash values of data records to be stored, wherein the data records comprise specified identification fields, and the specified identification fields are used for identifying service attributes of the data records;
when a preset blocking condition is reached, determining each data record in the data block to be written, and generating an Nth data block containing hash values of the data block and the data records, wherein the method specifically comprises the following steps of:
when N = 1, the hash value and the block height of the initial data block are given based on a preset manner;
when N > 1, determining the hash value of the Nth data block according to the data records in the data block to be written and the hash value of the (N-1)th data block, and generating the Nth data block containing the hash value of the Nth data block, the data records, and the block forming time of the data block, wherein the block height of the data blocks monotonically increases in the order of block forming time.
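The block generation steps above can be sketched in Python as below. SHA-256, the way the record hashes and the previous block hash are concatenated, the preset value of the initial block, and the function names are all assumptions made for illustration; the specification does not name a hash function or an exact input layout.

```python
import hashlib


def record_hash(record: bytes) -> str:
    return hashlib.sha256(record).hexdigest()


def make_block(records, previous_block, block_time):
    """Generate the next block from the records to be written and the previous block."""
    record_hashes = [record_hash(r) for r in records]
    if previous_block is None:
        # N = 1: hash value and block height of the initial block are preset.
        height = 1
        block_hash = hashlib.sha256(b"initial-block").hexdigest()
    else:
        # N > 1: the hash depends on the record hashes and the previous block hash.
        height = previous_block["height"] + 1
        material = previous_block["hash"] + "".join(record_hashes)
        block_hash = hashlib.sha256(material.encode("ascii")).hexdigest()
    return {
        "height": height,
        "hash": block_hash,
        "block_time": block_time,
        "records": list(records),
        "record_hashes": record_hashes,
    }


first = make_block([b"rec-a"], None, block_time=1)
second = make_block([b"rec-b", b"rec-c"], first, block_time=2)
tampered = make_block([b"rec-b", b"rec-X"], first, block_time=2)
```

Since each block hash covers the previous block hash, changing any record changes the hash of its block and of every later block, which is the property that makes the ledger tamper-evident.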
On the other hand, the embodiment of the present disclosure further provides a data query method based on inverted index, which is applied to a coordination node in a database system of a centralized storage block chain ledger, as shown in fig. 6, fig. 6 is a flow chart of the data query method based on inverted index applied to the coordination node provided in the embodiment of the present disclosure, and includes:
S601, receiving a query instruction containing service attributes;
S603, acquiring position information of a data record corresponding to the service attribute based on a pre-established inverted index query, wherein the inverted index comprises a corresponding relation of the service attribute and the position information of the data record, and the position information comprises a block height of a data block where the data record is located and an offset in the data block where the data record is located;
S605, determining a corresponding data node according to the block height, and forwarding the position information and the query instruction to the determined data node;
S607, receiving the data records returned by each data node, and summarizing to generate a set of data records corresponding to the service attribute;
S609, sending the collection of the data records to a query instruction initiator.
Corresponding to another aspect, the embodiment of the present disclosure further provides a data query device based on inverted index, which is applied to a coordination node in a database system of a centralized storage block chain ledger, as shown in fig. 7, fig. 7 is a schematic structural diagram of a data query device based on inverted index, provided in the embodiment of the present disclosure, including:
a receiving module 701, which receives a query instruction containing a service attribute;
The position query module 703 queries and obtains the position information of the data record corresponding to the service attribute based on a pre-established inverted index, wherein the inverted index contains the corresponding relation of the service attribute and the position information of the data record, and the position information comprises the block height of the data block where the data record is located and the offset in the data block where the data record is located;
the data node determining module 705 determines a corresponding data node according to the block height and forwards the position information and the query instruction to the determined data node;
a summarizing module 707, configured to receive the data records returned by each data node, and summarize and generate a set of data records corresponding to the service attribute;
a sending module 709 sends the set of data records to the querying-instruction initiator.
The embodiments of the present disclosure also provide a computer device, which at least includes a memory, a processor, and a computer program stored on the memory and capable of running on the processor, wherein the processor implements the data query method shown in fig. 6 when executing the program.
FIG. 8 illustrates a more specific hardware architecture diagram of a computing device provided by embodiments of the present description, which may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 implement communication connections therebetween within the device via a bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), microprocessor, application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc. for executing relevant programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static storage device, dynamic storage device, or the like. Memory 1020 may store an operating system and other application programs, and when the embodiments of the present specification are implemented in software or firmware, the associated program code is stored in memory 1020 and executed by processor 1010.
The input/output interface 1030 is used to connect with an input/output module for inputting and outputting information. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
Communication interface 1040 is used to connect communication modules (not shown) to enable communication interactions of the present device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
Bus 1050 includes a path for transferring information between components of the device (e.g., processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).
It should be noted that although the above-described device only shows processor 1010, memory 1020, input/output interface 1030, communication interface 1040, and bus 1050, in an implementation, the device may include other components necessary to achieve proper operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.
The present embodiment also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data query method shown in fig. 6.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
From the foregoing description of embodiments, it will be apparent to those skilled in the art that the present embodiments may be implemented in software plus a necessary general purpose hardware platform. Based on such understanding, the technical solutions of the embodiments of the present specification may be embodied in essence or what contributes to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present specification.
The system, method, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to across embodiments, and each embodiment mainly describes its differences from the others. In particular, since the device embodiments are substantially similar to the method embodiments, their description is relatively brief, and reference is made to the description of the method embodiments for the relevant points. The device embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and the functions of the modules may be implemented in one or more pieces of software and/or hardware when implementing the embodiments of the present disclosure. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The foregoing is merely a specific implementation of the embodiments of this disclosure. It should be noted that a person skilled in the art may make several improvements and modifications without departing from the principles of the embodiments of this disclosure, and these improvements and modifications should also be considered to fall within the protection scope of the embodiments of this disclosure.

Claims (13)

1. The data query method based on the inverted index is applied to a database system of a centralized storage block chain type account book, wherein the database system comprises a coordination node and a plurality of data nodes, and the method comprises the following steps:
the coordination node receives a query instruction containing service attributes;
the coordination node obtains the position information of the data record corresponding to the service attribute based on the pre-established inverted index query, wherein the inverted index contains the corresponding relation of the service attribute and the position information of the data record, and the position information comprises the block height of the data block where the data record is located and the offset in the data block where the data record is located;
the coordination node determines a corresponding data node according to the block height and forwards the position information and the query instruction to the determined data node;
any data node receiving the query instruction queries to obtain a corresponding data record according to the position information, and returns the data record obtained by query to the coordination node;
The coordination node gathers the data records returned by all the data nodes, generates a set of data records corresponding to the service attribute, and sends the set of data records to the query instruction initiator.
2. The method of claim 1, wherein in the coordinating node, the inverted index is pre-established by:
acquiring a specified identification field in a data record, wherein the specified identification field is used for identifying the service attribute of the data record;
determining position information of the data record in an account book, wherein the position information comprises block heights of data blocks where the data record is located and offset in the data blocks where the data record is located;
and establishing a corresponding relation between the specified identification field and the position information, and writing an index taking the specified identification field as a main key.
3. The method of claim 1, the query instruction further comprising a time parameter;
correspondingly, the coordination node determines a target block height interval according to the time parameter and acquires the position information of the data records that correspond to the service attribute and are located within the target block height interval;
correspondingly, the coordination node forwards the position information within the target block height interval and the query instruction to each data node.
4. The method of claim 1, prior to the coordinating node receiving the query instruction containing the business attribute, the method further comprising:
the coordination node acquires the generated data block, determines the data node corresponding to the data block according to the block hash value of the data block, distributes the data block to the corresponding data node, establishes the routing information of the data block and the data node, and stores the routing information and the block header information of the data block;
the data node receives and stores the data block sent by the coordination node.
5. The method of claim 1, wherein in the database system, data blocks are pre-generated by:
receiving data records to be stored, which contain specified identification fields, and determining hash values of the data records, wherein the specified identification fields are used for identifying service attributes of the data records;
when a preset blocking condition is reached, determining each data record in the data block to be written, and generating an Nth data block containing hash values of the data block and the data records, wherein the method specifically comprises the following steps of:
when N = 1, the hash value and the block height of the initial data block are given based on a preset manner;
when N > 1, determining the hash value of the Nth data block according to the data records in the data block to be written and the hash value of the (N-1)th data block, and generating the Nth data block containing the hash value of the Nth data block, the data records, and the block forming time of the data block, wherein the block height of the data blocks monotonically increases in the order of block forming time.
6. A data query system based on inverted index, which is applied to a database system of a centralized storage block chain ledger, wherein the database system comprises a coordination node and a plurality of data nodes, in the system,
the coordination node receives a query instruction containing service attributes, and obtains position information of data records corresponding to the service attributes based on pre-established inverted index query, wherein the inverted index contains the corresponding relation of the service attributes and the position information of the data records, and the position information comprises the block height of the data block where the data records are located and the offset in the data block where the data records are located; determining a corresponding data node according to the block height, and forwarding the position information and the query instruction to the determined data node;
any data node receiving the query instruction queries to obtain a corresponding data record according to the position information, and returns the data record obtained by query to the coordination node;
the coordination node gathers the data records returned by all the data nodes, generates a set of data records corresponding to the service attribute, and sends the set of data records to the query instruction initiator.
7. The system of claim 6, wherein the coordination node pre-establishes the inverted index by: acquiring a specified identification field in a data record, wherein the specified identification field is used for identifying the service attribute of the data record; determining position information of the data record in the ledger, wherein the position information comprises the block height of the data block where the data record is located and the offset in the data block where the data record is located; and establishing a correspondence between the specified identification field and the position information, and writing an index that uses the specified identification field as the primary key.
8. The system of claim 6, the query instruction further comprising a time parameter; correspondingly, the coordination node determines a target block height interval according to the time parameter and acquires the position information of the data records that correspond to the service attribute and are located within the target block height interval; correspondingly, the coordination node forwards the position information within the target block height interval and the query instruction to each data node.
9. The system of claim 6, wherein the coordination node obtains the generated data block, determines the data node corresponding to the data block according to the block hash value of the data block, distributes the data block to the corresponding data node, establishes the routing information of the data block and the data node, and stores the routing information and the block header information of the data block; correspondingly, the data node receives and stores the data block sent by the coordination node.
10. The system of claim 6, wherein the data blocks in the block chain ledger are pre-generated in the database system by:
determining hash values of data records to be stored, wherein the data records comprise specified identification fields, and the specified identification fields are used for identifying service attributes of the data records;
when a preset blocking condition is reached, determining each data record in the data block to be written, and generating an Nth data block containing hash values of the data block and the data records, wherein the method specifically comprises the following steps of:
when N = 1, the hash value and the block height of the initial data block are given based on a preset manner;
when N > 1, determining the hash value of the Nth data block according to the data records in the data block to be written and the hash value of the (N-1)th data block, and generating the Nth data block containing the hash value of the Nth data block, the data records, and the block forming time of the data block, wherein the block height of the data blocks monotonically increases in the order of block forming time.
11. A data query method based on inverted index, applied to a coordination node in a database system of a centralized storage block chain ledger, the method comprising:
receiving a query instruction containing service attributes;
Acquiring position information of a data record corresponding to the service attribute based on a pre-established inverted index query, wherein the inverted index comprises a corresponding relation of the service attribute and the position information of the data record, and the position information comprises a block height of a data block where the data record is located and an offset in the data block where the data record is located;
determining a corresponding data node according to the block height, and forwarding the position information and the query instruction to the determined data node;
receiving data records returned by each data node, and summarizing to generate a set of data records corresponding to the service attribute;
and sending the set of the data records to a query instruction initiator.
12. A data query apparatus based on an inverted index, applied to a coordination node in a database system that centrally stores a blockchain-type ledger, the apparatus comprising:
a receiving module, configured to receive a query instruction containing a service attribute;
a position query module, configured to query a pre-established inverted index to acquire position information of the data records corresponding to the service attribute, wherein the inverted index comprises a correspondence between the service attribute and the position information of the data records, and the position information comprises the block height of the data block where a data record is located and the offset of the data record within that data block;
a data node determining module, configured to determine the corresponding data node according to the block height and to forward the position information and the query instruction to the determined data node;
a summarizing module, configured to receive the data records returned by each data node and aggregate them to generate a set of data records corresponding to the service attribute;
and a sending module, configured to send the set of data records to the initiator of the query instruction.
13. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of claim 11 when executing the program.
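The coordination-node flow recited in claims 11 and 12 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `DataNode`, `CoordinationNode`, `node_index_for` and `fetch`, the assignment of contiguous block-height ranges to data nodes, and the modelling of a data block as a mapping from offset to record are all assumptions made for illustration; the claims only state that the data node is determined according to the block height.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Position information: (block height, offset within the data block).
Position = Tuple[int, int]


@dataclass
class DataNode:
    """Hypothetical data node that stores the blocks in [low, high]."""
    low: int
    high: int
    blocks: Dict[int, Dict[int, str]]  # block height -> {offset: record}

    def fetch(self, positions: List[Position]) -> List[str]:
        # Stands in for handling the forwarded position information.
        return [self.blocks[height][offset] for height, offset in positions]


class CoordinationNode:
    """Hypothetical coordination node following the steps of claim 11."""

    def __init__(self, inverted_index: Dict[str, List[Position]],
                 data_nodes: List[DataNode]) -> None:
        self.inverted_index = inverted_index  # service attribute -> positions
        self.data_nodes = data_nodes

    def node_index_for(self, height: int) -> int:
        # Assumed routing rule: each node owns a contiguous height range.
        for i, node in enumerate(self.data_nodes):
            if node.low <= height <= node.high:
                return i
        raise KeyError(f"no data node holds block height {height}")

    def query(self, service_attribute: str) -> List[str]:
        # 1. Look up the position information in the inverted index.
        positions = self.inverted_index.get(service_attribute, [])
        # 2. Group the positions by the data node that owns each block height.
        by_node: Dict[int, List[Position]] = defaultdict(list)
        for height, offset in positions:
            by_node[self.node_index_for(height)].append((height, offset))
        # 3. Forward to each data node and aggregate the returned records.
        records: List[str] = []
        for i, node_positions in by_node.items():
            records.extend(self.data_nodes[i].fetch(node_positions))
        return records


# Example data (invented for this sketch).
nodes = [
    DataNode(1, 2, {1: {0: "rec-a", 40: "rec-b"}, 2: {0: "rec-c"}}),
    DataNode(3, 4, {3: {16: "rec-d"}}),
]
index = {"account-42": [(1, 40), (3, 16)], "account-7": [(2, 0)]}
coordinator = CoordinationNode(index, nodes)
print(coordinator.query("account-42"))
```

In this sketch the coordination node never reads block contents itself; it only consults the inverted index and routes each position to the node that holds the block, which mirrors the division of work between the coordination node and the data nodes in the claims.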
CN201910537024.3A 2019-06-20 2019-06-20 Data query method, system, device and equipment based on inverted index Active CN110334094B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910537024.3A CN110334094B (en) 2019-06-20 2019-06-20 Data query method, system, device and equipment based on inverted index

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910537024.3A CN110334094B (en) 2019-06-20 2019-06-20 Data query method, system, device and equipment based on inverted index

Publications (2)

Publication Number Publication Date
CN110334094A CN110334094A (en) 2019-10-15
CN110334094B CN110334094B (en) 2023-05-16

Family

ID=68142250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910537024.3A Active CN110334094B (en) 2019-06-20 2019-06-20 Data query method, system, device and equipment based on inverted index

Country Status (1)

Country Link
CN (1) CN110334094B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110874486B (en) * 2019-10-18 2023-10-17 蚂蚁区块链科技(上海)有限公司 Method, device and equipment for reading data in block chain type account book
CN111339042B (en) * 2020-03-26 2024-03-01 北京快映互娱传媒有限公司 Data operation processing method, system and scheduling server
CN113761102A (en) * 2020-11-18 2021-12-07 北京沃东天骏信息技术有限公司 Data processing method, device, server, system and storage medium
CN112800104A (en) * 2020-12-08 2021-05-14 江苏苏宁云计算有限公司 Method and device for optimizing ES query request link
CN113901279B (en) * 2021-12-03 2022-03-22 支付宝(杭州)信息技术有限公司 Graph database retrieval method and device
CN114185890B (en) * 2021-12-09 2022-11-01 北京航星永志科技有限公司 Database retrieval method and device, storage medium and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3173947A1 (en) * 2015-11-30 2017-05-31 Sap Se Paged inverted index
CN109902086A (en) * 2019-01-31 2019-06-18 阿里巴巴集团控股有限公司 A kind of index creation method, device and equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
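Design and Implementation of an Indexer for a Uyghur, Kazakh and Kyrgyz Full-Text Search Engine; Turdi Tohti et al.; Journal of Intelligence (《情报杂志》); 2008-10-18 (No. 10); full text *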

Also Published As

Publication number Publication date
CN110334094A (en) 2019-10-15

Similar Documents

Publication Publication Date Title
CN110334094B (en) Data query method, system, device and equipment based on inverted index
CN110188096B (en) Index creating method, device and equipment for data record
CN110162526B (en) Method, device and equipment for inquiring data records in block chain type account book
WO2021073242A1 (en) Index creation and data querying methods, apparatus and device
CN110162662B (en) Verification method, device and equipment for data records in block chain type account book
WO2020211569A1 (en) Method for constructing index of data record
CN110162523B (en) Data storage method, system, device and equipment
US10795874B2 (en) Creating index in blockchain-type ledger
US20210158352A1 (en) Methods and systems for recording data based on plurality of blockchain networks
CN112487492B (en) Data verification method, system and equipment
WO2021017422A1 (en) Index creation method in block chain type account book, device and apparatus
US10963453B2 (en) Service identifier-based data indexing
US11126751B2 (en) Index creation for data records
US10999062B2 (en) Blockchain-type data storage
US11050550B2 (en) Methods and systems for reading data based on plurality of blockchain networks
CN110347679B (en) Data storage method, device and equipment based on receipt
US20210157801A1 (en) Methods and systems for recording data based on plurality of blockchain networks
WO2021057127A1 (en) Method, device, and equipment for data storage based on multiple service attributes
CN110716965A (en) Query method, device and equipment in block chain type account book
CN110362570B (en) Data storage method, device and equipment
CN110347748B (en) Data verification method, system, device and equipment based on inverted index
US11086849B2 (en) Methods and systems for reading data based on plurality of blockchain networks
CN111444194B (en) Method, device and equipment for clearing indexes in block chain type account book
CN110636042B (en) Method, device and equipment for updating verified block height of server
CN110874486B (en) Method, device and equipment for reading data in block chain type account book

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40014998

Country of ref document: HK

TA01 Transfer of patent application right

Effective date of registration: 20201010

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201010

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

GR01 Patent grant