CN111552695A - Data storage and query method, device and machine-readable storage medium - Google Patents

Data storage and query method, device and machine-readable storage medium

Info

Publication number
CN111552695A
Authority
CN
China
Prior art keywords
value
data
condition
query
stored
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010497934.6A
Other languages
Chinese (zh)
Inventor
刘一平
李爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AlipayCom Co ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010497934.6A priority Critical patent/CN111552695A/en
Publication of CN111552695A publication Critical patent/CN111552695A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/389 Keeping log of transactions for guaranteeing non-repudiation of a transaction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Databases & Information Systems (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • Technology Law (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present specification provide a method of data storage, a method of data query, an apparatus, a computing device and a machine-readable storage medium. The data storage method comprises the following steps: extracting, from data to be stored of a service, a plurality of element values respectively corresponding to a plurality of elements, based on the plurality of elements for characterizing the service; converting the plurality of element values to obtain an identification value; determining a primary key of the data to be stored based on the identification value; and storing the data to be stored into the target bucket corresponding to the primary key of the data to be stored, based on a predetermined correspondence between different primary keys and different buckets.

Description

Data storage and query method, device and machine-readable storage medium
Technical Field
Embodiments of the present description relate to the field of data processing, and in particular, to a method of data storage, a method of data query, an apparatus, and a machine-readable storage medium.
Background
With the rapid development of information technology and the internet, more and more information is being digitized, so that the data generated during business processing is growing explosively. Such massive data poses huge challenges for data storage, query and similar processing. How to handle such massive data has therefore become a topic of active interest.
Disclosure of Invention
In view of the prior art, embodiments of the present specification provide methods of data storage, methods of data querying, apparatuses, computing devices and machine-readable storage media.
In one aspect, an embodiment of the present specification provides a data storage method, including: extracting, from data to be stored of a service, a plurality of element values respectively corresponding to a plurality of elements, based on the plurality of elements for characterizing the service; converting the plurality of element values to obtain an identification value; determining a primary key of the data to be stored based on the identification value; and storing the data to be stored into a target bucket corresponding to the primary key of the data to be stored, based on a predetermined correspondence between different primary keys and different buckets.
In another aspect, an embodiment of the present specification provides a method of data query, where the data is stored according to the above method of data storage, and the method of data query includes: receiving a data query request, wherein the data query request includes a base query condition, and the base query condition indicates a value condition of at least one element of the plurality of elements; converting the value condition of the at least one element to obtain an identification value condition; determining a primary key value condition based on the identification value condition; determining at least one bucket satisfying the primary key value condition based on the predetermined correspondence; and querying the data stored in the at least one bucket based on the data query request to obtain a query result.
In another aspect, an embodiment of the present specification provides an apparatus for data storage, including: an extraction unit that extracts, from data to be stored of a service, a plurality of element values respectively corresponding to a plurality of elements, based on the plurality of elements for characterizing the service; a conversion processing unit that performs conversion processing on the plurality of element values to obtain an identification value; a determination unit that determines a primary key of the data to be stored based on the identification value; and a storage unit that stores the data to be stored into a target bucket corresponding to the primary key of the data to be stored, based on a predetermined correspondence between different primary keys and different buckets.
In another aspect, an embodiment of the present specification provides an apparatus for data query, where the data is stored by using the above apparatus for data storage, and the apparatus for data query includes: a receiving unit that receives a data query request, where the data query request includes a base query condition, and the base query condition indicates a value condition of at least one element of the plurality of elements; a conversion processing unit that converts the value condition of the at least one element to obtain an identification value condition; a first determination unit that determines a primary key value condition based on the identification value condition; a second determination unit that determines at least one bucket satisfying the primary key value condition based on the predetermined correspondence; and a query unit that queries the data stored in the at least one bucket based on the data query request to obtain a query result.
In another aspect, embodiments of the present specification provide a computing device comprising: at least one processor; a memory in communication with the at least one processor having executable code stored thereon, which when executed by the at least one processor causes the at least one processor to implement the above-described method of data storage.
In another aspect, embodiments of the present specification provide a computing device comprising: at least one processor; a memory in communication with the at least one processor having executable code stored thereon, which when executed by the at least one processor causes the at least one processor to implement the method of data querying described above.
In another aspect, embodiments of the present description provide a machine-readable storage medium storing executable code that, when executed, causes a machine to perform the above-described method of data storage.
In another aspect, embodiments of the present specification provide a machine-readable storage medium storing executable code that, when executed, causes a machine to perform the above-described method of data querying.
Drawings
The foregoing and other objects, features and advantages of the embodiments of the present specification will become more apparent from the following more particular description of the embodiments of the present specification, as illustrated in the accompanying drawings in which like reference characters generally represent like elements throughout.
FIG. 1 is a schematic flow chart diagram of a method of data storage according to one embodiment.
FIG. 2 illustrates one example of representing transactional data by spatial coordinates.
FIG. 3A illustrates an example of dimension reduction using a z-order curve dimension reduction algorithm.
FIG. 3B illustrates one example of dimensionality reduction of multi-dimensional element information of transactional data.
FIG. 4 is a schematic flow chart diagram of a method of data querying, according to one embodiment.
FIG. 5 is a diagram of an example process for data storage and querying, according to one embodiment.
FIG. 6A is a schematic block diagram of an apparatus for data storage according to one embodiment.
FIG. 6B is a schematic block diagram of an apparatus for data querying, according to one embodiment.
FIG. 7A is a hardware block diagram of a computing device for data storage according to one embodiment.
FIG. 7B is a hardware block diagram of a computing device for data querying, according to one embodiment.
Detailed Description
The subject matter described herein will now be discussed with reference to various embodiments. It should be understood that these examples are discussed only to enable those skilled in the art to better understand and implement the subject matter described herein, and are not intended to limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the claims. Various embodiments may omit, replace, or add various procedures or components as desired.
As used herein, the term "include" and variations thereof are open-ended terms meaning "including, but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first," "second," and the like may refer to different or the same object. Definitions for other terms may be included below, whether explicit or implicit, and the definition of a term is consistent throughout the specification unless the context clearly dictates otherwise.
With the rapid development of information technology, more and more electronic data is generated in various business processes. Therefore, how to efficiently store and query data becomes one of the research hotspots, especially in the context of massive data.
In view of this, the embodiments of the present disclosure provide a data storage scheme and a corresponding data query scheme, which can greatly reduce storage cost and maintenance cost, and effectively increase data query speed, especially in the case of massive data.
Furthermore, embodiments of the present description may be applied to various business scenarios, such as transaction businesses. For example, embodiments of the present description may be applied to anti-money laundering transaction data storage and query scenarios.
The following description will be made in conjunction with specific embodiments. It should be understood that the following examples are only for helping those skilled in the art to better understand the technical solutions of the present specification, and are not intended to limit the scope thereof.
FIG. 1 is a schematic flow chart diagram of a method of data storage according to one embodiment.
As shown in fig. 1, in step 102, a plurality of element values respectively corresponding to a plurality of elements may be extracted from data to be stored of a service based on the plurality of elements used to characterize the service.
In step 104, a conversion process may be performed on the plurality of element values to obtain an identification value.
In step 106, a primary key for the data to be stored may be determined based on the identification value.
In step 108, the data to be stored may be stored into the target bucket corresponding to the primary key of the data to be stored, based on a predetermined correspondence between different primary keys and different buckets.
In this embodiment, converting the plurality of element values yields an identification value, from which a primary key that represents the data simply and effectively can be obtained. The data to be stored is then placed into the target bucket based on the predetermined correspondence between different primary keys and different buckets, so that the data is stored in an ordered manner in which insertion itself acts as the index. As a result, storage and maintenance costs can be reduced, and subsequent query speed can be effectively improved.
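By way of illustration only, steps 102 to 108 can be sketched as follows. The element names, the bit-interleaving fusion, the use of SHA-256, the bucket count, and the modulo rule standing in for the predetermined correspondence are all assumptions made for this sketch and are not prescribed by the specification.

```python
import hashlib

NUM_BUCKETS = 4                           # assumed bucket count
ELEMENTS = ("subject", "amount", "time")  # assumed element names


def extract(record):
    # Step 102: pull one value per characterizing element.
    return tuple(record[name] for name in ELEMENTS)


def to_identification_value(values, bits=32):
    # Step 104: fuse the values by interleaving their bits.
    ident = 0
    for i in range(bits - 1, -1, -1):
        for value in values:
            ident = (ident << 1) | ((value >> i) & 1)
    return ident


def to_primary_key(ident):
    # Step 106: hash the identification value (SHA-256 is an assumption).
    digest = hashlib.sha256(str(ident).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def store(record, buckets):
    # Step 108: a modulo rule stands in for the predetermined correspondence.
    key = to_primary_key(to_identification_value(extract(record)))
    bucket_id = key % NUM_BUCKETS
    buckets.setdefault(bucket_id, []).append(record)
    return bucket_id


buckets = {}
record = {"subject": 20880001, "amount": 12345, "time": 20200213, "mode": "transfer"}
bucket_id = store(record, buckets)
```

No separate index structure is built in this sketch: the bucket is computed from the record's own element values at insertion time.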
In this document, the term service may refer to various electronic services, such as transaction services and the like. In general, a service may have multiple elements that can be characterized. For example, for a transaction service, it may be characterized by a number of elements such as the transaction subject, the transaction amount, the transaction time, the transaction mode, and so on.
Furthermore, a service may include numerous elements, such as tens of elements, that can characterize it. Then, to facilitate data processing, in one embodiment, the plurality of elements described in step 102 may be essential or basic elements capable of characterizing a service.
Of course, in different business scenarios, how many elements and which elements to select as the plurality of elements mentioned in step 102 may depend on various factors such as actual business requirements, processing resources, and so on. For example, the plurality of elements for characterizing the service may include two to three elements in consideration of current computer processing performance.
For example, in the case where the above-described service is a transaction service, the plurality of elements may include at least two of a transaction subject, a transaction amount, and a transaction time.
Further, it is to be appreciated that individual elements can have corresponding element values in different data instances of a service. In this context, the data of the service may be parsed in various suitable manners, and the specific element values of the corresponding elements extracted from the data.
For example, for a given piece of transaction data, a specific element value 2088xxxxxx of the element "transaction subject", a specific element value 12345 of the element "transaction amount", a specific element value 20200213 of the element "transaction time", and the like can be extracted.
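A minimal sketch of such an extraction is given here, assuming the transaction data arrives as a key-value record. The field names and the record layout are hypothetical; the subject value is kept in the elided form used in the text.

```python
# Hypothetical raw transaction record; only some fields are characterizing elements.
raw = {
    "transaction_subject": "2088xxxxxx",
    "transaction_amount": "12345",
    "transaction_time": "20200213",
    "transaction_mode": "transfer",
}

# Assumed choice of characterizing elements, in a fixed order.
ELEMENTS = ("transaction_subject", "transaction_amount", "transaction_time")


def extract_element_values(record, elements=ELEMENTS):
    """Return the values of the chosen elements, in the order of `elements`."""
    return tuple(record[name] for name in elements)


values = extract_element_values(raw)
```

Keeping the element order fixed matters because the resulting tuple is later treated as a coordinate.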
As can be seen from the above, extracting the plurality of element values corresponding to the plurality of elements from the data may also be actually considered as spatially characterizing the data, and the plurality of element values may be considered as spatial coordinates representing the data.
For ease of understanding, the following description will take the transaction service as an example. FIG. 2 illustrates one example of representing transactional data by spatial coordinates. In the example of fig. 2, axis X may represent a transaction subject, axis Y may represent a transaction time, and axis Z may represent a transaction amount. It can be seen that in this way, the transaction data can be represented by coordinates (x, y, z). For example, in the example of fig. 2, (x1, y1, z1), (x2, y2, z2), and (x3, y3, z3) may represent three pieces of transaction data, respectively.
Therefore, the data are subjected to spatial representation through a plurality of element values, the data can be effectively simplified, and the subsequent data storage and query processing speed can be increased.
Because processing a plurality of element values directly can be relatively complex, especially when the number of element values is large, conversion processing may be performed on the plurality of element values. In this context, any suitable method may be used to convert the plurality of element values into an identification value that is simpler than the plurality of element values.
For example, various suitable algorithms may be used to fuse the plurality of element values together to obtain an identification value. In this way, subsequent data storage or query processing can be simplified.
In some cases, the multiple element values may be represented using various formats, which are not necessarily formats suitable for machine processing. Thus, in fusing multiple element values, each element value may first be converted to a format suitable for machine processing. For example, in a transaction, the values of the elements may be represented in a text format, a natural number format, or the like, and the formats may be converted into a format suitable for machine processing, such as a binary format, a hexadecimal format, or the like.
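As a hedged illustration of such a format conversion, the following sketch turns a natural-number value given as text into a fixed-width binary string. The function name and the choice of a fixed width are assumptions of this sketch.

```python
def to_binary(value, width=32):
    """Convert a natural number (int or decimal string) to a fixed-width binary string."""
    number = int(value)
    if number < 0 or number >= 1 << width:
        raise ValueError("value does not fit in the chosen width")
    return format(number, f"0{width}b")


amount_bits = to_binary("12345", width=16)
```

A fixed width keeps all element values the same length, which a bit-level fusion step can rely on.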
As can be understood from the above, fusing a plurality of element values into an identification value is in effect a dimension reduction of those element values. Thus, in one embodiment, various suitable dimension reduction algorithms may be employed to fuse the plurality of element values into an identification value. For example, a z-order curve dimension reduction algorithm may be employed. The idea of the z-order curve dimension reduction algorithm is to traverse the high-dimensional data in the direction in which each dimension increases, thereby generating a mapping from the high-dimensional data to low-dimensional data.
For ease of understanding, FIG. 3A illustrates an example of fusing multiple element values using a z-order curve dimensionality reduction algorithm. In the example of FIG. 3A, a z-order curve may be employed to reduce two element values to one identification value. For example, in the example of fig. 3A, x may represent a first element and y may represent a second element. When the value of x is 00 and the value of y is 11, the identification value z is 5 after the dimensionality reduction of the z-order curve.
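The two-element case can be sketched as bit interleaving. The function name is hypothetical, and the choice to place the x bit ahead of the y bit at each position is an assumption made so that the sketch agrees with the x = 00, y = 11 example given in the text.

```python
def z_order_2d(x, y, bits):
    """Interleave the bits of x and y, taking the x bit first at each position."""
    z = 0
    for i in range(bits - 1, -1, -1):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z


z = z_order_2d(0b00, 0b11, bits=2)
print(z)  # 5 (binary 0101)
```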
In addition, for further explanation, the following takes transaction service as an example to explain how to perform fusion processing by using z-order curve dimension reduction algorithm. FIG. 3B illustrates one example of fusing multiple element values of transactional data.
In the example of FIG. 3B, it is assumed that the plurality of elements characterizing the transaction service mentioned in step 102 include a transaction subject, a transaction amount, and a transaction time. Based on these three elements, a corresponding plurality of element values are extracted from the data to be stored of the transaction service. Assume that the transaction subject has a value of 2088xxxxxx, the transaction amount has a value of 12345, and the transaction time has a value of 20200213.
Here, since each element value is represented by a natural number, each can be converted into a binary format suitable for machine processing. Assume that the value 2088xxxxxx of the transaction subject is converted to the binary value 11001010, the value 12345 of the transaction amount is converted to the binary value 10010001, and the value 20200213 of the transaction time is converted to the binary value 01010101.
Then, the three element values are fused by adopting a z-order curve dimension reduction algorithm to obtain an identification value. In this example, the identification value is 1551546555.
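One possible way to fuse three binary values is sketched here as a general bit interleaving. The function names and the bit ordering (subject, then amount, then time at each position) are assumptions. The text does not spell out the encoding behind the identification value shown for FIG. 3B, so this sketch does not attempt to reproduce that value.

```python
def interleave(values, bits):
    """Fuse several equal-width values by interleaving their bits, first value first."""
    ident = 0
    for i in range(bits - 1, -1, -1):
        for value in values:
            ident = (ident << 1) | ((value >> i) & 1)
    return ident


def deinterleave(ident, count, bits):
    """Recover the original values from an interleaved identification value."""
    values = [0] * count
    total = bits * count
    for i in range(total):
        bit = (ident >> (total - 1 - i)) & 1
        values[i % count] = (values[i % count] << 1) | bit
    return tuple(values)


subject, amount, time = 0b11001010, 0b10010001, 0b01010101
ident = interleave((subject, amount, time), bits=8)
restored = deinterleave(ident, count=3, bits=8)
```

Because the interleaving is reversible, every element contributes to the identification value and none of the dimensions is discarded.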
In the embodiment, the multiple element values are fused through the dimension reduction algorithm, so that not only can the data processing complexity be greatly reduced, but also the association degree among the multiple elements can be kept, and the single-dimensional effect is avoided.
Thereafter, a primary key of the data to be stored may be derived based on the identification value. For example, a hash operation may be performed on the identification value to obtain the primary key of the data to be stored. Performing the hash operation distributes the resulting primary keys uniformly, so that the data is distributed more evenly across the corresponding buckets. Of course, if the identification values are themselves uniformly distributed, the identification value may be used directly as the primary key of the data to be stored. The exact manner of determining the primary key may be selected as appropriate.
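Both options can be sketched as follows. SHA-256, the size of the key space, and the numbering of keys from 1 are assumptions of this sketch; the text only requires some hash operation.

```python
import hashlib


def primary_key(identification_value, key_space=16, use_hash=True):
    """Derive a primary key from an identification value.

    With use_hash=False the identification value is used directly, which is
    only sensible when identification values are already uniformly spread.
    """
    if not use_hash:
        return identification_value
    digest = hashlib.sha256(str(identification_value).encode("utf-8")).digest()
    # Fold the digest into keys 1..key_space (assumed numbering).
    return int.from_bytes(digest[:8], "big") % key_space + 1


hashed_key = primary_key(5)
direct_key = primary_key(5, use_hash=False)
```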
Thereafter, in step 108, a target bucket corresponding to the primary key of the data to be stored may be determined based on the correspondence between the primary key and the bucket.
For example, the correspondence between primary keys and buckets may be predefined. For instance, each primary key may correspond to one bucket, while one bucket may correspond to a plurality of primary keys.
Herein, a bucket may refer to a distributed instance for storage and computation.
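A predefined correspondence of this kind can be pictured as a simple lookup table. The keys and bucket names used here are hypothetical.

```python
# Assumed correspondence: every primary key maps to exactly one bucket,
# while a bucket may serve several primary keys.
KEY_TO_BUCKET = {1: "bucket-1", 2: "bucket-1", 3: "bucket-2", 4: "bucket-2"}


def target_bucket(key):
    """Look up the single bucket that a primary key corresponds to."""
    return KEY_TO_BUCKET[key]


def keys_of(bucket):
    """List all primary keys that correspond to a given bucket."""
    return sorted(k for k, b in KEY_TO_BUCKET.items() if b == bucket)
```

For instance, `target_bucket(3)` yields `"bucket-2"`, and `keys_of("bucket-1")` yields `[1, 2]`.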
In some existing storage methods, an index may be built and stored for data. However, in the case of massive data, the index itself may occupy a large amount of storage space, resulting in increased storage and maintenance costs.
In the embodiments of the present specification, by contrast, data is stored in the corresponding bucket according to the correspondence between primary keys and buckets. The data is thus stored in an ordered manner: the placement of each piece of data is derived from the data itself at insertion time, so the placement itself serves as the index. The primary key need not be stored, which prevents it from occupying a large amount of storage space and greatly reduces storage and maintenance costs. The ordered layout can also effectively increase subsequent data query speed.
The process of the data storage method is described above in connection with specific embodiments. How to perform corresponding data query in the case of storing data by the above method will be described below with reference to specific embodiments.
FIG. 4 is a schematic flow chart diagram of a method of data querying, according to one embodiment.
As shown in fig. 4, in step 402, a data query request may be received.
For example, a data query request may be received from a user. The data query request may include a base query condition.
The base query condition may be used to indicate a value condition of at least one of the plurality of elements mentioned in step 102 of fig. 1. For example, a query condition about at least one element input by a user through a user terminal may be received. In some cases, the query conditions may be expressed in various formats such as text, natural numbers, and the like, but the query conditions may be converted into a value condition of the at least one element by an applicable format conversion.
In step 404, a conversion process may be performed on the value condition of the at least one element to obtain an identification value condition.
In step 406, a primary key value condition may be determined based on the identification value condition.
In step 408, at least one bucket satisfying the value condition of the primary key may be determined based on the predetermined correspondence between different primary keys and different buckets.
In step 410, data stored in at least one bucket may be queried based on the data query request to obtain a query result.
For example, data stored in at least one bucket may be queried based at least on the base query condition. If data meeting the query condition is queried, the query result can comprise corresponding data. If no data meeting the query condition is queried, the query result may indicate that no data meeting the query condition exists.
Therefore, in this embodiment, the value condition of the primary key may be determined by performing conversion processing on the value condition of at least one element, so that at least one bucket to be queried may be effectively located first, the query range is reduced, and then query is performed on data stored in at least one bucket, thereby greatly increasing the query speed.
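For the simplest case, a point query that fixes every element, steps 402 to 410 can be sketched as follows. The fusion, the hash function, the bucket count and all function names are assumptions of this sketch; range conditions and conditions on only some of the elements would additionally need the boundary-value handling described in the text.

```python
import hashlib

NUM_BUCKETS = 4  # assumed bucket count


def fuse(values, bits=32):
    # Same conversion as in storage: interleave the bits of the element values.
    ident = 0
    for i in range(bits - 1, -1, -1):
        for value in values:
            ident = (ident << 1) | ((value >> i) & 1)
    return ident


def bucket_for(values):
    # Identification value -> primary key (hash) -> bucket (assumed modulo rule).
    digest = hashlib.sha256(str(fuse(values)).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def store(buckets, values, payload):
    buckets.setdefault(bucket_for(values), []).append((values, payload))


def point_query(buckets, values):
    # Steps 404 to 408: locate the single candidate bucket.
    candidates = buckets.get(bucket_for(values), [])
    # Step 410: scan only that bucket.
    return [payload for stored, payload in candidates if stored == values]


buckets = {}
store(buckets, (20880001, 12345, 20200213), "txn-A")
store(buckets, (20880002, 500, 20200214), "txn-B")
hits = point_query(buckets, (20880001, 12345, 20200213))
misses = point_query(buckets, (1, 2, 3))
```

An empty list plays the role of a query result indicating that no matching data exists.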
In an embodiment, the value condition of the at least one element may include a target value or a value range of each element. That is, the base query conditions may include point query conditions and/or range query conditions to satisfy various user query requirements.
For example, for a transaction service, assume that the plurality of elements referred to in step 102 of FIG. 1 include a transaction subject, a transaction amount, and a transaction time. Then, the base query condition may include a query condition for at least one of the transaction subject, the transaction amount, and the transaction time. If the query condition for the transaction subject specifies that the transaction subject is a particular subject, that is, it specifies a specific value of the transaction subject, the query condition is a point query condition. If the query condition for the transaction amount specifies that the transaction amount is greater than a certain value, that is, it specifies a value range of the transaction amount, the query condition is a range query condition.
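One way to represent point and range query conditions uniformly is sketched here. The class, its field names, and the use of exclusive bounds are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Condition:
    """Value condition on one element: a target value or a value range."""
    element: str
    equals: Optional[int] = None  # point query: target value
    low: Optional[int] = None     # range query: exclusive lower bound
    high: Optional[int] = None    # range query: exclusive upper bound

    def is_point(self):
        return self.equals is not None

    def matches(self, value):
        if self.equals is not None:
            return value == self.equals
        if self.low is not None and value <= self.low:
            return False
        if self.high is not None and value >= self.high:
            return False
        return True


subject_is = Condition("transaction_subject", equals=20880001)
amount_above = Condition("transaction_amount", low=1000)
```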
In step 404, a process similar to step 104 in FIG. 1 may be employed for processing.
For example, the target value, or the boundary value of the value range, of the at least one element may be fused to obtain a target identification value or a boundary value of an identification value range. The values can be fused here in the same way as in the data storage process.
For example, a dimension reduction algorithm may be used to fuse the target value or the boundary value of the value range of the at least one element. The dimension reduction algorithm may be the same as the one used in the data storage process, such as the aforementioned z-order curve dimension reduction algorithm.
Then, in step 406, a hash operation may be performed on the target identification value or the boundary value of the identification value range to obtain a target primary key value or a boundary value of a primary key value range, thereby obtaining the primary key value condition.
In step 408, at least one bucket meeting the value condition of the primary key may be determined based on the predetermined corresponding relationship between different primary keys and different buckets, so that other buckets not storing the target data may be filtered out, the query range is narrowed, and the query speed can be effectively increased.
For ease of understanding, the following description takes transaction data as an example. Assume that the plurality of elements referred to in step 102 of FIG. 1 include a transaction subject, a transaction time, and a transaction amount. Further assume that the base query condition includes query conditions for the transaction amount and the transaction time. In this example, assume that the query condition for the transaction amount specifies a transaction amount greater than 2 and the query condition for the transaction time specifies a transaction time greater than 2.
The boundary values of the value ranges of the two elements are then fused by adopting a z-order curve dimension reduction algorithm to obtain an identification value. A hash operation is then performed on the identification value to obtain a boundary value of the primary key value range. Here, the boundary value is assumed to be 13, so that a primary key value condition of greater than 13 is obtained.
In addition, assume that there are a total of 16 buckets, numbered from 1 to 16, which correspond to primary keys 1 to 16, respectively. Then, when the primary key is 13, its corresponding bucket is bucket 13. Based on the primary key value condition (i.e., greater than 13), the buckets to be queried may be determined to be buckets 14, 15, and 16.
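The bucket selection in this example can be sketched as follows, taking the one-to-one correspondence between primary keys 1 to 16 and buckets 1 to 16 from the text. The function name is hypothetical.

```python
# Correspondence from the example: primary key k corresponds to bucket k, for k = 1..16.
KEY_TO_BUCKET = {key: key for key in range(1, 17)}


def buckets_for_keys_greater_than(boundary):
    """Return the buckets whose primary keys satisfy 'greater than boundary'."""
    return sorted({bucket for key, bucket in KEY_TO_BUCKET.items() if key > boundary})


print(buckets_for_keys_greater_than(13))  # [14, 15, 16]
```

Only three of the sixteen buckets remain to be scanned, which is where the reduction in query range comes from.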
Therefore, by the embodiment, the buckets to be queried can be effectively positioned, the query range is greatly reduced, and the query speed is improved.
Further, in one embodiment, the data query request may also include an extended query condition. The extended query condition may indicate a value condition of an optional element. Here, the optional elements may include elements capable of characterizing the service other than the plurality of elements mentioned in step 102 of FIG. 1. For example, for a transaction service, assuming that the plurality of elements mentioned in step 102 include a transaction subject, a transaction amount, and a transaction time, the optional elements may include other elements such as a transaction mode.
Similar to the base query condition, the extended query condition may also include a point query condition or a range query condition for each of the optional elements, thereby indicating a target value or a value range of the optional element accordingly.
Then, in step 410, the data stored in at least one bucket may also be queried based on the base query condition and the extended query condition, so as to obtain a query result.
For example, if data meeting the basic query condition and the extended query condition is queried, the query result may include data meeting the condition. If no data meeting the base query condition and the extended query condition is queried, the query result may indicate that no data meeting the conditions exists.
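The querying of the selected buckets can be sketched as follows. This is only an illustration under stated assumptions: records are modelled as dictionaries, the basic and extended query conditions are modelled as predicate functions, and all names (`query_buckets`, `basic`, `extended`, the sample records) are hypothetical. An empty result list stands for a query result indicating that no matching data exists.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, object]
Condition = Callable[[Record], bool]


def query_buckets(
    buckets: Dict[int, List[Record]],
    bucket_ids: Iterable[int],
    basic: Condition,
    extended: Optional[Condition] = None,
) -> List[Record]:
    """Scan only the selected buckets and keep records meeting all conditions."""
    matches: List[Record] = []
    for bucket_id in bucket_ids:
        for record in buckets.get(bucket_id, []):
            if basic(record) and (extended is None or extended(record)):
                matches.append(record)
    return matches


buckets = {
    14: [{"subject": "alice", "amount": 3, "time": 2, "mode": "card"}],
    15: [{"subject": "bob", "amount": 5, "time": 4, "mode": "card"}],
    16: [{"subject": "carol", "amount": 7, "time": 9, "mode": "transfer"}],
}


def basic(record: Record) -> bool:
    # Basic query condition: amount greater than 2 and time greater than 2.
    return record["amount"] > 2 and record["time"] > 2


def extended(record: Record) -> bool:
    # Extended query condition on an optional element (transaction mode).
    return record["mode"] == "card"


found = query_buckets(buckets, [14, 15, 16], basic, extended)
not_found = query_buckets(buckets, [14], basic, extended)
```

The record in bucket 14 is scanned but rejected by the basic query condition, which shows that bucket selection only narrows the search and the conditions are still checked against each record.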
For the sake of understanding, the following will explain the procedures of the technical solutions of the present specification with reference to specific examples. It should be understood that the following examples do not set any limit to the scope of the technical solutions of the present description.
FIG. 5 is a diagram of an example process for data storage and querying, according to one embodiment. In the example of FIG. 5, transaction data is taken as an example, and it is assumed that the transaction data to be currently stored includes transaction data A, transaction data B, and transaction data C.
As shown in FIG. 5, a plurality of element values respectively corresponding to a plurality of elements may first be extracted from the transaction data A, the transaction data B, and the transaction data C. For example, the plurality of elements may include a transaction subject, a transaction amount, and a transaction time.
Then, the plurality of element values extracted from the transaction data A can be fused using a z-order curve dimension reduction algorithm to obtain an identification value of the transaction data A. A hash operation is then performed on the identification value of the transaction data A to obtain the primary key of the transaction data A. The primary keys of transaction data B and transaction data C may be obtained in a similar manner. In this example, it is assumed that the primary key of transaction data A is 2, the primary key of transaction data B is 1, and the primary key of transaction data C is 3. In addition, it is assumed that primary keys 1 to 3 correspond to buckets 1 to 3, respectively.
Then, transaction data A may be stored in bucket 2, transaction data B may be stored in bucket 1, and transaction data C may be stored in bucket 3.
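The storage flow of this example (extract, fuse, hash, store) can be sketched as below. Everything concrete in the sketch is an assumption made for illustration: the element values of A, B, and C, the encoding of the transaction subject as an integer, the 4-bit width, and the stand-in hash (modulo 3 plus one). The values were picked so that the resulting primary keys match those assumed in the example; the text does not give the actual values or hash function.

```python
from typing import Dict, List, Tuple

ELEMENTS = ("subject", "amount", "time")  # the plurality of elements


def z_order_fuse(values: Tuple[int, ...], bits: int = 4) -> int:
    """Interleave the low `bits` bits of every value into one z-order code."""
    z = 0
    n = len(values)
    for i in range(bits):
        for d, v in enumerate(values):
            z |= ((v >> i) & 1) << (i * n + d)
    return z


def primary_key(record: Dict[str, int], num_keys: int = 3) -> int:
    """Extract the element values, fuse them, then hash to a key in 1..num_keys."""
    element_values = tuple(record[name] for name in ELEMENTS)
    identification_value = z_order_fuse(element_values)
    return identification_value % num_keys + 1  # stand-in hash (assumption)


def store(
    record: Dict[str, int],
    buckets: Dict[int, List[Dict[str, int]]],
    key_to_bucket: Dict[int, int],
) -> int:
    """Store the record in the bucket corresponding to its primary key."""
    bucket_id = key_to_bucket[primary_key(record)]
    buckets.setdefault(bucket_id, []).append(record)
    return bucket_id


# Hypothetical element values for transaction data A, B, and C.
transactions = {
    "A": {"subject": 2, "amount": 2, "time": 1},
    "B": {"subject": 1, "amount": 4, "time": 3},
    "C": {"subject": 1, "amount": 5, "time": 3},
}
key_to_bucket = {1: 1, 2: 2, 3: 3}  # primary keys 1 to 3 map to buckets 1 to 3
buckets: Dict[int, List[Dict[str, int]]] = {}

placement = {
    name: store(record, buckets, key_to_bucket)
    for name, record in transactions.items()
}
```

Under these assumptions, A lands in bucket 2, B in bucket 1, and C in bucket 3, as in the example.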
When a data query is performed, the z-order curve dimension reduction algorithm can be applied to the basic query condition, followed by a hash operation. For example, the basic query condition may specify value ranges for the transaction subject, the transaction amount, and the transaction time. After the dimension reduction and hash processing, the primary key value condition can be determined.
In the example of fig. 5, it is assumed that buckets to be queried are determined to be buckets 2 and 3 based on correspondence between different primary keys and different buckets and a primary key value condition.
The transaction data A and C stored in buckets 2 and 3 may then be queried based on the basic query condition and/or the extended query condition (if any), thereby obtaining a query result.
FIG. 6A is a schematic block diagram of an apparatus for data storage according to one embodiment.
As shown in fig. 6A, the apparatus 600A may include an extraction unit 602A, a conversion processing unit 604A, a determination unit 606A, and a storage unit 608A.
The extracting unit 602A may extract, based on a plurality of elements used to characterize the service, a plurality of element values respectively corresponding to the plurality of elements from the data to be stored of the service.
The conversion processing unit 604A may perform conversion processing on the plurality of element values to obtain the identification value.
The determination unit 606A may determine a primary key of the data to be stored based on the identification value.
The storage unit 608A may store the data to be stored in the target bucket corresponding to the primary key of the data to be stored, based on a predetermined correspondence between different primary keys and different buckets.
In one embodiment, the conversion processing unit 604A may fuse the plurality of element values to obtain the identification value.
In one embodiment, the conversion processing unit 604A may employ a dimension reduction algorithm to fuse the plurality of element values into the identification value.
In one embodiment, the dimension reduction algorithm may include a z-order curve algorithm.
In one embodiment, the determining unit 606A may perform a hash operation on the identification value to obtain a primary key of the data to be stored.
In one embodiment, the service may be a transaction, and the plurality of elements may include at least two of a transaction subject, a transaction amount, and a transaction time.
The units of the apparatus 600A may perform corresponding steps in the method embodiments of fig. 1-3B and fig. 5, and therefore, for brevity of description, specific operations and functions of the units of the apparatus 600A are not described herein again.
FIG. 6B is a schematic block diagram of an apparatus for data querying, according to one embodiment.
As shown in fig. 6B, the apparatus 600B may include a receiving unit 602B, a conversion processing unit 604B, a first determining unit 606B, a second determining unit 608B, and a querying unit 610B. The data queried by device 600B may be stored using device 600A.
The receiving unit 602B may receive a data query request, where the data query request may include a basic query condition, and the basic query condition is used to indicate a value condition of at least one element of the plurality of elements.
The conversion processing unit 604B may perform conversion processing on the value taking condition of at least one element to obtain an identifier value taking condition.
The first determining unit 606B may determine the primary key value condition based on the identifier value condition.
The second determination unit 608B may determine at least one bucket satisfying the primary key value condition based on the predetermined correspondence relationship.
The querying unit 610B may query the data stored in the at least one bucket based on the data query request to obtain a query result.
In one embodiment, the value condition of the at least one element may include a target value or a value range of each element in the at least one element.
In an embodiment, the conversion processing unit 604B may fuse the target value or the boundary value of the value range of at least one element to obtain the boundary value of the target identifier value or the identifier value range.
In one embodiment, the conversion processing unit 604B may adopt a dimension reduction algorithm to fuse the target value or the boundary value of the value range of at least one element.
In one embodiment, the dimension reduction algorithm may include a z-order curve algorithm.
In an embodiment, the first determining unit 606B may perform a hash operation on the boundary value of the target identifier value or the identifier value range to obtain a target primary key value or a boundary value of the primary key value range.
In one embodiment, the data query request may also include an extended query condition. The extended query condition is used to indicate a value condition of an optional element, where the optional element is an element, other than the plurality of elements, used for characterizing the service.
The querying unit 610B may query the data stored in the at least one bucket based on the base query condition and the extended query condition to obtain a query result.
The units of the apparatus 600B may perform corresponding steps in the method embodiments of fig. 4-5, and therefore, for brevity of description, specific operations and functions of the units of the apparatus 600B are not described herein again.
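One way to picture how the units of the apparatus 600B cooperate is the sketch below, in which each unit is modelled as a method of a single class. This is an assumption-laden illustration, not the apparatus itself: the class and method names are hypothetical, the basic query condition is restricted to exclusive lower bounds, element values are assumed to fit in 2 bits, and the hash is a stand-in that preserves order only within that small domain.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, int]


def z_order_fuse(values: Tuple[int, ...], bits: int = 2) -> int:
    """Interleave the low `bits` bits of every value into one z-order code."""
    z = 0
    n = len(values)
    for i in range(bits):
        for d, v in enumerate(values):
            z |= ((v >> i) & 1) << (i * n + d)
    return z


class QueryApparatus:
    def __init__(self, buckets, key_to_bucket, elements, hash_fn):
        self.buckets = buckets              # bucket id -> stored records
        self.key_to_bucket = key_to_bucket  # predetermined correspondence
        self.elements = elements            # order in which elements are fused
        self.hash_fn = hash_fn              # stand-in hash (assumption)

    def convert(self, lower_bounds: Record) -> int:
        # Conversion processing unit: fuse the boundary values.
        return z_order_fuse(tuple(lower_bounds[e] for e in self.elements))

    def determine_key_condition(self, identification_boundary: int) -> int:
        # First determining unit: hash the identification boundary.
        return self.hash_fn(identification_boundary)

    def determine_buckets(self, key_boundary: int) -> List[int]:
        # Second determining unit: buckets whose key exceeds the boundary.
        return sorted(b for k, b in self.key_to_bucket.items() if k > key_boundary)

    def query(
        self,
        lower_bounds: Record,
        extended: Optional[Callable[[Record], bool]] = None,
    ) -> List[Record]:
        # Receiving unit plus querying unit: scan only the selected buckets.
        key_boundary = self.determine_key_condition(self.convert(lower_bounds))
        results: List[Record] = []
        for bucket_id in self.determine_buckets(key_boundary):
            for record in self.buckets.get(bucket_id, []):
                basic_ok = all(record[e] > b for e, b in lower_bounds.items())
                if basic_ok and (extended is None or extended(record)):
                    results.append(record)
        return results


buckets = {
    4: [{"amount": 1, "time": 1, "mode": 0}],
    14: [{"amount": 3, "time": 2, "mode": 0}],
    16: [{"amount": 3, "time": 3, "mode": 0}, {"amount": 3, "time": 3, "mode": 1}],
}
apparatus = QueryApparatus(
    buckets, {k: k for k in range(1, 17)}, ("amount", "time"), lambda z: z % 16 + 1
)
basic_only = apparatus.query({"amount": 2, "time": 2})
with_extended = apparatus.query({"amount": 2, "time": 2}, lambda r: r["mode"] == 1)
```

In this sketch bucket 4 is never scanned, the record in bucket 14 is scanned but fails the basic query condition, and the extended query condition further narrows the two matches in bucket 16 to one.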
The apparatuses 600A and 600B may be implemented by hardware, software, or a combination of hardware and software. For example, when implemented in software, the apparatuses 600A and 600B may be formed by a processor of the device in which they reside reading corresponding executable code from storage (e.g., non-volatile storage) into memory for execution.
FIG. 7A is a hardware block diagram of a computing device for data storage according to one embodiment. As shown in FIG. 7A, computing device 700A may include at least one processor 702A, storage 704A, memory 706A, and communication interface 708A, and the at least one processor 702A, storage 704A, memory 706A, and communication interface 708A are connected together via a bus 710A. The at least one processor 702A executes at least one piece of executable code (i.e., the units described above as being implemented in software) stored or encoded in the storage 704A.
In one embodiment, the executable code stored in the storage 704A, when executed by the at least one processor 702A, causes the computing device to implement the various processes described above in connection with FIGS. 1-3B and 5.
FIG. 7B is a hardware block diagram of a computing device for data querying, according to one embodiment. As shown in FIG. 7B, computing device 700B may include at least one processor 702B, storage 704B, memory 706B, and communication interface 708B, and the at least one processor 702B, storage 704B, memory 706B, and communication interface 708B are connected together via a bus 710B. The at least one processor 702B executes at least one piece of executable code (i.e., the units described above as being implemented in software) stored or encoded in the storage 704B.
In one embodiment, the executable code stored in the storage 704B, when executed by the at least one processor 702B, causes the computing device to implement the processes described above in connection with FIGS. 4-5.
Computing devices 700A and 700B may be implemented in any suitable form known in the art, including, for example, but not limited to, desktop computers, laptop computers, smart phones, tablet computers, consumer electronics devices, wearable smart devices, and the like.
Embodiments of the present specification also provide a machine-readable storage medium. The machine-readable storage medium may store executable code that, when executed by a machine, causes the machine to implement particular processes of the method embodiments described above with reference to fig. 1-3B and 5.
Embodiments of the present specification also provide a machine-readable storage medium. The machine-readable storage medium may store executable code that, when executed by a machine, causes the machine to perform particular processes of the method embodiments described above with reference to fig. 4-5.
For example, the machine-readable storage medium may include, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Static Random Access Memory (SRAM), a hard disk, a flash Memory, and the like.
It should be understood that the embodiments in this specification are described in a progressive manner, that the same or similar parts of the various embodiments may be referred to one another, and that each embodiment focuses on its differences from the other embodiments. For example, because the embodiments of the apparatus, the computing device, and the machine-readable storage medium are substantially similar to the method embodiments, they are described briefly, and the relevant details can be found in the corresponding description of the method embodiments.
Specific embodiments of this specification have been described above. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Not all steps and elements in the above flows and system structure diagrams are necessary, and some steps or elements may be omitted according to actual needs. The apparatus structures described in the above embodiments may be physical structures or logical structures, that is, some units may be implemented by the same physical entity, or some units may be implemented by a plurality of physical entities respectively, or some units may be implemented by some components in a plurality of independent devices together.
The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration," and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described technology. However, the techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.
Although the embodiments of the present disclosure have been described in detail with reference to the accompanying drawings, the embodiments of the present disclosure are not limited to the specific details of the embodiments, and various modifications may be made within the technical spirit of the embodiments of the present disclosure, which belong to the scope of the embodiments of the present disclosure.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (30)

1. A method of data storage, comprising:
extracting a plurality of element values respectively corresponding to a plurality of elements from data to be stored of a service based on the plurality of elements for characterizing the service;
converting the plurality of element values to obtain an identification value;
determining a primary key of the data to be stored based on the identification value; and
storing the data to be stored into a target bucket corresponding to the primary key of the data to be stored, based on a predetermined correspondence between different primary keys and different buckets.
2. The method of claim 1, wherein converting the plurality of element values to obtain an identification value comprises:
fusing the plurality of element values to obtain the identification value.
3. The method of claim 2, wherein fusing the plurality of element values to obtain the identification value comprises:
fusing the plurality of element values into the identification value using a dimension reduction algorithm.
4. The method of claim 3, wherein the dimension reduction algorithm comprises a z-order curve algorithm.
5. The method of any one of claims 1 to 4, wherein determining a primary key of the data to be stored based on the identification value comprises:
performing a hash operation on the identification value to obtain the primary key of the data to be stored.
6. The method of any one of claims 1 to 4, wherein the service is a transaction, the plurality of elements including at least two of a transaction subject, a transaction amount, and a transaction time.
7. A method of data querying, wherein the data is stored according to the method of any one of claims 1 to 6, the method of data querying comprising:
receiving a data query request, wherein the data query request comprises a basic query condition, and the basic query condition is used for indicating a value condition of at least one element in the plurality of elements;
converting the value condition of the at least one element to obtain an identification value condition;
determining a primary key value condition based on the identification value condition;
determining at least one bucket satisfying the primary key value condition based on the predetermined correspondence; and
querying the data stored in the at least one bucket based on the data query request to obtain a query result.
8. The method of claim 7, wherein the value condition of the at least one element includes a target value or a value range of each element in the at least one element.
9. The method of claim 8, wherein converting the value condition of the at least one element to obtain an identification value condition comprises:
fusing the target value or the boundary value of the value range of the at least one element to obtain a target identification value or a boundary value of an identification value range.
10. The method of claim 9, wherein fusing the target value or the boundary value of the value range of the at least one element comprises:
fusing the target value or the boundary value of the value range of the at least one element using a dimension reduction algorithm.
11. The method of claim 10, wherein the dimension reduction algorithm comprises a z-order curve algorithm.
12. The method of any one of claims 9 to 11, wherein determining a primary key value condition based on the identification value condition comprises:
performing a hash operation on the target identification value or the boundary value of the identification value range to obtain a target primary key value or a boundary value of a primary key value range.
13. The method according to any one of claims 7 to 11, wherein the data query request further includes an extended query condition, wherein the extended query condition is used to indicate a value condition of an optional element, and the optional element is an element, other than the plurality of elements, used for characterizing the service;
wherein querying the data stored in the at least one bucket based on the data query request to obtain a query result comprises:
querying the data stored in the at least one bucket based on the basic query condition and the extended query condition to obtain the query result.
14. An apparatus for data storage, comprising:
an extraction unit that extracts, from data to be stored of a service, a plurality of element values respectively corresponding to a plurality of elements based on the plurality of elements for characterizing the service;
a conversion processing unit which performs conversion processing on the plurality of element values to obtain an identification value;
a determination unit that determines a primary key of the data to be stored based on the identification value;
and a storage unit that stores the data to be stored into a target bucket corresponding to the primary key of the data to be stored, based on a predetermined correspondence between different primary keys and different buckets.
15. The apparatus of claim 14, wherein the conversion processing unit is further to fuse the plurality of element values to obtain the identification value.
16. The apparatus of claim 15, wherein the conversion processing unit is further to fuse the plurality of element values into the identification value using a dimension reduction algorithm.
17. The apparatus of claim 16, wherein the dimension reduction algorithm comprises a z-order curve algorithm.
18. The apparatus according to any one of claims 14 to 17, wherein the determining unit further performs a hash operation on the identification value to obtain a primary key of the data to be stored.
19. The apparatus of any one of claims 14 to 17, wherein the service is a transaction, the plurality of elements including at least two of a transaction subject, a transaction amount, and a transaction time.
20. An apparatus for data querying, wherein the data is stored with the apparatus according to any one of claims 14 to 19, the apparatus for data querying comprising:
a receiving unit, configured to receive a data query request, where the data query request includes a basic query condition, and the basic query condition is used to indicate a value condition of at least one element of the multiple elements;
a conversion processing unit that converts the value condition of the at least one element to obtain an identification value condition;
a first determination unit that determines a primary key value condition based on the identification value condition;
a second determination unit that determines at least one bucket that satisfies the primary key value condition based on the predetermined correspondence;
and a query unit that queries the data stored in the at least one bucket based on the data query request to obtain a query result.
21. The apparatus of claim 20, wherein the value condition of the at least one element comprises a target value or a value range of each element of the at least one element.
22. The apparatus of claim 21, wherein the conversion processing unit further fuses the target value or the boundary value of the value range of the at least one element to obtain a target identification value or a boundary value of an identification value range.
23. The apparatus of claim 22, wherein the conversion processing unit further fuses the target value or the boundary value of the value range of the at least one element by using a dimension reduction algorithm.
24. The apparatus of claim 23, wherein the dimension reduction algorithm comprises a z-order curve algorithm.
25. The apparatus according to any one of claims 22 to 24, wherein the first determining unit further performs a hash operation on the target identifier value or a boundary value of the identifier value range to obtain a target primary key value or a boundary value of a primary key value range.
26. The apparatus according to any one of claims 20 to 24, wherein the data query request further includes an extended query condition, wherein the extended query condition is used to indicate a value condition of an optional element, which is an element used to characterize the service other than the plurality of elements;
the query unit further queries the data stored in the at least one bucket based on the base query condition and the extended query condition to obtain a query result.
27. A computing device, comprising:
at least one processor;
a memory in communication with the at least one processor having executable code stored thereon, which when executed by the at least one processor causes the at least one processor to implement the method of any one of claims 1 to 6.
28. A computing device, comprising:
at least one processor;
a memory in communication with the at least one processor having executable code stored thereon, which when executed by the at least one processor causes the at least one processor to implement the method of any one of claims 7 to 13.
29. A machine readable storage medium storing executable code that when executed causes a machine to perform the method of any of claims 1 to 6.
30. A machine readable storage medium storing executable code that when executed causes a machine to perform the method of any of claims 7 to 13.
CN202010497934.6A 2020-06-04 2020-06-04 Data storage and query method, device and machine-readable storage medium Pending CN111552695A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010497934.6A CN111552695A (en) 2020-06-04 2020-06-04 Data storage and query method, device and machine-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010497934.6A CN111552695A (en) 2020-06-04 2020-06-04 Data storage and query method, device and machine-readable storage medium

Publications (1)

Publication Number Publication Date
CN111552695A true CN111552695A (en) 2020-08-18

Family

ID=72006851

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010497934.6A Pending CN111552695A (en) 2020-06-04 2020-06-04 Data storage and query method, device and machine-readable storage medium

Country Status (1)

Country Link
CN (1) CN111552695A (en)


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230110

Address after: 200120 Floor 15, No. 447, Nanquan North Road, Free Trade Pilot Zone, Pudong New Area, Shanghai

Applicant after: Alipay.com Co.,Ltd.

Address before: 310000 801-11 section B, 8th floor, 556 Xixi Road, Xihu District, Hangzhou City, Zhejiang Province

Applicant before: Alipay (Hangzhou) Information Technology Co.,Ltd.

TA01 Transfer of patent application right
RJ01 Rejection of invention patent application after publication

Application publication date: 20200818

RJ01 Rejection of invention patent application after publication