WO2016004813A1 - Data storage method, query method, and device - Google Patents

Data storage method, query method, and device

Info

Publication number
WO2016004813A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
query
combination
information table
identification
Prior art date
Application number
PCT/CN2015/081651
Other languages
English (en)
French (fr)
Inventor
储晓颖
Original Assignee
阿里巴巴集团控股有限公司
储晓颖
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司, 储晓颖
Priority to JP2017500353A (patent JP6744854B2, ja)
Priority to EP15819546.1A (patent EP3168758A1, en)
Priority to US15/324,661 (patent US10489372B2, en)
Publication of WO2016004813A1 (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Definitions

  • the present invention relates to the field of network technologies, and in particular, to a data storage method, a query method, and a device.
  • NOSQL non-relational databases
  • RDBMS relational databases
  • aspects of the present invention provide a data storage method, a query method, and a device for improving data storage and query speed and improving performance of a storage system.
  • An aspect of the present invention provides a data storage method, including:
  • when the data record includes a timestamp, a value generated at the time point identified by the timestamp, and a first identification value combination that identifies the value, preprocessing the data record according to a query requirement of the user to obtain a target value and a target time that satisfy the query requirement and a second identification value combination that serves as a query condition; storing a storage identifier that characterizes the second identification value combination, the target time, and the target value in a second information table; and storing the first identification value combination in a first information table;
  • wherein the values of the time-independent multi-dimensional identification fields constitute the first identification value combination.
  • a data storage device comprising:
  • a receiving module configured to receive a data record to be stored
  • a first storage module configured to: when the data record includes a timestamp, a value generated at the time point identified by the timestamp, and a first identification value combination that identifies the value, preprocess the data record according to a query requirement of the user to obtain a target value and a target time that satisfy the query requirement and a second identification value combination that serves as a query condition; store a storage identifier that characterizes the second identification value combination, the target time, and the target value in the second information table; and store the first identification value combination in the first information table;
  • wherein the values of the time-independent multi-dimensional identification fields constitute the first identification value combination.
  • a data query method including:
  • when the query request includes a filter condition as a query condition but does not include a target time, querying the multiple first identification value combinations stored in the first information table according to the filter condition, and obtaining the first identification value combinations that satisfy the filter condition,
  • wherein the values of the time-independent multi-dimensional identification fields constitute a first identification value combination, and the filter condition includes the values of some of the identification fields;
  • when the query request includes a second identification value combination as a query condition and a target time, querying the second information table according to the target time and the storage identifier that characterizes the second identification value combination, and obtaining the target value corresponding to the target time and the storage identifier.
  • a data query device including:
  • a receiving module configured to receive a query request
  • a first query module configured to: when the query request includes a filter condition as a query condition but does not include a target time, query the multiple first identification value combinations stored in the first information table according to the filter condition, and obtain the first identification value combinations that satisfy the filter condition; wherein the values of the time-independent multi-dimensional identification fields constitute a first identification value combination, and the filter condition includes the values of some of the identification fields;
  • a second query module configured to: when the query request includes a second identification value combination as a query condition and a target time, query the second information table according to the target time and the storage identifier that characterizes the second identification value combination, and obtain the target value corresponding to the target time and the storage identifier.
  • the content of the data record to be stored is classified, and the time-independent content, such as the values of the multi-dimensional identification fields, is stored in the first information table.
  • because the first information table stores only time-independent content, its data volume is relatively small, which greatly reduces the workload of creating and maintaining secondary indexes.
  • for time-related content, the data record is preprocessed according to the user's query requirement to obtain directly the information that satisfies that requirement, namely the target value, the target time, and the storage identifier that characterizes the query condition. Preprocessing can reduce the amount of data in the second information table to some extent, and because the second information table stores only the target value, the target time, and the storage identifier, no secondary index needs to be established for it.
  • the technical solution of the present invention therefore greatly reduces the workload of creating and maintaining secondary indexes and also reduces the amount of stored data, which improves data storage speed and the performance of the storage system;
  • at query time, the second information table can be queried directly without relying on a secondary index, which helps improve query speed.
  • even when the first information table has to be queried, it maintains few secondary indexes, so query speed is still improved compared with the prior art.
  • FIG. 1a is a schematic flowchart of a data storage method according to an embodiment of the present invention.
  • FIG. 1b is a schematic flowchart of a data storage method according to another embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of a data query method according to an embodiment of the present invention.
  • FIG. 3a is a schematic structural diagram of a data storage device according to an embodiment of the present invention.
  • FIG. 3b is a schematic structural diagram of a data storage device according to another embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a data query device according to an embodiment of the present invention.
  • Table 1 shows a common application scenario in the field of data analysis.
  • the content of Table 1 records various behaviors of the transaction system of a third-party payment company.
  • interfaces and methods are commonly used service identifiers in the field of Service-Oriented Architecture (SOA).
  • an interface represents a particular service.
  • a method represents a specific behavior under that service; "create" and "pay" represent the order creation and order payment operations under the service, respectively. The source indicates whether the caller of the service is Taobao (the Taobao website) or Tmall (the Tmall website).
  • the result, Y or N, indicates whether the operation ultimately succeeded or failed; the amount is a number indicating the transaction amount.
  • FIG. 1a is a schematic flowchart of a data storage method according to an embodiment of the present invention. As shown in FIG. 1a, the method includes:
  • the data record is preprocessed according to a query requirement of the user to obtain the information that satisfies the query requirement.
  • a data record is a complete set of related information in the data source. Taking Table 1 as an example, each row of Table 1 is a data record.
  • the data record in this embodiment may come from, but is not limited to, a business system.
  • the business system here can be a business system in any field, such as a commodity trading business system, a banking system, a toll station management business system, and the like.
  • a business system generally generates data records, and these data records generally need to be stored.
  • for example, a commodity trading business system records commodity transaction information such as the product name, transaction time, transaction amount, and product provider;
  • each item of commodity transaction information is a data record.
  • similarly, when a bank performs a transfer or remittance,
  • it records the related information, such as the transfer or remittance account, the payment account number, the transfer or remittance amount, and the transfer or remittance date; each bank transfer or remittance record is a data record.
  • the data storage device can receive the data record to be stored sent by the service system.
  • the data records generated may include time-related content, and may also include time-independent content.
  • in Table 1, "time" and "amount" change over time, whereas the values of "interface", "method", "source", and "result" do not keep changing over time.
  • the "time" and "amount" in Table 1 correspond to the timestamp and the value in this embodiment, respectively; the "interface", "method", "source", and "result" in Table 1 correspond to the multi-dimensional identification fields in this embodiment.
  • in a commodity trading business system, the transaction time and transaction amount change over time and generally accumulate as time passes, whereas the commodity names, commodity providers, and the like, once determined, do not keep growing over time unless a low-probability event occurs, such as a new commodity or a new commodity provider appearing.
  • in a banking system, the time and amount of each transfer or remittance change over time and generally accumulate as time passes, whereas the banks, bank addresses, transfer or remittance accounts, payment accounts, and the like, once determined, do not keep growing over time unless a low-probability event occurs, such as a bank changing its address or a new user opening an account at the bank.
  • the time-related content in the data record is generally the time at which the service occurs and the value generated by the service at that time.
  • in this embodiment, the time at which the service occurs is recorded as the timestamp,
  • and the value generated by the service at that time is recorded as the value generated at the time point identified by the timestamp.
  • for a commodity transaction, the timestamp is the time at which the transaction takes place, and the value generated at the time point identified by the timestamp is the transaction amount.
  • for a bank transfer or remittance, the timestamp is the time at which the transfer or remittance occurs, and the value generated at the time point identified by the timestamp is the amount transferred or remitted.
  • the time-independent content of the data record generally refers to the values of the multi-dimensional identification fields, which change infrequently and which identify the value generated at the time point identified by the timestamp.
  • the value of an identification field may be referred to as an identification field value,
  • and the combination of the values of the multi-dimensional identification fields may be referred to as a first identification value combination.
  • a transaction is typically uniquely identified by the value of a field such as a product name, a product provider, and the like.
  • the transfer or remittance service can be uniquely identified by the value of the bank name, transfer or remittance account and payment account.
  • the data storage device may determine the content included in the data record. When it determines that the data record includes a timestamp, a value generated at the time point identified by the timestamp, and a first identification value combination that identifies the value,
  • it preprocesses the data record according to the user's query requirement to obtain a target value and a target time that satisfy the query requirement and a second identification value combination that serves as the query condition.
  • it then stores the storage identifier that characterizes the second identification value combination, the target time, and the target value in the second information table, and stores the first identification value combination in the first information table.
  • if the first identification value combination already exists in the first information table, the existing entry may simply be overwritten; if it does not exist, the first identification value combination is stored in the first information table.
  • the second identification value combination refers to a value of the multi-dimensional identification field as a query condition at the time of the query.
  • the second identification value combination and the first identification value combination may correspond to the same number of identification fields or to different numbers of identification fields.
  • the number of identification fields corresponding to the second identification value combination should be less than or equal to the number corresponding to the first identification value combination.
  • taking Table 1 as an example, the second identification value combination may be one set of values of the two identification fields "interface" and "method", or one set of values of the three identification fields "interface", "method", and "source", and so on.
  • a classification rule for classifying content in the data record is pre-configured on the data storage device, and the data storage device may classify the content in the received data record based on the classification rule.
  • the classification rule may directly specify that the timestamp and value fields in the data record are time-related content and that the content of the other fields is time-independent content.
  • the data storage device classifies the content in the data record into two categories: time-related content, such as the timestamp and the value, and time-independent content, such as the values of the multi-dimensional identification fields, that is, the first identification value combination. The time-independent content is stored in the first information table and the time-related content in the second information table.
  • because the first information table stores time-independent content, its data volume is relatively small, and the workload of creating and maintaining its secondary indexes is greatly reduced; correspondingly, when the first information table needs to be queried, it maintains fewer secondary indexes, so query speed is also improved.
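  • The classification step described above can be sketched as follows. This is only an illustration under stated assumptions, not the patent's implementation: the field names follow the columns described for Table 1, while the function name `classify_record`, the constant `TIME_RELATED_FIELDS`, the dictionary layout, and the sample values (including `OrderService`) are hypothetical.

```python
# Hypothetical classification rule: the fields treated as time-related content.
TIME_RELATED_FIELDS = ("time", "amount")


def classify_record(record):
    """Split a data record into time-independent and time-related content."""
    time_related = {k: v for k, v in record.items() if k in TIME_RELATED_FIELDS}
    # Every other field is treated as a multi-dimensional identification field;
    # its values, taken together, form the first identification value combination.
    first_id_combination = tuple(
        (k, v) for k, v in sorted(record.items()) if k not in TIME_RELATED_FIELDS
    )
    return first_id_combination, time_related


# Illustrative record; the values are made up for this sketch.
record = {
    "time": "2014-07-01 10:00:01",
    "interface": "OrderService",
    "method": "create",
    "source": "Taobao",
    "result": "Y",
    "amount": 100,
}
first_id_combination, time_related = classify_record(record)
```

  • In this sketch the first identification value combination would go to the first information table, and the time-related part would feed the preprocessing that populates the second information table.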
  • the data storage device can know the query requirement of the user in advance, and the query requirement of the user herein refers to a query that may occur after the data record is stored.
  • user query requirements vary between business systems, but once the business system that produces the data records is determined, the user's query requirements are generally determined as well. For example, for a commodity trading business system, the user may need to find the transaction amount of each transaction in which goods from a specified commodity provider were purchased, or the sum of the transaction amounts of the commodity transactions that occurred in a specified time period, or the transaction amount of each transaction in which a specified item was purchased within a specified time period, and so on.
  • the specified commodity provider, the specified time period, the specified commodity, and the like involved in the above examples are the query conditions in the user's query request.
  • the data storage device first preprocesses the data record according to the user's query requirement to obtain the information that satisfies that requirement, that is, the target value, the target time, and the second identification value combination that serves as the query condition; the second identification value combination is then replaced by a storage identifier that characterizes it.
  • the second information table thus stores three types of information: the storage identifier, the target time, and the target value. Compared with a pure key-value table in the prior art, the second information table adds only the target time.
  • the target time can be used as the primary key of the second information table, and the other two items need no secondary index, so the second information table not only holds less data but also requires no secondary index to be established or maintained;
  • the information stored in the second information table already satisfies the query requirement. At query time, after the query request is received, no computation is needed and the query result can be obtained directly, which helps improve query speed.
  • the above target value may be the value in the data record, or may be obtained by processing the value in the data record according to the query requirement.
  • for example, if the query requirement is to query the value of an individual data record, the target value is the value in the data record; if the query requirement is to query, for each preset period, the sum of the values in the data records generated in that period, the target value is the sum of the values of all data records in each period.
  • the above target time may be the timestamp in the data record, or may be a time related to the timestamp that is determined according to the query requirement.
  • for example, if the query requirement is to query the value of an individual data record, the target time is the timestamp in the data record; if the query requirement is to query, for each preset period, the sum of the values in the data records generated in that period, the target time is the time point corresponding to each period.
  • if the query condition includes the values of all of the multi-dimensional identification fields, the second identification value combination is equivalent to the first identification value combination; if the query condition includes the values of only some of the identification fields, the second identification value combination consists of those values only and is not equivalent to the first identification value combination.
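  • The periodic case described above can be sketched as follows. The query requirement assumed here (a per-minute sum of "amount", queried by the values of "interface" and "method") is a hypothetical example, as are the names `QUERY_FIELDS` and `preprocess` and the sample records; the patent text does not prescribe this code.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical query requirement: per-minute sum of "amount",
# queried by the values of the "interface" and "method" fields.
QUERY_FIELDS = ("interface", "method")


def preprocess(records):
    """Return {(second_id_combination, target_time): target_value}."""
    totals = defaultdict(int)
    for rec in records:
        second_id_combination = tuple(rec[f] for f in QUERY_FIELDS)
        ts = datetime.strptime(rec["time"], "%Y-%m-%d %H:%M:%S")
        # Target time: the start of the one-minute period containing the timestamp.
        target_time = ts.replace(second=0)
        # Target value: the running sum of the values in that period.
        totals[(second_id_combination, target_time)] += rec["amount"]
    return dict(totals)


# Illustrative records; the values are made up for this sketch.
records = [
    {"time": "2014-07-01 10:00:01", "interface": "OrderService", "method": "create",
     "source": "Taobao", "result": "Y", "amount": 100},
    {"time": "2014-07-01 10:00:30", "interface": "OrderService", "method": "create",
     "source": "Tmall", "result": "Y", "amount": 50},
    {"time": "2014-07-01 10:01:05", "interface": "OrderService", "method": "pay",
     "source": "Taobao", "result": "N", "amount": 70},
]
targets = preprocess(records)
```

  • Here the second identification value combination uses two of the four identification fields, so it is not equivalent to the first identification value combination, and three raw records collapse into two preprocessed entries.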
  • FIG. 1b is a schematic flowchart of a data storage method according to another embodiment of the present invention. As shown in FIG. 1b, the method includes:
  • step 1b: determining the content included in the data record; if it includes a timestamp, a value generated at the time point identified by the timestamp, and a first identification value combination that identifies the value, step 1c is performed; if it includes neither a timestamp nor a value generated at a time point identified by a timestamp, but does include a first identification value combination, step 1d is performed.
  • the values of the multi-dimensional identification fields may change; for example, the "interface", "method", "source", and "result" in Table 1 may change when business rules change. Therefore, after the data record is received, it is determined whether the data record includes a timestamp and a value generated at the time point identified by the timestamp.
  • if the data record includes neither, it contains only time-independent content, that is, a first identification value combination, which needs to be stored in the first information table; because no time-related content is involved, no operation on the second information table is needed.
  • the method provided in this embodiment can adapt to various storage requirements of the user.
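  • The branching between steps 1c and 1d might look like the sketch below. It is deliberately simplified and rests on assumptions: the function name `store_record` and the in-memory `set` and `dict` standing in for the two information tables are hypothetical, the target time and target value are taken to be the record's own timestamp and value, and the identification value combination itself stands in for the storage identifier.

```python
def store_record(record, first_table, second_table):
    """Route a record as in steps 1b to 1d (illustrative only)."""
    id_fields = {k: v for k, v in record.items() if k not in ("time", "amount")}
    first_id_combination = tuple(sorted(id_fields.items()))
    # Steps 1c and 1d both store the first identification value combination;
    # a set keeps a single copy if the combination is already present.
    first_table.add(first_id_combination)
    if "time" in record and "amount" in record:
        # Step 1c only: time-related content also goes to the second table.
        second_table[(first_id_combination, record["time"])] = record["amount"]
        return "1c"
    # Step 1d: no time-related content, so the second table is not touched.
    return "1d"
```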
  • the manner of storing the first identification value combination in the first information table in step 102 or step 1c or step 1d includes:
  • the first write request includes a first identification value combination.
  • the first write request is sent to the first device where the first information table is located, with the first identification value combination carried in the first write request.
  • after receiving the first write request, the first device obtains the first identification value combination from it and checks whether that combination already exists in the first information table; if it does, the first device ignores the combination carried in the first write request; if it does not, the first device writes the combination into the first information table.
  • in step 102 or step 1c, storing the storage identifier that characterizes the second identification value combination, the target time, and the target value in the second information table includes:
  • the second write request includes a second identification value combination, a target time, and a target value.
  • the second write request is sent to the second device where the second information table is located, and the second identification value combination, the target time, and the target value are carried in the second write request.
  • after receiving the second write request, the second device obtains the second identification value combination, the target time, and the target value from it, uniquely maps the second identification value combination to a storage identifier,
  • and stores the storage identifier, the target time, and the target value together in the second information table.
  • the storage identifier occupies far less space than the second identification value combination that it characterizes, so storing the storage identifier instead saves storage space and makes retrieval easier.
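  • The handling of the two write requests can be sketched with two small classes. This is an assumption-laden illustration: the class and method names are hypothetical, the tables are plain in-memory containers, and the sequential "ID1", "ID2" scheme is just one possible way to map a combination uniquely to a short storage identifier; the patent text does not specify the mapping.

```python
import itertools


class FirstDevice:
    """Holds the first information table (the 'dimension table')."""

    def __init__(self):
        self.first_table = set()

    def handle_first_write(self, first_id_combination):
        # Ignore the request if the combination is already stored.
        if first_id_combination in self.first_table:
            return False
        self.first_table.add(first_id_combination)
        return True


class SecondDevice:
    """Holds the second information table (the 'record table')."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._storage_ids = {}  # second identification value combination -> storage identifier
        self.second_table = {}  # (target time, storage identifier) -> target value

    def storage_id_for(self, second_id_combination):
        # Map each distinct combination to one short storage identifier.
        if second_id_combination not in self._storage_ids:
            self._storage_ids[second_id_combination] = "ID%d" % next(self._next_id)
        return self._storage_ids[second_id_combination]

    def handle_second_write(self, second_id_combination, target_time, target_value):
        storage_id = self.storage_id_for(second_id_combination)
        self.second_table[(target_time, storage_id)] = target_value
        return storage_id
```

  • Keying the second table by target time and storage identifier mirrors the description that the lookup needs no secondary index; a real storage engine would of course differ from a Python dictionary.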
  • the first information table and the second information table may be stored on the same device, or may be separately stored on different devices. That is, the first device and the second device may be the same device or different devices.
  • because the structure of the data stored in the two information tables differs, the implementation structures of the two tables differ as well, and they can therefore be stored on different devices.
  • the first information table of this embodiment may be referred to as a dimension table.
  • the second information table may be referred to as a record table, but is not limited thereto.
  • Table 2 is the first information table
  • Table 3 is the second information table.
  • the value combination of "interface”, “method”, “source” and “result” is stored in Table 2, that is, the first identification value combination.
  • the number of data rows in Table 2 is much smaller than in Table 1, so the workload of creating and maintaining secondary indexes for Table 2 is much smaller, which greatly reduces the impact on the storage system and helps improve storage efficiency.
  • Table 3 stores the target values (the "amount" column in Table 3), the target times (the "time" column in Table 3), and the storage identifiers obtained by preprocessing to satisfy the user's query requirements.
  • the total amount of the second identification value combination (ie, the query condition) represented by ID3 [
  • the storage identifier and amount columns form a key-value pair that does not need to be indexed, and the time column can serve as the primary key; that is, Table 3 does not need a secondary index, which further reduces the impact on the storage system and helps improve storage efficiency.
  • the method provided in this embodiment classifies the content of the data record to be stored and stores the time-independent content, such as the values of the multi-dimensional identification fields, in the first information table.
  • because the first information table stores only time-independent content, its data volume is relatively small, and the workload of creating and maintaining secondary indexes is greatly reduced.
  • for time-related content, the data record is preprocessed according to the user's query requirement to obtain directly the information that satisfies that requirement, namely the target value, the target time, and the storage identifier that characterizes the query condition. Preprocessing can reduce the amount of data in the second information table to some extent, and because the second information table stores only the target value, the target time, and the storage identifier, no secondary index needs to be established for it.
  • the workload of creating and maintaining secondary indexes in this embodiment is therefore greatly reduced, and the amount of stored data is also reduced, so data storage speed and the performance of the storage system can be improved;
  • at query time, the second information table can be queried directly without relying on a secondary index, which helps improve query speed. Even when the first information table has to be queried, it maintains fewer secondary indexes, so query speed is still improved compared with the prior art.
  • the second information table does not store the original transaction records one by one; instead, it stores the statistical results over the complete set of dimensions, that is, it directly stores the results that queries require.
  • at query time, the table is searched directly by storage identifier, and no secondary index is needed, which helps improve query speed.
  • the flow of the data query method provided by the present invention is described below in combination with the data storage method provided by the above embodiment.
  • FIG. 2 is a schematic flowchart of a data query method according to an embodiment of the present invention. As shown in FIG. 2, the method includes:
  • when the query request includes a filter condition as a query condition but does not include a target time, querying the multiple first identification value combinations stored in the first information table according to the filter condition, and obtaining the first identification value combinations that satisfy the filter condition,
  • wherein the values of the time-independent multi-dimensional identification fields constitute a first identification value combination, and the filter condition includes the values of some of the identification fields.
  • when the query request includes a second identification value combination as a query condition and a target time, querying the second information table according to the target time and the storage identifier that characterizes the second identification value combination, and obtaining the target value corresponding to the target time and the storage identifier.
  • a query request may be sent to an execution entity of the embodiment, such as a data query device, where the query request includes information required to perform the query.
  • the data query device receives the query request and determines its content. When the query request includes a filter condition but no target time, the request is for the values of the multi-dimensional identification fields that satisfy the filter condition; the device then queries the multiple first identification value combinations stored in the first information table directly according to the filter condition and obtains the first identification value combinations that satisfy it.
  • one or more first identification value combinations may satisfy the filter condition.
  • the filter condition contains the values of some of the identification fields.
  • when the query request includes a second identification value combination and a target time, the request is for the value corresponding to the target time and the second identification value combination;
  • the device then queries the second information table directly according to the target time and the storage identifier that characterizes the second identification value combination, and obtains the target value corresponding to the target time and that storage identifier.
  • the second information table stores storage identifiers, target times, and target values that characterize the combination of the second identification values.
  • querying the first identification value combinations stored in the first information table according to the filter condition, and obtaining the first identification value combinations that satisfy the filter condition, includes:
  • the first read request includes a filter condition.
  • the data query device sends a first read request to the first device, with the filter condition carried in the first read request.
  • the first device receives the first read request, obtains the filter condition from it, searches the first information table according to the filter condition, and obtains the first identification value combinations that satisfy the filter condition.
  • querying the second information table according to the target time and the storage identifier that characterizes the second identification value combination, and obtaining the target value corresponding to the target time and the storage identifier, includes:
  • the data query device sends a second read request to the second device, with the second identification value combination and the target time carried in the second read request.
  • the embodiment relates to other information of the first information table and the second information table, and other nouns (for example, the first identification value combination, the second identification value combination, etc.) explanation or description can be seen in FIG. 1a. Description in the embodiment.
  • the second information table is directly queried, and no secondary index is needed, which is beneficial to improve the query speed.
  • the query identifier value is combined, the first information table is directly queried. Since the first information table has fewer data rows and fewer secondary indexes, the query speed can be improved as compared with the prior art.
  • FIG. 3a is a schematic structural diagram of a data storage device according to an embodiment of the present invention. As shown in FIG. 3a, the device includes a receiving module 31 and a first storage module 32.
  • The receiving module 31 is configured to receive a data record to be stored.
  • The first storage module 32 is connected to the receiving module 31 and is configured to: when the data record received by the receiving module 31 includes a timestamp, a value generated at the time point identified by the timestamp, and a first identifier value combination that can identify the value, preprocess the data record according to the user's query requirement to obtain a target value and a target time that satisfy the query requirement and a second identifier value combination serving as a query condition; store the storage identifier that represents the second identifier value combination, the target time, and the target value in the second information table; and store the first identifier value combination in the first information table.
  • One set of values of the time-independent multi-dimensional identifier fields constitutes the first identifier value combination.
  • In an optional implementation, as shown in FIG. 3b, the device further includes a second storage module 33.
  • The second storage module 33 is connected to the receiving module 31 and is configured to store the first identifier value combination in the first information table when the data record received by the receiving module 31 does not include a timestamp or a value generated at the time point identified by a timestamp, but does include a first identifier value combination that can identify the value.
  • In an optional implementation, the second storage module 33 is specifically configured to: when the data record received by the receiving module 31 does not include a timestamp or a value but does include the first identifier value combination, send a first write request to the first device where the first information table is located, so that the first device writes the first identifier value combination into the first information table when it determines that the combination does not exist there. The first write request includes the first identifier value combination.
  • In an optional implementation, the first storage module 32 is specifically configured to: when the data record received by the receiving module 31 includes a timestamp, a value, and a first identifier value combination, preprocess the data record according to the user's query requirement to obtain a target value and a target time that satisfy the query requirement and a second identifier value combination serving as a query condition; send a second write request to the second device where the second information table is located, so that the second device determines the storage identifier that represents the second identifier value combination and writes the storage identifier, target time, and target value in association into the second information table; and send a first write request to the first device where the first information table is located, so that the first device stores the first identifier value combination in the first information table when it determines that the combination does not exist there.
  • The first write request here includes the first identifier value combination, and the second write request includes the second identifier value combination, the target time, and the target value.
  • The functional modules of the data storage device provided in this embodiment can be used to execute the data storage method shown in FIG. 1a or FIG. 1b. Their working principles are not repeated here; see the description of the method embodiments.
  • The data storage device provided in this embodiment classifies the data in a data record to be stored and stores the time-related data, such as the timestamp and the value, in the second information table. The first information table therefore holds only time-independent content, its data volume is relatively small, and the workload of creating and maintaining secondary indexes is greatly reduced. For time-related content, the device preprocesses the data record according to the user's query requirement to directly obtain the information that satisfies that requirement, namely the target value, the target time, and the storage identifier that represents the query condition. Preprocessing can reduce the data volume of the second information table to some extent, and storing the target value, target time, and storage identifier means that the second information table needs no secondary index. Compared with the prior art, the data storage device of this embodiment greatly reduces the workload of creating and maintaining secondary indexes and also reduces the amount of stored data, so it can improve data storage speed and the performance of the storage system.
  • FIG. 4 is a schematic structural diagram of a data query device according to an embodiment of the present invention. As shown in FIG. 4, the data query device includes a receiving module 41, a first query module 42, and a second query module 43.
  • The receiving module 41 is configured to receive a query request.
  • The first query module 42 is connected to the receiving module 41 and is configured to: when the query request received by the receiving module 41 includes a filter condition as a query condition but does not include a target time, query the multiple first identifier value combinations stored in the first information table according to the filter condition and obtain the first identifier value combinations that satisfy it. One set of values of the time-independent multi-dimensional identifier fields constitutes one first identifier value combination, and the filter condition contains values for some of the identifier fields.
  • The second query module 43 is connected to the receiving module 41 and is configured to: when the query request received by the receiving module 41 includes a second identifier value combination and a target time as query conditions, query the second information table according to the storage identifier that represents the second identifier value combination and the target time, and obtain the target value corresponding to the target time and the storage identifier.
  • In an optional implementation, the first query module 42 is specifically configured to: when the query request received by the receiving module 41 includes a filter condition as a query condition but does not include a target time, send a first read request to the first device where the first information table is located, so that the first device reads from the first information table the first identifier value combinations that satisfy the filter condition. The first read request includes the filter condition.
  • In an optional implementation, the second query module 43 is specifically configured to: when the query request received by the receiving module 41 includes a second identifier value combination and a target time as query conditions, send a second read request to the second device where the second information table is located, so that the second device determines the storage identifier that represents the second identifier value combination and reads from the second information table the target value corresponding to that storage identifier and the target time. The second read request includes the second identifier value combination and the target time.
  • The functional modules of the data query device provided in this embodiment can be used to execute the flow of the method embodiment shown in FIG. 2. Their working principles are not repeated here; see the description of the method embodiment.
  • The data query device provided in this embodiment works together with the data storage device provided in the foregoing embodiment. On the basis of classified storage, when a value needs to be queried, the second information table is queried directly without any secondary index, which helps improve query speed. When identifier value combinations are queried, the first information table is queried directly; because the first information table has fewer data rows and fewer secondary indexes, query speed also improves compared with the prior art.
  • In the several embodiments provided by the present invention, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways.
  • For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. Multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
  • Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of an embodiment.
  • The functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
  • The integrated unit may be implemented in hardware or in hardware plus software functional units.
  • An integrated unit implemented as a software functional unit may be stored in a computer-readable storage medium.
  • The software functional unit is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention.
  • The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

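The present invention provides a data storage method, a query method, and devices. The storage method includes: receiving a data record to be stored; when the data record includes a timestamp, a value generated at the time point identified by the timestamp, and a first identifier value combination that can identify the value, preprocessing the data record according to a user's query requirement to obtain a target value and a target time that satisfy the query requirement and a second identifier value combination serving as a query condition; storing a storage identifier that represents the second identifier value combination, the target time, and the target value in association in a second information table; and storing the first identifier value combination in a first information table. The invention can reduce the workload of creating and maintaining secondary indexes, which helps improve data storage and query speed and the performance of the storage system.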

Description

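Data storage method, query method, and device
【Technical Field】
The present invention relates to the field of network technologies, and in particular to a data storage method, a query method, and devices.
【Background】
As non-relational (NoSQL) databases have spread through the industry, traditional relational database management systems (RDBMS) have come under considerable challenge. Although NoSQL supports key-value storage, in many scenarios tables are still defined as in an RDBMS, multiple columns are designed for each table, and secondary indexes are created for the columns other than the primary key; SQL then uses the NoSQL product just as it would use an RDBMS.
When a table has many rows, creating and maintaining secondary indexes for several of its columns severely degrades the performance of the storage system and slows both storage and queries.
【Summary of the Invention】
Aspects of the present invention provide a data storage method, a query method, and devices for improving data storage and query speed and the performance of the storage system.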
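One aspect of the present invention provides a data storage method, including:
receiving a data record to be stored;
when the data record includes a timestamp, a value generated at the time point identified by the timestamp, and a first identifier value combination that can identify the value, preprocessing the data record according to a user's query requirement to obtain a target value and a target time that satisfy the query requirement and a second identifier value combination serving as a query condition, storing a storage identifier that represents the second identifier value combination, the target time, and the target value in association in a second information table, and storing the first identifier value combination in a first information table;
where one set of values of the time-independent multi-dimensional identifier fields constitutes the first identifier value combination.
Another aspect of the present invention provides a data storage device, including:
a receiving module, configured to receive a data record to be stored;
a first storage module, configured to: when the data record includes a timestamp, a value generated at the time point identified by the timestamp, and a first identifier value combination that can identify the value, preprocess the data record according to a user's query requirement to obtain a target value and a target time that satisfy the query requirement and a second identifier value combination serving as a query condition, store a storage identifier that represents the second identifier value combination, the target time, and the target value in a second information table, and store the first identifier value combination in a first information table;
where one set of values of the time-independent multi-dimensional identifier fields constitutes the first identifier value combination.
A further aspect of the present invention provides a data query method, including:
receiving a query request;
when the query request includes a filter condition as a query condition but does not include a target time, querying the multiple first identifier value combinations stored in a first information table according to the filter condition, and obtaining the first identifier value combinations that satisfy the filter condition; where one set of values of the time-independent multi-dimensional identifier fields constitutes one first identifier value combination, and the filter condition includes values of some of the identifier fields;
when the query request includes a second identifier value combination and a target time as query conditions, querying a second information table according to a storage identifier that represents the second identifier value combination and the target time, and obtaining the target value corresponding to the target time and the storage identifier.
A further aspect of the present invention provides a data query device, including:
a receiving module, configured to receive a query request;
a first query module, configured to: when the query request includes a filter condition as a query condition but does not include a target time, query the multiple first identifier value combinations stored in a first information table according to the filter condition, and obtain the first identifier value combinations that satisfy the filter condition; where one set of values of the time-independent multi-dimensional identifier fields constitutes one first identifier value combination, and the filter condition contains values of some of the identifier fields;
a second query module, configured to: when the query request includes a second identifier value combination and a target time as query conditions, query a second information table according to a storage identifier that represents the second identifier value combination and the target time, and obtain the target value corresponding to the target time and the storage identifier.
In the technical solution of the present invention, the content of a data record to be stored is classified, and the time-independent content, such as the values of the multi-dimensional identifier fields, is stored in the first information table. The first information table therefore holds only time-independent content, its data volume is relatively small, and the workload of creating and maintaining secondary indexes is greatly reduced. For time-related content, the data record is preprocessed according to the user's query requirement to directly obtain the information that satisfies that requirement, namely the target value, the target time, and the storage identifier that represents the query condition. Preprocessing can reduce the data volume of the second information table to some extent, and storing the target value, target time, and storage identifier means that the second information table needs no secondary index. Compared with the prior art, the solution greatly reduces the workload of creating and maintaining secondary indexes and also reduces the amount of stored data, so it can improve data storage speed and the performance of the storage system. Correspondingly, a query against the second information table no longer relies on secondary indexes, which helps improve query speed; and even when the first information table must be queried, it maintains fewer secondary indexes, so query speed still improves compared with the prior art.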
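【Brief Description of the Drawings】
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1a is a schematic flowchart of a data storage method according to an embodiment of the present invention;
FIG. 1b is a schematic flowchart of a data storage method according to another embodiment of the present invention;
FIG. 2 is a schematic flowchart of a data query method according to an embodiment of the present invention;
FIG. 3a is a schematic structural diagram of a data storage device according to an embodiment of the present invention;
FIG. 3b is a schematic structural diagram of a data storage device according to another embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a data query device according to an embodiment of the present invention.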
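【Detailed Description】
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Before the technical solution of the present invention is described, the shortcomings of existing storage schemes are explained using a specific application scenario.
Table 1
(Table 1 is reproduced only as images in the source: Figure PCTCN2015081651-appb-000001 and Figure PCTCN2015081651-appb-000002. According to the description that follows, its columns are time, interface, method, source, result, and amount.)
Table 1 shows an application scenario common in data analysis: its content records the various behaviors of a third-party payment company's transaction system. In Table 1, interface and method are service identifiers commonly used in Service-Oriented Architecture (SOA): an interface represents a service, and a method represents a specific behavior under that service; create and pay denote the order creation and order payment operations of this service. Source indicates whether the caller of the transaction is Taobao (the Taobao website) or Tmall (the Tmall website). Result, as the name suggests, is Y or N according to whether the transaction finally succeeded or failed. Amount is a number giving the amount of the transaction.
Depending on the application's needs, various queries can be run against Table 1. For example, one can query the amount under each [interface, method, source, result] combination at 2013-11-11 00:00. One can also query the total amount for [interface=TradeFacade, method=create or pay] at 2013-11-11 00:00. Or one can query how many possible sources exist for the order creation operation of the trade service; in Table 1 these sources are Taobao and Tmall.
As this shows, secondary indexes must be created and maintained for the interface, method, source, and result columns of Table 1 to avoid full-table scans during queries. But the rows of Table 1 accumulate over time, and when there are many rows, creating and maintaining secondary indexes for several columns is very costly: it severely degrades the performance of the storage system and slows storage and queries.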
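To address this problem, FIG. 1a is a schematic flowchart of a data storage method according to an embodiment of the present invention. As shown in FIG. 1a, the method includes:
101. Receive a data record to be stored.
102. When the data record includes a timestamp, a value generated at the time point identified by the timestamp, and a first identifier value combination that can identify the value, preprocess the data record according to the user's query requirement to obtain a target value and a target time that satisfy the query requirement and a second identifier value combination serving as a query condition; store the storage identifier that represents the second identifier value combination, the target time, and the target value in the second information table; and store the first identifier value combination in the first information table. One set of values of the time-independent multi-dimensional identifier fields constitutes the first identifier value combination.
A data record is one complete set of related information in a data source; in Table 1, for example, each row is one data record. The data records in this embodiment may come from, but are not limited to, a business system. The business system may belong to any field, for example a commodity trading system, a banking system, or a toll station management system.
When a business system handles a transaction, it generally produces data records that need to be stored. For example, a commodity transaction produces transaction information such as the commodity name, transaction time, transaction amount, and commodity provider; one piece of such information is one data record. Similarly, when a bank transfer or remittance occurs, the related information is recorded, such as the sending account, receiving account, amount, and date; one transfer or remittance record is one data record.
Accordingly, the data storage device can receive the data records to be stored that the business system sends.
Whether it comes from the commodity trading system, the banking system, or the toll station management system, a data record can include both time-related and time-independent content. In Table 1, "time" and "amount" change as time passes. "Interface", "method", "source", and "result", by contrast, do not grow continuously over time the way transaction orders do; they change only with infrequent changes to business rules, such as onboarding a new merchant or offering a new service, for example a wealth management service.
"Time" and "amount" in Table 1 correspond to the timestamp and the value in this embodiment. "Interface", "method", "source", and "result" correspond to the multi-dimensional identifier fields; one set of values of these fields uniquely identifies the amount associated with a given time.
For commodity transactions, the transaction time and amount change over time and generally accumulate as time passes, whereas the commodity name, commodity provider, and so on, once determined, do not keep growing over time unless a low-probability event occurs, such as a new commodity or a new provider appearing. For a banking system, the time and amount of a transfer or remittance change and accumulate over time, whereas the bank name, bank address, sending account, receiving account, and so on, once determined, do not keep growing unless a low-probability event occurs, such as the bank changing its address or a new customer opening an account.
This analysis shows that the time-related content of a data record is generally the time at which the business took place and the value that the business produced at that time. In this embodiment the former is called the timestamp and the latter the value generated at the time point identified by the timestamp. For a commodity transaction, the timestamp is the time of the transaction and the value is the transaction amount. For a banking system, the timestamp is the time of the transfer or remittance and the value is its amount.
The time-independent content of a data record is generally one set of values of multi-dimensional identifier fields that change infrequently and can identify the value generated at the time point identified by the timestamp. The value of an identifier field may be called an identifier field value, and a combination of values of the multi-dimensional identifier fields may be called a first identifier value combination. The same multi-dimensional identifier fields can take many sets of values, so there can be many first identifier value combinations. For a commodity transaction, for example, the values of fields such as commodity name and commodity provider generally identify one transaction uniquely. For a banking system, the values of fields such as bank name, sending account, and receiving account can uniquely identify one transfer or remittance.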
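After receiving a data record, the data storage device examines its content. When it determines that the data record includes a timestamp, a value generated at the time point identified by the timestamp, and a first identifier value combination that can identify the value, it preprocesses the data record according to the user's query requirement to obtain a target value and a target time that satisfy the query requirement and a second identifier value combination serving as a query condition, stores the storage identifier that represents the second identifier value combination, the target time, and the target value in the second information table, and stores the first identifier value combination in the first information table. Note that if the first identifier value combination already exists in the first information table, the existing combination can simply be overwritten; if it does not exist, it is stored in the first information table directly.
The second identifier value combination is one set of values of multi-dimensional identifier fields that serves as the query condition at query time. The second and first identifier value combinations may cover the same number of identifier fields or different numbers, but the number of fields covered by the second identifier value combination should be less than or equal to the number covered by the first. In Table 1, the second identifier value combination could be one set of values of the two fields "interface" and "method", or of the three fields "interface", "method", and "source", and so on.
In this embodiment, a classification rule for classifying the content of data records is preconfigured on the data storage device, and the device classifies the content of each received data record according to that rule. For example, the rule may directly specify that the timestamp and value fields of a data record are time-related content and that all other fields are time-independent content.
In this embodiment, the data storage device divides the content of a data record into two classes: time-related content, such as the timestamp and the value, and time-independent content, such as the values of the multi-dimensional identifier fields, that is, the first identifier value combination. The first information table stores the time-independent content and the second information table stores the time-related content. The first information table therefore holds only time-independent content, its data volume is relatively small, and the workload of creating and maintaining secondary indexes is greatly reduced; correspondingly, when the first information table has to be queried, query speed also improves because it maintains fewer secondary indexes.
In addition, the data storage device can learn the user's query requirements in advance. Here, a user's query requirement means a query that the user may issue after the data records have been stored. Query requirements differ between business systems, but once the business system that a data record belongs to is determined, the user's query requirements are generally determined as well. For a commodity trading system, for example, a user may need the transaction amount of every transaction that purchased commodities from a specified provider, or the sum of the transaction amounts of all transactions in a specified time period, or the transaction amount of every transaction that purchased a specified commodity within a specified time period, and so on. The specified provider, specified time period, and specified commodity in these examples are the query conditions of the user's query requirements.
Accordingly, before storing time-related content, the data storage device first preprocesses the data record according to the user's query requirement to obtain the information that satisfies it, namely the target value, the target time, and the second identifier value combination serving as the query condition, and replaces the second identifier value combination with a storage identifier that can represent it. After this processing, the second information table stores three kinds of information: the storage identifier, the target time, and the target value. Compared with a pure key-value table in the prior art, the second information table has only the target time in addition. The target time can serve as the primary key of the second information table, and the other two items need no secondary index. The second information table thus holds less data and requires no secondary index to be created or maintained. Moreover, because it stores information that already satisfies the query requirement, no computation is needed when a query request arrives: the query result can be obtained directly, which helps improve query speed.
Note that the target value may be the value in the data record or may be derived from the values in data records according to the query requirement. For example, if the query requirement is to query the value in each data record, the target value is the value in the data record; if the query requirement is to query, for each preset period, the sum of the values in the data records generated in that period, the target value is the sum of the values in all data records of each period. Likewise, the target time may be the timestamp in the data record or a time related to the timestamp that is determined by the query requirement. If the query requirement is to query the value in each data record, the target time is the timestamp in the data record; if the query requirement is to query the per-period sum, the target time is the time point corresponding to each query period.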
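The per-period case described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation the patent describes: the function name `aggregate_by_period`, the one-minute default period, and the choice of the period start as the target time are all assumptions (the text only says that the target time is the time point corresponding to each query period).

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def aggregate_by_period(records, period=timedelta(minutes=1)):
    """Sum values per (identifier value combination, period).

    records: iterable of (timestamp, combination, value) tuples.
    Returns a dict {(combination, target_time): target_value}.
    """
    totals = defaultdict(int)
    for timestamp, combination, value in records:
        # Assumption: the target time is the start of the period
        # that contains the record's timestamp.
        periods_elapsed = (timestamp - EPOCH) // period
        target_time = EPOCH + periods_elapsed * period
        totals[(combination, target_time)] += value
    return dict(totals)
```

With a period of one minute, two records stamped 00:00:05 and 00:00:40 for the same combination collapse into a single row whose target time is 00:00 and whose target value is their sum, which is how preprocessing can shrink the second information table.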
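Correspondingly, for the second identifier value combination serving as the query condition: if the query condition includes values of all the multi-dimensional identifier fields, the second identifier value combination is identical to the first identifier value combination; if the query condition includes values of only some of the identifier fields, the second identifier value combination consists only of those values and is not identical to the first identifier value combination.
FIG. 1b is a schematic flowchart of a data storage method according to another embodiment of the present invention. As shown in FIG. 1b, the method includes:
1a. Receive a data record to be stored.
1b. Examine the content of the data record. If it includes a timestamp, a value generated at the time point identified by the timestamp, and a first identifier value combination that can identify the value, perform step 1c; if it does not include a timestamp or a value generated at the time point identified by a timestamp, but does include a first identifier value combination that can identify the value, perform step 1d.
1c. Preprocess the data record according to the user's query requirement to obtain a target value and a target time that satisfy the query requirement and a second identifier value combination serving as a query condition; store the storage identifier that represents the second identifier value combination, the target time, and the target value in the second information table; and store the first identifier value combination in the first information table. One set of values of the time-independent multi-dimensional identifier fields constitutes the first identifier value combination.
1d. Store the first identifier value combination in the first information table.
Note that this embodiment can be implemented on the basis of the embodiment shown in FIG. 1a; the parts that are the same as in that embodiment are not repeated.
This embodiment takes into account that the values of the multi-dimensional identifier fields (that is, the first identifier value combinations) may change; for example, "interface", "method", "source", and "result" in Table 1 may change when business rules change. Therefore, after a data record is received, the device determines whether it includes a timestamp and a value generated at the time point identified by the timestamp. If it does not, the data record contains only time-independent content, that is, a first identifier value combination, which must be stored in the first information table; because no time-related content is involved, the second information table need not be touched. The method of this embodiment can thus accommodate the user's various storage needs.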
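The branch between steps 1c and 1d can be sketched in Python as follows. This is a minimal sketch under assumptions: the field names in `ID_FIELDS`, the dict-based record layout, and the function name `store` are hypothetical; the preprocessing is assumed to be a running total per query condition and timestamp; and the second identifier value combination itself stands in for the storage identifier.

```python
ID_FIELDS = ("interface", "method", "source", "result")  # assumed field names

def store(record, query_fields, first_table, second_table):
    """Store one data record following the branch in steps 1b to 1d.

    first_table: set of first identifier value combinations.
    second_table: dict {(second combination, target time): target value}.
    """
    first_combination = tuple(record[f] for f in ID_FIELDS)
    first_table.add(first_combination)  # done in both step 1c and step 1d

    if "timestamp" not in record or "amount" not in record:
        return  # step 1d: nothing time-related, second table untouched

    # Step 1c. Assumed preprocessing: total amount per query condition
    # and timestamp. A real system would store a compact storage
    # identifier instead of the combination itself.
    second_combination = tuple((f, record[f]) for f in query_fields)
    key = (second_combination, record["timestamp"])
    second_table[key] = second_table.get(key, 0) + record["amount"]
```

Because `first_table` is a set, storing a combination that is already present leaves it unchanged, so a record that carries only identifier values and a record that also carries a timestamp and an amount can share the same first step.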
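In an optional implementation, storing the first identifier value combination in the first information table in step 102, step 1c, or step 1d includes:
sending a first write request to the first device where the first information table is located, so that the first device writes the first identifier value combination into the first information table when it determines that the combination does not exist there; the first write request includes the first identifier value combination.
Specifically, a first write request carrying the first identifier value combination is sent to the first device where the first information table is located. After receiving the first write request, the first device extracts the first identifier value combination from it and checks whether the combination already exists in the first information table. If it does, the first device ignores the combination carried in the request; if it does not, the first device writes the combination into the first information table.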
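The write-if-absent handling of the first write request can be sketched as follows. The class `FirstDevice`, its method name, and the in-memory set are hypothetical stand-ins for whatever store actually holds the first information table.

```python
class FirstDevice:
    """Hypothetical holder of the first information table."""

    def __init__(self):
        self.first_table = set()

    def handle_first_write(self, first_combination):
        """Write the combination only if it is absent.

        Returns True if a row was written, False if the request
        was ignored because the combination already exists.
        """
        if first_combination in self.first_table:
            return False  # already present: ignore the carried combination
        self.first_table.add(first_combination)
        return True
```

Repeated transactions with the same identifier values therefore add no rows, which is why the first information table stays small even as transactions accumulate.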
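In an optional implementation, storing the storage identifier that represents the second identifier value combination, the target time, and the target value in the second information table in step 102 or step 1c includes:
sending a second write request to the second device where the second information table is located, so that the second device determines the storage identifier that represents the second identifier value combination and writes the storage identifier, target time, and target value in association into the second information table; the second write request includes the second identifier value combination, the target time, and the target value.
Specifically, a second write request carrying the second identifier value combination, the target time, and the target value is sent to the second device where the second information table is located. After receiving it, the second device extracts the second identifier value combination, target time, and target value, maps the second identifier value combination uniquely to a storage identifier, and stores the storage identifier, target time, and target value in association in the second information table. The storage identifier occupies far fewer bytes than the second identifier value combination it represents, so storing the storage identifier saves storage space and makes retrieval easier.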
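One way the second device might map a second identifier value combination to a compact storage identifier is sketched below. The text does not say how the unique mapping is produced; the sequential integer counter, the class `SecondDevice`, and its method names are assumptions made for illustration only.

```python
class SecondDevice:
    """Hypothetical holder of the second information table."""

    def __init__(self):
        self._ids = {}           # canonical combination -> storage ID
        self.second_table = {}   # (storage ID, target time) -> target value

    def storage_id_for(self, second_combination):
        """Return the storage ID for a {field: value} combination."""
        # Sort the pairs so that field order does not affect the mapping.
        key = tuple(sorted(second_combination.items()))
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1  # assumed: sequential integers
        return self._ids[key]

    def handle_second_write(self, second_combination, target_time, target_value):
        """Store (storage ID, target time) -> target value and return the ID."""
        storage_id = self.storage_id_for(second_combination)
        self.second_table[(storage_id, target_time)] = target_value
        return storage_id
```

A small integer stands in for the full list of field names and values, which matches the stated goal that the storage identifier occupy far fewer bytes than the combination it represents.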
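Note that the first and second information tables may be stored on the same device or on different devices; that is, the first device and the second device may be the same device or different devices. Because the data structures stored in the two tables differ considerably, and so do their implementation structures, storing them on different devices is preferable.
In an optional implementation, the first information table of this embodiment may be called a dimension table and the second information table a record table, although the names are not limited to these.
For the application scenario shown in Table 1, storing the data with the method of this embodiment yields Table 2 and Table 3, where Table 2 is the first information table and Table 3 is the second information table.
Table 2
Interface    Method  Source  Result
TradeFacade  create  Taobao  Y
TradeFacade  pay     Taobao  Y
TradeFacade  create  Tmall   Y
TradeFacade  pay     Tmall   Y
TradeFacade  pay     Tmall   N
...
Table 3
Storage ID  Time              Amount (yuan)
ID1         2013-11-11 00:00  1459
ID2         2013-11-11 00:00  7398
ID3         2013-11-11 00:00  6999
ID4         2013-11-11 00:00  399
...
Table 2 stores the value combinations of "interface", "method", "source", and "result", that is, the first identifier value combinations. Table 2 has far fewer rows than Table 1, so creating and maintaining secondary indexes for Table 2 takes much less work, which greatly reduces the impact on the storage system and helps improve storage efficiency.
Table 3 stores the target value (the "amount" column of Table 3), the target time (the "time" column), and the storage identifier obtained by preprocessing to satisfy the user's query requirements. The first row of Table 3 gives the total amount for [interface=TradeFacade, method=create] at 2013-11-11 00:00, so the second identifier value combination (that is, the query condition) represented by ID1 is [interface=TradeFacade, method=create]. The second row gives the total amount for [interface=TradeFacade, method=pay] at 2013-11-11 00:00, so ID2 represents [interface=TradeFacade, method=pay]. The third row gives the total amount for [interface=TradeFacade, method=pay, result=Y] at 2013-11-11 00:00, so ID3 represents [interface=TradeFacade, method=pay, result=Y]. The fourth row gives the total amount for [interface=TradeFacade, method=pay, result=N] at 2013-11-11 00:00, so ID4 represents [interface=TradeFacade, method=pay, result=N]. The storage identifier and amount columns of Table 3 form key-value pairs and need no index, while the time column can serve as the primary key. In other words, Table 3 needs no secondary index, which further reduces the impact on the storage system and helps improve storage efficiency.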
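Table 3 and the four query conditions described above can be written out as plain Python mappings to show why a lookup needs no secondary index. The data values come from Table 3; the names `STORAGE_IDS`, `TABLE_3`, and `total_amount`, and the use of `frozenset` keys, are choices made for this sketch only.

```python
# Storage identifiers from Table 3 and the query conditions they represent.
STORAGE_IDS = {
    frozenset({"interface": "TradeFacade", "method": "create"}.items()): "ID1",
    frozenset({"interface": "TradeFacade", "method": "pay"}.items()): "ID2",
    frozenset({"interface": "TradeFacade", "method": "pay", "result": "Y"}.items()): "ID3",
    frozenset({"interface": "TradeFacade", "method": "pay", "result": "N"}.items()): "ID4",
}

# Table 3 as a key-value mapping: (storage ID, time) -> amount in yuan.
TABLE_3 = {
    ("ID1", "2013-11-11 00:00"): 1459,
    ("ID2", "2013-11-11 00:00"): 7398,
    ("ID3", "2013-11-11 00:00"): 6999,
    ("ID4", "2013-11-11 00:00"): 399,
}

def total_amount(condition, time):
    """Look up the precomputed total for a query condition at a given time."""
    storage_id = STORAGE_IDS[frozenset(condition.items())]
    return TABLE_3[(storage_id, time)]
```

Each answer is a single keyed read. The figures in Table 3 are also internally consistent: the totals for result=Y and result=N under method=pay (6999 and 399) add up to the total for method=pay (7398).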
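The analysis above shows that the method of this embodiment classifies the content of a data record to be stored and stores the time-independent content, such as the values of the multi-dimensional identifier fields, in the first information table. The first information table therefore holds only time-independent content, its data volume is relatively small, and the workload of creating and maintaining secondary indexes is greatly reduced. For time-related content, the data record is preprocessed according to the user's query requirement to directly obtain the information that satisfies that requirement, namely the target value, the target time, and the storage identifier that represents the query condition. Preprocessing can reduce the data volume of the second information table to some extent, and storing the target value, target time, and storage identifier means that the second information table needs no secondary index. Compared with the prior art, this embodiment greatly reduces the workload of creating and maintaining secondary indexes and also reduces the amount of stored data, so it can improve data storage speed and the performance of the storage system. Correspondingly, a query against the second information table no longer relies on secondary indexes, which helps improve query speed; and even when the first information table must be queried, it maintains fewer secondary indexes, so query speed still improves compared with the prior art.
In addition, the second information table no longer stores statistics over the full set of dimensions, let alone the original transaction records one by one; it directly stores the results that queries need. The benefit is obvious: a query looks up the storage identifier directly and no longer needs any secondary index, which helps improve query speed.
The flow of the data query method provided by the present invention is described below with reference to the data storage method of the foregoing embodiments.
FIG. 2 is a schematic flowchart of a data query method according to an embodiment of the present invention. As shown in FIG. 2, the method includes:
201. Receive a query request.
202. When the query request includes a filter condition as a query condition but does not include a target time, query the multiple first identifier value combinations stored in the first information table according to the filter condition and obtain the first identifier value combinations that satisfy it. One set of values of the time-independent multi-dimensional identifier fields constitutes one first identifier value combination, and the filter condition contains values of some of the identifier fields.
203. When the query request includes a second identifier value combination and a target time as query conditions, query the second information table according to the storage identifier that represents the second identifier value combination and the target time, and obtain the target value corresponding to the target time and the storage identifier.
Specifically, when a user needs to run a query, the user may send a query request to the execution entity of this embodiment, such as a data query device; the query request contains the information required to perform the query.
The data query device receives the query request and examines its content. When it determines that the query request includes a filter condition but no target time, the request is asking for the values of the multi-dimensional identifier fields that satisfy the filter condition, so the device queries the multiple first identifier value combinations stored in the first information table directly according to the filter condition and obtains the first identifier value combinations that satisfy it. One or more first identifier value combinations may satisfy the filter condition. The filter condition contains values for only some of the identifier fields.
When the device determines that the query request includes a second identifier value combination and a target time as query conditions, the request is asking for the value corresponding to that target time and second identifier value combination, so the device queries the second information table directly according to the storage identifier that represents the second identifier value combination and the target time, and obtains the corresponding target value. The second information table stores the storage identifier that represents the second identifier value combination, the target time, and the target value.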
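In an optional implementation, querying the multiple first identifier value combinations stored in the first information table according to the filter condition in step 202, and obtaining the first identifier value combinations that satisfy the filter condition, includes:
sending a first read request to the first device where the first information table is located, so that the first device reads from the first information table the first identifier value combinations that satisfy the filter condition. The first read request includes the filter condition.
Specifically, the data query device sends a first read request carrying the filter condition to the first device. The first device receives the first read request, extracts the filter condition from it, searches the first information table according to the filter condition, and obtains the first identifier value combinations that satisfy it.
In an optional implementation, querying the second information table according to the storage identifier that represents the second identifier value combination and the target time in step 203, and obtaining the target value corresponding to the target time and the storage identifier, includes:
sending a second read request to the second device where the second information table is located, so that the second device determines the storage identifier that represents the second identifier value combination and reads from the second information table the target value corresponding to that storage identifier and the target time; the second read request includes the second identifier value combination and the target time.
Specifically, the data query device sends a second read request carrying the second identifier value combination and the target time to the second device. The second device receives the second read request, extracts the second identifier value combination and the target time from it, determines the storage identifier that represents the second identifier value combination, searches the second information table according to that storage identifier and the target time, and obtains the target value corresponding to them.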
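The two query paths of steps 202 and 203 can be sketched as a single dispatch function. The request layout (the keys `filter`, `second_combination`, and `target_time`), the function name `handle_query`, and the in-memory tables are assumptions for illustration; in the described system the two branches would be read requests sent to the first and second devices.

```python
def handle_query(request, first_table, storage_ids, second_table):
    """Dispatch a query request as in steps 202 and 203.

    first_table: list of dicts, one per first identifier value combination.
    storage_ids: dict {frozenset of (field, value) pairs: storage ID}.
    second_table: dict {(storage ID, target time): target value}.
    """
    if "target_time" not in request:
        # Step 202: filter the first information table on some of its fields.
        condition = request["filter"]
        return [row for row in first_table
                if all(row.get(f) == v for f, v in condition.items())]
    # Step 203: resolve the storage ID, then do a direct key lookup.
    storage_id = storage_ids[frozenset(request["second_combination"].items())]
    return second_table[(storage_id, request["target_time"])]
```

Applied to the rows of Table 2, a filter on interface=TradeFacade and method=create returns the combinations whose sources are Taobao and Tmall, which is the "how many possible sources" question raised for Table 1.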
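For further information about the first and second information tables, and for explanations of other terms (for example, first identifier value combination and second identifier value combination), see the description of the embodiment shown in FIG. 1a.
As the above shows, on the basis of classified storage, when a value needs to be queried, the second information table is queried directly without any secondary index, which helps improve query speed. When identifier value combinations are queried, the first information table is queried directly; because the first information table has fewer data rows and fewer secondary indexes, query speed also improves compared with the prior art.
It should be noted that, for brevity, the foregoing method embodiments are each described as a series of actions, but a person skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. A person skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
The descriptions of the foregoing embodiments each have their own emphasis; for a part not detailed in one embodiment, see the related description in other embodiments.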
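FIG. 3a is a schematic structural diagram of a data storage device according to an embodiment of the present invention. As shown in FIG. 3a, the device includes a receiving module 31 and a first storage module 32.
The receiving module 31 is configured to receive a data record to be stored.
The first storage module 32 is connected to the receiving module 31 and is configured to: when the data record received by the receiving module 31 includes a timestamp, a value generated at the time point identified by the timestamp, and a first identifier value combination that can identify the value, preprocess the data record according to the user's query requirement to obtain a target value and a target time that satisfy the query requirement and a second identifier value combination serving as a query condition; store the storage identifier that represents the second identifier value combination, the target time, and the target value in the second information table; and store the first identifier value combination in the first information table.
One set of values of the time-independent multi-dimensional identifier fields constitutes the first identifier value combination.
In an optional implementation, as shown in FIG. 3b, the device further includes a second storage module 33.
The second storage module 33 is connected to the receiving module 31 and is configured to store the first identifier value combination in the first information table when the data record received by the receiving module 31 does not include a timestamp or a value generated at the time point identified by a timestamp, but does include a first identifier value combination that can identify the value.
In an optional implementation, the second storage module 33 is specifically configured to: when the data record received by the receiving module 31 does not include a timestamp or a value but does include the first identifier value combination, send a first write request to the first device where the first information table is located, so that the first device writes the first identifier value combination into the first information table when it determines that the combination does not exist there. The first write request includes the first identifier value combination.
In an optional implementation, the first storage module 32 is specifically configured to: when the data record received by the receiving module 31 includes a timestamp, a value, and a first identifier value combination, preprocess the data record according to the user's query requirement to obtain a target value and a target time that satisfy the query requirement and a second identifier value combination serving as a query condition; send a second write request to the second device where the second information table is located, so that the second device determines the storage identifier that represents the second identifier value combination and writes the storage identifier, target time, and target value in association into the second information table; and send a first write request to the first device where the first information table is located, so that the first device stores the first identifier value combination in the first information table when it determines that the combination does not exist there. The first write request here includes the first identifier value combination, and the second write request includes the second identifier value combination, the target time, and the target value.
The functional modules of the data storage device provided in this embodiment can be used to execute the data storage method shown in FIG. 1a or FIG. 1b. Their working principles are not repeated here; see the description of the method embodiments.
The data storage device provided in this embodiment classifies the data in a data record to be stored and stores the time-related data, such as the timestamp and the value, in the second information table. The first information table therefore holds only time-independent content, its data volume is relatively small, and the workload of creating and maintaining secondary indexes is greatly reduced. For time-related content, the device preprocesses the data record according to the user's query requirement to directly obtain the information that satisfies that requirement, namely the target value, the target time, and the storage identifier that represents the query condition. Preprocessing can reduce the data volume of the second information table to some extent, and storing the target value, target time, and storage identifier means that the second information table needs no secondary index. Compared with the prior art, the data storage device of this embodiment greatly reduces the workload of creating and maintaining secondary indexes and also reduces the amount of stored data, so it can improve data storage speed and the performance of the storage system.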
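FIG. 4 is a schematic structural diagram of a data query device according to an embodiment of the present invention. As shown in FIG. 4, the data query device includes a receiving module 41, a first query module 42, and a second query module 43.
The receiving module 41 is configured to receive a query request.
The first query module 42 is connected to the receiving module 41 and is configured to: when the query request received by the receiving module 41 includes a filter condition as a query condition but does not include a target time, query the multiple first identifier value combinations stored in the first information table according to the filter condition and obtain the first identifier value combinations that satisfy it. One set of values of the time-independent multi-dimensional identifier fields constitutes one first identifier value combination, and the filter condition contains values of some of the identifier fields.
The second query module 43 is connected to the receiving module 41 and is configured to: when the query request received by the receiving module 41 includes a second identifier value combination and a target time as query conditions, query the second information table according to the storage identifier that represents the second identifier value combination and the target time, and obtain the target value corresponding to the target time and the storage identifier.
In an optional implementation, the first query module 42 is specifically configured to: when the query request received by the receiving module 41 includes a filter condition as a query condition but does not include a target time, send a first read request to the first device where the first information table is located, so that the first device reads from the first information table the first identifier value combinations that satisfy the filter condition. The first read request includes the filter condition.
In an optional implementation, the second query module 43 is specifically configured to: when the query request received by the receiving module 41 includes a second identifier value combination and a target time as query conditions, send a second read request to the second device where the second information table is located, so that the second device determines the storage identifier that represents the second identifier value combination and reads from the second information table the target value corresponding to that storage identifier and the target time. The second read request includes the second identifier value combination and the target time.
The functional modules of the data query device provided in this embodiment can be used to execute the flow of the method embodiment shown in FIG. 2. Their working principles are not repeated here; see the description of the method embodiment.
The data query device provided in this embodiment works together with the data storage device provided in the foregoing embodiment. On the basis of classified storage, when a value needs to be queried, the second information table is queried directly without any secondary index, which helps improve query speed. When identifier value combinations are queried, the first information table is queried directly; because the first information table has fewer data rows and fewer secondary indexes, query speed also improves compared with the prior art.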
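A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of an embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or in hardware plus software functional units.
An integrated unit implemented as a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in those embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.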

Claims (10)

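  1. A data storage method, characterized by comprising:
    receiving a data record to be stored;
    when the data record includes a timestamp, a value generated at the time point identified by the timestamp, and a first identifier value combination that can identify the value, preprocessing the data record according to a user's query requirement to obtain a target value and a target time that satisfy the query requirement and a second identifier value combination serving as a query condition, storing a storage identifier that represents the second identifier value combination, the target time, and the target value in association in a second information table, and storing the first identifier value combination in a first information table;
    wherein one set of values of the time-independent multi-dimensional identifier fields constitutes the first identifier value combination.
  2. The method according to claim 1, characterized by further comprising:
    when the data record does not include a timestamp or a value generated at the time point identified by the timestamp, but includes a first identifier value combination that can identify the value, storing the first identifier value combination in the first information table.
  3. The method according to claim 1 or 2, characterized in that storing the first identifier value combination in the first information table comprises:
    sending a first write request to a first device where the first information table is located, so that the first device writes the first identifier value combination into the first information table when it determines that the first identifier value combination does not exist in the first information table, the first write request including the first identifier value combination.
  4. The method according to claim 1 or 2, characterized in that storing the storage identifier that represents the second identifier value combination, the target time, and the target value in association in the second information table comprises:
    sending a second write request to a second device where the second information table is located, so that the second device determines the storage identifier that represents the second identifier value combination and writes the storage identifier, the target time, and the target value in association into the second information table, the second write request including the second identifier value combination, the target time, and the target value.
  5. A data storage device, characterized by comprising:
    a receiving module, configured to receive a data record to be stored;
    a first storage module, configured to: when the data record includes a timestamp, a value generated at the time point identified by the timestamp, and a first identifier value combination that can identify the value, preprocess the data record according to a user's query requirement to obtain a target value and a target time that satisfy the query requirement and a second identifier value combination serving as a query condition, store a storage identifier that represents the second identifier value combination, the target time, and the target value in a second information table, and store the first identifier value combination in a first information table;
    wherein one set of values of the time-independent multi-dimensional identifier fields constitutes the first identifier value combination.
  6. The device according to claim 5, characterized by further comprising:
    a second storage module, configured to store the first identifier value combination in the first information table when the data record does not include a timestamp or a value generated at the time point identified by the timestamp, but includes a first identifier value combination that can identify the value.
  7. The device according to claim 6, characterized in that the second storage module is specifically configured to: when the data record does not include the timestamp or the value but includes the first identifier value combination, send a first write request to a first device where the first information table is located, so that the first device writes the first identifier value combination into the first information table when it determines that the first identifier value combination does not exist in the first information table, the first write request including the first identifier value combination.
  8. The device according to claim 5, 6, or 7, characterized in that the first storage module is specifically configured to: when the data record includes the timestamp, the value, and the first identifier value combination, preprocess the data record according to the user's query requirement to obtain a target value and a target time that satisfy the query requirement and a second identifier value combination serving as a query condition; send a second write request to a second device where the second information table is located, so that the second device determines the storage identifier that represents the second identifier value combination and writes the storage identifier, the target time, and the target value in association into the second information table; and send a first write request to a first device where the first information table is located, so that the first device writes the first identifier value combination into the first information table when it determines that the first identifier value combination does not exist in the first information table; the first write request including the first identifier value combination, and the second write request including the second identifier value combination, the target time, and the target value.
  9. A data query method, characterized by comprising:
    receiving a query request;
    when the query request includes a filter condition as a query condition but does not include a target time, querying the multiple first identifier value combinations stored in a first information table according to the filter condition, and obtaining the first identifier value combinations that satisfy the filter condition; wherein one set of values of the time-independent multi-dimensional identifier fields constitutes one first identifier value combination, and the filter condition includes values of some of the identifier fields;
    when the query request includes a second identifier value combination and a target time as query conditions, querying a second information table according to a storage identifier that represents the second identifier value combination and the target time, and obtaining the target value corresponding to the target time and the storage identifier.
  10. A data query device, characterized by comprising:
    a receiving module, configured to receive a query request;
    a first query module, configured to: when the query request includes a filter condition as a query condition but does not include a target time, query the multiple first identifier value combinations stored in a first information table according to the filter condition, and obtain the first identifier value combinations that satisfy the filter condition; wherein one set of values of the time-independent multi-dimensional identifier fields constitutes one first identifier value combination, and the filter condition contains values of some of the identifier fields;
    a second query module, configured to: when the query request includes a second identifier value combination and a target time as query conditions, query a second information table according to a storage identifier that represents the second identifier value combination and the target time, and obtain the target value corresponding to the target time and the storage identifier.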
PCT/CN2015/081651 2014-07-07 2015-06-17 数据存储方法、查询方法及设备 WO2016004813A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2017500353A JP6744854B2 (ja) 2014-07-07 2015-06-17 データ記憶方法、データ照会方法、およびそれらの装置
EP15819546.1A EP3168758A1 (en) 2014-07-07 2015-06-17 Data storage method, query method and device
US15/324,661 US10489372B2 (en) 2014-07-07 2015-06-17 Data storage methods, query methods, and apparatuses thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410320794.X 2014-07-07
CN201410320794.XA CN105446991B (zh) 2014-07-07 2014-07-07 数据存储方法、查询方法及设备

Publications (1)

Publication Number Publication Date
WO2016004813A1 true WO2016004813A1 (zh) 2016-01-14

Family

ID=55063565

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/081651 WO2016004813A1 (zh) 2014-07-07 2015-06-17 数据存储方法、查询方法及设备

Country Status (5)

Country Link
US (1) US10489372B2 (zh)
EP (1) EP3168758A1 (zh)
JP (1) JP6744854B2 (zh)
CN (1) CN105446991B (zh)
WO (1) WO2016004813A1 (zh)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107153651A (zh) * 2016-03-03 2017-09-12 阿里巴巴集团控股有限公司 一种多维交叉数据处理方法及装置
TWI684929B (zh) * 2016-02-04 2020-02-11 香港商阿里巴巴集團服務有限公司 電子支付業務處理、電子支付方法及裝置
CN111258981A (zh) * 2020-01-13 2020-06-09 中国建设银行股份有限公司 一种数据处理方法、装置、设备和存储介质
CN111382197A (zh) * 2018-12-28 2020-07-07 杭州海康威视数字技术股份有限公司 分区管理、数据存储和查询方法及装置、设备、介质
CN111506600A (zh) * 2020-03-23 2020-08-07 杭州海康威视系统技术有限公司 分页查询方法、装置和电子设备
CN112115147A (zh) * 2020-09-25 2020-12-22 北京百度网讯科技有限公司 数据处理的方法、装置、设备和存储介质
CN112148512A (zh) * 2019-06-27 2020-12-29 腾讯科技(深圳)有限公司 一种内容库管理方法、装置、设备及存储介质
CN112732761A (zh) * 2021-01-13 2021-04-30 青岛海信网络科技股份有限公司 一种数据碰撞方法及装置
CN112749541A (zh) * 2021-01-14 2021-05-04 京东数字科技控股股份有限公司 数据校验系统、方法、装置、电子设备和计算机可读介质

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108255877B (zh) * 2016-12-29 2020-11-24 北京国双科技有限公司 裁判文书的存储方法及装置
US10685026B2 (en) * 2017-04-11 2020-06-16 Sap Se Database query based match engine
CN108647280A (zh) * 2018-05-03 2018-10-12 北京云中融信网络科技有限公司 一种存储通讯信息的方法和装置
CN109284260B (zh) * 2018-10-16 2023-10-13 平安证券股份有限公司 大数据文件读取方法、装置、计算机设备及存储介质
US11474977B2 (en) 2019-09-30 2022-10-18 Dropbox, Inc. Snapshot isolation in a distributed storage system
CN111488386B (zh) * 2020-04-14 2023-09-29 北京易数科技有限公司 数据查询方法和装置
CN112100226B (zh) * 2020-09-18 2024-06-21 腾讯科技(深圳)有限公司 一种数据查询方法及计算机可读存储介质
CN112527828B (zh) * 2020-12-10 2023-03-14 福建新大陆支付技术有限公司 一种税控机税控记录存储方法及检索查询方法
CN112800179B (zh) * 2021-02-02 2022-02-15 浙江公共安全技术研究院有限公司 关联数据库查询方法、装置、存储介质及电子设备
CN113254447A (zh) * 2021-05-27 2021-08-13 平安普惠企业管理有限公司 Id生成方法、装置、电子设备及存储介质
CN113342832B (zh) * 2021-08-04 2021-11-02 北京快立方科技有限公司 一种数据库索引方法
CN113705184B (zh) * 2021-09-01 2023-09-22 同盾科技有限公司 自定义报表的生成方法及装置、存储介质、电子设备
CN113821514A (zh) * 2021-09-26 2021-12-21 维沃移动通信有限公司 数据拆分方法、装置、电子设备和可读存储介质
CN114064494A (zh) * 2021-11-19 2022-02-18 北京每日菜场科技有限公司 数据异常报警方法、装置、电子设备和计算机可读介质
CN115100757B (zh) * 2022-06-20 2023-05-09 重庆长安汽车股份有限公司 汽车数据的存储方法、装置、车辆及存储介质

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
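CN101533414A (zh) * 2009-04-15 2009-09-16 阿里巴巴集团控股有限公司 Method and apparatus for generating unique identifiers for database records
CN102999526A (zh) * 2011-09-16 2013-03-27 阿里巴巴集团控股有限公司 Method and system for splitting and querying database relational tables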

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5732257A (en) * 1995-09-13 1998-03-24 Hewlett-Packard Co. Object conversion method from a flat object space to a class structured space
US5943665A (en) * 1997-09-09 1999-08-24 Netscape Communications Corporation Method and system for performing conceptual joins across fields of a database
US7739224B1 (en) * 1998-05-06 2010-06-15 Infor Global Solutions (Michigan), Inc. Method and system for creating a well-formed database using semantic definitions
US6189004B1 (en) * 1998-05-06 2001-02-13 E. Piphany, Inc. Method and apparatus for creating a datamart and for creating a query structure for the datamart
GB2363221B (en) * 2000-06-09 2002-05-01 Oracle Corp Summary creation
US7181450B2 (en) * 2002-12-18 2007-02-20 International Business Machines Corporation Method, system, and program for use of metadata to create multidimensional cubes in a relational database
US7895191B2 (en) * 2003-04-09 2011-02-22 International Business Machines Corporation Improving performance of database queries
US20050102326A1 (en) * 2003-10-22 2005-05-12 Nitzan Peleg Method and apparatus for performing conflict resolution in database logging
US7392242B1 (en) * 2004-02-27 2008-06-24 Hyperion Solutions Corporation Query costing in a multidimensional database
US7584178B2 (en) * 2006-04-20 2009-09-01 International Business Machines Corporation Query condition building using predefined query objects
US8190557B2 (en) * 2009-11-25 2012-05-29 Barber Paul Grant Processor and method configured for executing data transfer or data adjustment functions on OLAP based data
CN101968806A (zh) * 2010-10-22 2011-02-09 天津南大通用数据技术有限公司 Data storage method, query method and apparatus
US20120197900A1 (en) * 2010-12-13 2012-08-02 Unisys Corporation Systems and methods for search time tree indexes
EP2490135A1 (en) * 2011-02-21 2012-08-22 Amadeus S.A.S. Method and system for providing statistical data from a data warehouse
CN102521303B (zh) * 2011-11-30 2016-08-10 北京人大金仓信息技术股份有限公司 Single-table multi-column-order storage method for a column-oriented database
US8676772B2 (en) * 2011-12-09 2014-03-18 Telduráðgevin Sp/f Systems and methods for improving database performance
US8447730B1 (en) * 2012-01-31 2013-05-21 Yahoo! Inc. Probe system for replication monitoring
CN103577456B (zh) * 2012-07-31 2016-12-21 国际商业机器公司 Method and apparatus for processing time-series data
CN103902544B (zh) * 2012-12-25 2017-11-21 中国移动通信集团公司 Data processing method and system
GB2510626A (en) * 2013-02-11 2014-08-13 Face Recording And Measurement Systems Ltd Organising data entry forms
US9298521B1 (en) * 2013-04-29 2016-03-29 Seagate Technology Llc Command sets and functions
US11138243B2 (en) * 2014-03-06 2021-10-05 International Business Machines Corporation Indexing geographic data
US10360196B2 (en) * 2014-04-15 2019-07-23 Splunk Inc. Grouping and managing event streams generated from captured network data
US10346358B2 (en) * 2014-06-04 2019-07-09 Waterline Data Science, Inc. Systems and methods for management of data platforms

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3168758A4 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
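TWI684929B (zh) * 2016-02-04 2020-02-11 香港商阿里巴巴集團服務有限公司 Electronic payment service processing, electronic payment method and apparatus
US11282080B2 (en) 2016-02-04 2022-03-22 Advanced New Technologies Co., Ltd. Electronic payment service processing
CN107153651A (zh) * 2016-03-03 2017-09-12 阿里巴巴集团控股有限公司 Multi-dimensional cross data processing method and apparatus
CN111382197A (zh) * 2018-12-28 2020-07-07 杭州海康威视数字技术股份有限公司 Partition management, data storage and query methods and apparatus, device, and medium
CN111382197B (zh) * 2018-12-28 2023-10-27 杭州海康威视数字技术股份有限公司 Partition management, data storage and query methods and apparatus, device, and medium
CN112148512A (zh) * 2019-06-27 2020-12-29 腾讯科技(深圳)有限公司 Content library management method, apparatus, device and storage medium
CN111258981A (zh) * 2020-01-13 2020-06-09 中国建设银行股份有限公司 Data processing method, apparatus, device and storage medium
CN111506600A (zh) * 2020-03-23 2020-08-07 杭州海康威视系统技术有限公司 Paging query method, apparatus and electronic device
CN111506600B (zh) * 2020-03-23 2023-06-16 杭州海康威视系统技术有限公司 Paging query method, apparatus and electronic device
CN112115147A (zh) * 2020-09-25 2020-12-22 北京百度网讯科技有限公司 Data processing method, apparatus, device and storage medium
CN112115147B (zh) * 2020-09-25 2024-04-30 北京百度网讯科技有限公司 Data processing method, apparatus, device and storage medium
CN112732761A (zh) * 2021-01-13 2021-04-30 青岛海信网络科技股份有限公司 Data collision method and apparatus
CN112732761B (zh) * 2021-01-13 2022-08-23 青岛海信网络科技股份有限公司 Data collision method and apparatus
CN112749541A (zh) * 2021-01-14 2021-05-04 京东数字科技控股股份有限公司 Data verification system, method, apparatus, electronic device and computer-readable medium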

Also Published As

Publication number Publication date
CN105446991A (zh) 2016-03-30
EP3168758A4 (en) 2017-05-17
US20180181606A1 (en) 2018-06-28
JP6744854B2 (ja) 2020-08-19
CN105446991B (zh) 2018-10-30
JP2017523513A (ja) 2017-08-17
EP3168758A1 (en) 2017-05-17
US10489372B2 (en) 2019-11-26

Similar Documents

Publication Publication Date Title
WO2016004813A1 (zh) Data storage method, query method and device
US11681733B2 (en) Massive scale heterogeneous data ingestion and user resolution
CA2845743C (en) Resolving similar entities from a transaction database
US9053160B2 (en) Distributed, real-time online analytical processing (OLAP)
US9361320B1 (en) Modeling big data
US10169730B2 (en) System and method to present a summarized task view in a case management system
US9940360B2 (en) Streaming optimized data processing
US20140101201A1 (en) Distributed data warehouse
US20240126817A1 (en) Graph data query
CN111506559A (zh) Data storage method, apparatus, electronic device and storage medium
US20180121292A1 (en) Systems and methods for database management
US20150199645A1 (en) Customer Profile View of Consolidated Customer Attributes
US9098550B2 (en) Systems and methods for performing data analysis for model proposals
CN112241420A (zh) Method for recommending government service items based on an association rule algorithm
US11188981B1 (en) Identifying matching transfer transactions
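CN114064660B (zh) ElasticSearch-based data structuring analysis method
CN107729330B (zh) Method and apparatus for acquiring a data set
CN110908983A (zh) Intelligent marketing system based on user profile recognition
CN116483822B (zh) Business data early-warning method, apparatus, computer device and storage medium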
US20230153286A1 (en) Method and system for hybrid query based on cloud analysis scene, and storage medium
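CN108304499B (zh) Method, terminal and medium for predicate pushdown in SQL join operations
CN111984798A (zh) Graph data preprocessing method and apparatus
CN113127491B (zh) Streaming graph partitioning system based on association features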
US20240220876A1 (en) Artificial intelligence (ai) based data product provisioning
CN115733787A (zh) Network identification method, apparatus, server and storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 15819546; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2017500353; Country of ref document: JP; Kind code of ref document: A
WWE WIPO information: entry into national phase. Ref document number: 15324661; Country of ref document: US
NENP Non-entry into the national phase. Ref country code: DE
REEP Request for entry into the European phase. Ref document number: 2015819546; Country of ref document: EP
WWE WIPO information: entry into national phase. Ref document number: 2015819546; Country of ref document: EP