WO2023029592A1 - Data processing method and apparatus

Data processing method and apparatus

Info

Publication number
WO2023029592A1
Authority
WO
WIPO (PCT)
Prior art keywords
query
time
input parameters
identifier
cache
Prior art date
Application number
PCT/CN2022/093272
Other languages
French (fr)
Chinese (zh)
Inventor
屠志强
Original Assignee
北京沃东天骏信息技术有限公司
北京京东世纪贸易有限公司
Priority date
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司 and 北京京东世纪贸易有限公司
Publication of WO2023029592A1 publication Critical patent/WO2023029592A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G06F16/24553 Query execution of query operations

Definitions

  • the present disclosure relates to the technical field of big data, and in particular to a data processing method and device.
  • the data service interface needs to aggregate data through OLAP (Online Analytical Processing).
  • the embodiments of the present disclosure provide a data processing method and device: by parsing the query input parameters in an acquisition request for target data, each query identifier corresponding to the query input parameters is obtained; the query result for each query identifier is first obtained from the cache, and only if it is not in the cache is it obtained from the database; the target data is then obtained, which improves query efficiency, improves the interface response speed, and thus improves the user experience.
  • a data processing method including:
  • the query results of the respective query identifiers are spliced to obtain the target data.
  • parsing the query input parameters to determine each query identifier corresponding to the query input parameters includes:
  • each query identifier corresponding to the query input parameter is determined.
  • parsing the query input parameters to determine each query identifier corresponding to the query input parameters includes:
  • the dimension types include grouping dimensions
  • the time attribute and the grouping dimension determine the number of return values to be acquired corresponding to the query input parameter
  • each query identifier corresponding to the query input parameter is determined.
  • the dimension type further includes a sorting dimension, and the sorting dimension indicates a sorting field and a sorting value range;
  • Splicing the query results of the various query identifiers to obtain the target data includes:
  • determining the number of return values to be acquired corresponding to the query input parameters includes:
  • parsing the query input parameters to determine each query identifier corresponding to the query input parameters includes:
  • the dimension type includes a restricted dimension, and the restricted dimension indicates an enumerated value
  • each query identifier corresponding to the query input parameter is determined.
  • determining each query identifier corresponding to the query input parameter includes:
  • parsing the query input parameters to determine each query identifier corresponding to the query input parameters further includes:
  • the time attribute includes start time, end time and time granularity
  • the end time of the new time query identifier is the sum of the end time of the query input parameters and the time granularity.
  • the query result and the query identifier corresponding to the query result are disassembled to obtain the sub-query identifier and the return value corresponding to the sub-query identifier, and the sub-query identifier and the return value corresponding to the sub-query identifier are correspondingly stored in the cache.
  • a data processing device including:
  • the receiving module receives an acquisition request for target data, and the acquisition request includes query input parameters;
  • the determining module is configured to analyze the query input parameters, and determine each query identifier corresponding to the query input parameters;
  • a judging module judging whether the query identifier exists in the cache, if so, obtaining the query result corresponding to the query identifier from the cache; if not, obtaining the query result corresponding to the query identifier from the database;
  • the splicing module splices the query results of the respective query identifiers to obtain the target data.
  • an electronic device including:
  • one or more processors;
  • a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the data processing method provided in the present disclosure.
  • a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, the data processing method provided in the present disclosure is implemented.
  • An embodiment of the above disclosure has the following advantages or beneficial effects: by parsing the query input parameters in the acquisition request for target data, each query identifier corresponding to the query input parameters is obtained; the query result corresponding to each query identifier is first obtained from the cache, and if it does not exist in the cache it is obtained from the database; the target data is then obtained, which improves query efficiency, improves the interface response speed, and thus improves the user experience.
  • FIG. 1 is a schematic diagram of the main flow of a data processing method according to an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of the main flow of another data processing method according to an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of the main flow of another data processing method according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic flowchart of a data processing method according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of the main modules of a device for data processing according to an embodiment of the present disclosure
  • FIG. 6 is an exemplary system architecture diagram to which embodiments of the present disclosure can be applied.
  • FIG. 7 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present disclosure.
  • the existing data service interface is generally equipped with a cache based on the Key-Value storage system.
  • When the caller queries data through the data service interface, the interface generates a Key from the interface query input parameters or the generated SQL, and stores the Key together with the returned Value in the cache so that the cache can be hit on the next query.
  • However, this approach results in coarse-grained data being stored in the cache and a low cache hit rate.
  • Moreover, the granularity of the data in the cache is determined by the interface caller; different callers cannot share the cache if their data ranges and data granularities differ, resulting in a low cache hit rate, low data query efficiency, long interface response times, and a degraded user experience.
  • the embodiments of the present disclosure provide a data processing method, which can improve cache hit rate, improve data query efficiency, shorten interface response time, and improve interface performance.
  • Fig. 1 is a schematic diagram of the main flow of a method for data processing according to an embodiment of the present disclosure. As shown in Fig. 1, the method for data processing is applied to the server and includes the following steps:
  • Step S101: receive an acquisition request for target data, where the acquisition request includes query input parameters;
  • Step S102: parse the query input parameters, and determine each query identifier corresponding to the query input parameters;
  • Step S103: determine whether the query identifier exists in the cache; if yes, execute step S104, if not, execute step S105;
  • Step S104: obtain the query result corresponding to the query identifier from the cache;
  • Step S105: obtain the query result corresponding to the query identifier from the database;
  • Step S106: concatenate the query results of each query identifier to obtain the target data.
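  • The cache-first lookup of steps S103 to S106 can be pictured with a minimal Python sketch, assuming a cache object with a get method and a database object with a query method (these interfaces are illustrative assumptions, not part of the disclosure):

```python
def fetch_target_data(query_ids, cache, database):
    """Steps S103-S106: read each query identifier from the cache first, fall back to the
    database only for the misses, then splice the per-identifier results together."""
    results = {}
    misses = []
    for qid in query_ids:            # S103: is the identifier in the cache?
        value = cache.get(qid)
        if value is not None:
            results[qid] = value     # S104: cache hit
        else:
            misses.append(qid)
    for qid in misses:               # S105: only the misses are queried from the database
        results[qid] = database.query(qid)
    return [results[qid] for qid in query_ids]   # S106: splice in identifier order
```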
  • the caller sends a target data acquisition request to the service provider through the data service interface, so as to acquire the target data.
  • the caller is the party that calls the data service interface
  • the service provider is the called party that provides the target data.
  • the service provider can be the server, and the caller can be the client. After receiving the acquisition request, the service provider obtains the query input parameters included in the request, determines the target data according to the query input parameters, and returns it to the caller.
  • the identifier of the requesting party, the identifier of the called interface, and the identifier or index of the target data can be parsed out from the query input parameters, and the channel identifier can also be parsed out.
  • the identifier of the called interface indicates the called interface
  • the caller's identifier is used to verify whether the caller has permission to obtain the target data, for example by judging whether the caller's identifier is in a white list. If the verification passes, the acquisition request is responded to and the target data is queried and returned; if the verification fails, the acquisition request is refused.
  • the identifier or indicator of the target data is the identifier of the target data requested by the acquisition request, such as the identifier of the accumulated order quantity.
  • the channel ID is the ID of the channel through which the caller invokes the data service interface. If it is a market channel, the market channel ID can be parsed out.
  • the business content may also be parsed out, that is, the business scope of the acquired target data.
  • step S102 further includes: parsing the time attribute from the query input parameters; determining, according to the time attribute, the number of return values to be obtained corresponding to the query input parameters; and determining, according to the number of return values to be obtained, the respective query identifiers corresponding to the query input parameters.
  • the time attribute can be parsed from the query input parameter, and the time attribute includes start time, end time and time granularity.
  • the start time and end time determine the time range for querying the target data, and the time granularity can be a time interval, such as 1 day, 1 hour, 10 minutes, 1 minute, and so on.
  • the number of return values to be acquired can then be determined. For example, caller A requests the order quantity per hour on 2021-05-01; the start time parsed from the query input parameters is 2021-05-01 00:00:00, the end time is 2021-05-01 23:59:59, and the time granularity is 1 hour, so the number of return values to be acquired is 24.
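  • As a worked check of this count, the arithmetic can be sketched as follows, assuming the time attribute is carried as datetimes and a timedelta granularity (a sketch of the counting rule only, not the patent's exact data model):

```python
from datetime import datetime, timedelta

def expected_return_values(start, end, granularity):
    """Number of return values to fetch: one per granularity interval in [start, end]."""
    total_seconds = (end - start).total_seconds() + 1   # inclusive range, e.g. 00:00:00..23:59:59
    return int(total_seconds // granularity.total_seconds())

start = datetime(2021, 5, 1, 0, 0, 0)
end = datetime(2021, 5, 1, 23, 59, 59)
print(expected_return_values(start, end, timedelta(hours=1)))  # 24 hourly values in one day
```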
  • the caller when the caller uses a business-related dimension to obtain target data, it can parse out the dimension type from the query input parameter, and the dimension type includes at least one of grouping dimension, restriction dimension and sorting dimension.
  • the grouping dimension indicates the grouping field, which is used to indicate the dimension of grouping.
  • the grouping dimension can be platform dimension, department dimension, category dimension and other dimensions that can realize grouping.
  • the platform dimension indicates that the target data is obtained by grouping by platform;
  • the department dimension indicates that the target data is obtained by grouping by department;
  • the category dimension indicates that the target data is obtained by grouping by category. Further parsing can obtain the enumeration values corresponding to the grouping dimension.
  • the enumeration values of the platform dimension are all platforms; the enumeration values of the second-level department dimension are all second-level departments; the enumeration values of the category dimension are all categories.
  • the restricted dimension restricts the enumeration value.
  • the restricted dimension can be to obtain the target data of a certain platform, department, province, etc.
  • the sorting dimension indicates the order in which the target data is sorted according to a certain indicator, and can indicate the range of the target data obtained after sorting.
  • the sorting dimension can indicate the top ten data obtained by order volume.
  • the grouping dimension, the restricting dimension and the sorting dimension can be combined arbitrarily, so that the scope of data query is wider, so as to improve interface performance and user experience.
  • step S102 further includes:
  • Step S201: parse the time attribute and dimension type from the query input parameters; the dimension type includes a grouping dimension;
  • Step S202: according to the time attribute and the grouping dimension, determine the number of return values to be acquired corresponding to the query input parameters;
  • Step S203: according to the number of return values to be acquired, determine each query identifier corresponding to the query input parameters.
  • step S202 further includes: judging whether the grouping dimension matches a configured grouping dimension; if so, determining the number of return values to be obtained according to the time attribute and the enumeration values of the grouping dimension; if not, determining the number of return values to be obtained according to the time attribute alone.
  • the query identifier may be a keyword, a Key in a Key-Value (K-V for short) storage system, or other query parameters that can obtain query results from a cache or a database. Further optionally, the query identifier is Key, and one or more Keys corresponding to the query input parameters can be determined according to the query input parameters, and the query result is Value.
  • the time attribute includes start time, end time, and time granularity.
  • the dimension type is a grouping dimension
  • the grouping dimension indicates the grouping field. The grouping dimension is parsed according to the grouping field, and it is judged whether it matches a configured grouping dimension. If yes, that is, the grouping dimension in the query input parameters is a configured grouping dimension, the enumeration values of the grouping dimension are obtained, and the number of return values to be obtained for the query input parameters is determined according to the time attribute and those enumeration values. If not, that is, the grouping dimension is not configured, the number of return values to be obtained is determined according to the time attribute alone.
  • For example, the grouping dimension parsed from the query input parameters is the second-level department field. If a second-level department grouping dimension has been configured, the dimension is matched and its configured enumeration values (for example department b1, department b2 and department b3) are obtained, and the number of return values to be obtained is determined from the time attribute and those enumeration values. If the second-level department grouping dimension has not been configured, the number of return values to be obtained is determined from the time attribute alone. The number of return values to be obtained is the number of values the query input parameters are expected to return.
  • For example, if there are two second-level departments, a and b, and the order volume of department a and of department b is queried for 2021-05-01 and 2021-05-02 respectively, the number of return values to be obtained is 4, regardless of whether any order-quantity value is empty. A sketch of this counting logic is given below.
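  • A small sketch of this counting rule, assuming a hypothetical configured_dimensions map from configured grouping fields to their enumeration values (the map and field names are illustrative, not from the disclosure):

```python
def return_value_count(time_point_count, grouping_field, configured_dimensions):
    """Steps S201-S203: multiply the number of time points by the enumeration size of the
    grouping dimension if, and only if, that dimension has been configured."""
    enum_values = configured_dimensions.get(grouping_field)
    if enum_values:                      # matched a configured grouping dimension
        return time_point_count * len(enum_values)
    return time_point_count              # unconfigured dimension: count from the time attribute only

# Example from the text: second-level departments a and b, queried for 2021-05-01 and 2021-05-02
configured = {"second_level_department": ["a", "b"]}
print(return_value_count(2, "second_level_department", configured))  # 4 return values
```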
  • In some embodiments, step S102 further includes: parsing the time attribute and the dimension type from the query input parameters, where the dimension type includes a restricted dimension and the restricted dimension indicates enumeration values; determining, according to the time attribute and the enumeration values, the number of return values to be obtained corresponding to the query input parameters; and determining, according to that number, each query identifier corresponding to the query input parameters.
  • each query identifier corresponding to the query input parameter is determined, including:
  • Step S301: determine whether the number of return values to be acquired is greater than 1; if not, execute step S302, if yes, execute step S303;
  • Step S302: determine one query identifier corresponding to the query input parameters;
  • Step S303: determine whether the number of the return values to be acquired that are already in the cache is not less than the preset threshold; if yes, execute step S304, if not, execute step S305;
  • Step S304: disassemble the query input parameters, and determine each query identifier corresponding to the query input parameters;
  • Step S305: determine one query identifier corresponding to the query input parameters, without disassembling them.
  • Disassembly means splitting the query input parameters into multiple query identifiers and obtaining the query result for each identifier from the cache and/or the database. Because querying data from the cache is much faster than querying data from the database, splitting into multiple query identifiers and querying the cache first, then the database, improves query efficiency.
  • the cache may be a K-V cache, such as a Redis cache, and the database may be a MySQL database, or other caches or databases may be used.
  • the preset threshold is determined by the number of return values to be obtained corresponding to the query input parameters and a coefficient threshold;
  • the coefficient threshold is a value between 0 and 1 estimated from experience and log analysis (such as 0.6). Assume that the number of return values to be obtained corresponding to the query input parameters is N, the number of those return values already in the cache is M, and the coefficient threshold is r;
  • if M is not less than N × r, disassembly is performed; otherwise no disassembly is performed, so as to keep the data query time short. This is because querying data from the cache is much faster than querying data from the database. If the time required to query multiple values from the database in one shot is ts, then after disassembly some values hit the cache with query time ts1 (much less than ts), the remaining values are queried from the database once with query time ts2, and the values obtained are spliced with splicing time ts3. If ts2 + ts3 < ts, disassembly is worthwhile; otherwise it is not. In other words, the more values that can be read from the cache, the shorter the query time and the better the performance of the data service interface. This decision is sketched below.
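  • A sketch of that decision, with N the number of return values to obtain, M how many of them are already in the cache, and r the empirically chosen coefficient threshold (0.6 is only the example value mentioned above):

```python
def should_disassemble(n_return_values, n_cached, coefficient_threshold=0.6):
    """Disassemble the query input parameters only when enough of the needed values are
    expected to hit the cache, i.e. when M >= N * r (the preset threshold)."""
    if n_return_values <= 1:
        return False                     # a single value: one query identifier, nothing to split
    preset_threshold = n_return_values * coefficient_threshold
    return n_cached >= preset_threshold

print(should_disassemble(24, 20))   # True: most values are cached, so splitting pays off
print(should_disassemble(24, 5))    # False: query the database in one shot instead
```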
  • For example, when querying the cumulative values from 0:00 to 24:00, 24 values are returned; these values are sorted in chronological order and form a continuous time trend, so the request can be disassembled by time granularity. The acquisition request is disassembled into 24 query requests that are queried separately, and 24 query identifiers are determined: the cumulative value for 0-1 o'clock, the cumulative value for 0-2 o'clock, the cumulative value for 0-3 o'clock, ..., and the cumulative value for 0-24 o'clock. If x of the 24 query identifiers are in the cache, their query results are read from the cache, and the query results of the remaining 24 - x query identifiers are obtained by querying the database.
  • disassembling the query input parameters includes: disassembling the query input parameters according to time attributes and dimension types.
  • the acquisition request is disassembled into three query requests to be queried separately, and three query identifiers can be determined, namely the cumulative data of second-level department b1, the cumulative data of second-level department b2, and the cumulative data of second-level department b3.
  • determining each query identifier corresponding to the query input parameters also includes determining the form of each query identifier. After the query identifiers are determined, their form is determined according to the query input parameters so that it conforms to the form used by the cache and/or the database, allowing query results to be obtained quickly.
  • After each query identifier corresponding to the query input parameters is determined, it is first determined whether the query identifier exists in the cache; if it does, its query result is obtained from the cache, and if not, the query result is obtained from the database. In this way it can be determined which query identifiers can be answered from the cache and which must go to the database: the cache is consulted first and only the remaining query identifiers are sent to the database, which greatly reduces data query time, improves query efficiency, shortens interface response time, increases interface throughput, strengthens the interface's resistance to concurrent load, and improves interface performance. For example, if x of the 24 Keys are found in the K-V cache, the Values corresponding to those x Keys are obtained from the K-V cache, and the Values corresponding to the remaining 24 - x Keys are obtained from the database.
  • For example, if the caller requests the order data from 2021.5.1 to 2021.5.3, the request is split into three requests: obtain the orders for 2021.5.1, obtain the orders for 2021.5.2, and obtain the orders for 2021.5.3.
  • Three Keys are then stored in the cache, namely order_d_2021-05-01_2021-05-01, order_d_2021-05-02_2021-05-02 and order_d_2021-05-03_2021-05-03.
  • If the caller then requests the order data from 2021.5.1 to 2021.5.4, the cache is first checked for order_d_2021-05-01_2021-05-01, order_d_2021-05-02_2021-05-02, order_d_2021-05-03_2021-05-03 and order_d_2021-05-04_2021-05-04; three of these Keys are hit, so only order_d_2021-05-04_2021-05-04 needs to be queried from the database.
  • If the first-level department a is composed of the second-level departments b1, b2 and b3, then a can be disassembled into b1, b2 and b3.
  • When the caller requests the order data of all second-level departments under first-level department a for 2021.5.12, the request is disassembled into queries for the order data of second-level departments b1, b2 and b3 on 2021.5.12, that is, order_d_20210512_b1, order_d_20210512_b2 and order_d_20210512_b3. The cache is queried first; order_d_20210512_b1 hits the cache, while order_d_20210512_b2 and order_d_20210512_b3 are not found in the cache and are then queried from the database. A sketch of this key construction and lookup follows.
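  • A sketch of how Keys of this shape might be built and resolved, assuming a Redis-style cache client with mget/set and a database object with a query_many method (the key layout and client interfaces are illustrative assumptions, not the disclosure's exact format):

```python
def build_daily_keys(metric, dates, departments=None):
    """One fine-grained Key per day, and per department when the grouping dimension applies.
    Mirrors the two key shapes used in the examples above."""
    if departments:
        return [f"{metric}_d_{d.replace('-', '')}_{dept}" for d in dates for dept in departments]
    return [f"{metric}_d_{d}_{d}" for d in dates]

def fetch(keys, cache, database):
    """Read every Key from the cache first; only the misses are queried from the database."""
    results = dict(zip(keys, cache.mget(keys)))          # e.g. Redis MGET
    misses = [k for k, v in results.items() if v is None]
    if misses:
        for key, value in database.query_many(misses).items():
            results[key] = value
            cache.set(key, value)                        # warm the cache for the next caller
    return results

keys = build_daily_keys("order", ["2021-05-01", "2021-05-02", "2021-05-03", "2021-05-04"])
# With the first three days already cached, only order_d_2021-05-04_2021-05-04 reaches the database.
```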
  • In this way, the embodiment of the present disclosure improves the cache hit ratio and reduces the volume of data that must be queried from the database.
  • After the query results corresponding to each query identifier are obtained, they are spliced; that is, the query results obtained from the cache and those obtained from the database are spliced and converted into a unified form to obtain the target data, which is then returned to the caller.
  • When the dimension type includes a sorting dimension, obtaining the target data includes: sorting the query results of the respective query identifiers according to the sorting field, and obtaining, according to the sorting value range, the query results that fall within that range, so as to obtain the target data.
  • the range of sorting values is the selection range after sorting is performed according to the sorting field.
  • For example, the start time parsed from the query input parameters is 2021-05-01 00:00:00, the end time is 2021-05-02 23:59:59, the time granularity is 1 day, the grouping field is the second-level department, the sorting field is the order quantity, and the sorting value range is the two results with the largest order quantity.
  • Suppose the query results for three of the Keys are obtained from the cache and the query result for the remaining Key is obtained from the database; the results from the cache and the database are then converted into a unified form, sorted by order volume, and the top 2 query results with the largest order volume are selected to obtain the target data, which is returned. A sketch of this sort-and-splice step follows.
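  • A sketch of the sort-and-splice step for the example above, assuming each query result is a simple mapping that carries the sort field (the field names follow the example; the data values are made up for illustration):

```python
def splice_with_sort(results, sort_field, top_n):
    """Merge the cache and database results into one list, sort by the sort field, keep the range."""
    merged = list(results.values())                          # unify results from cache and database
    merged.sort(key=lambda row: row[sort_field], reverse=True)
    return merged[:top_n]

results = {
    "order_d_20210501_b1": {"department": "b1", "order_quantity": 120},
    "order_d_20210501_b2": {"department": "b2", "order_quantity": 300},
    "order_d_20210502_b1": {"department": "b1", "order_quantity": 150},
    "order_d_20210502_b3": {"department": "b3", "order_quantity": 80},
}
print(splice_with_sort(results, "order_quantity", 2))  # the two rows with the largest order quantity
```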
  • the data at the next moment under this time granularity may also be queried.
  • For example, in a scenario that compares real-time data with last year's data: if the real-time data uses a 1-minute time granularity, the order volume in one day amounts to 1440 data points, and the same period of last year's order data is also queried at 1-minute granularity.
  • step S102 further includes:
  • according to the start time, end time and time granularity, a new time query identifier corresponding to the query input parameters is asynchronously generated;
  • the end time of the new time query identifier is the sum of the end time of the query input parameter and the time granularity.
  • a new time query ID can be generated asynchronously.
  • the new time query identifier is the query identifier for the next moment relative to the query identifiers determined from the query input parameters, and serves as a predicted query identifier.
  • the end time of the new time query identifier is calculated by adding the time granularity to the end time of the query input parameters. For example, for a query of the cumulative indicator v_1 of platform a, department b between 2020-01-01 00:00:00 and 2020-01-01 00:59:59 with a time granularity of 1 hour, the end time of the new time query identifier is 2020-01-01 01:59:59; that is, the new time query identifier queries the cumulative indicator v_1 between 2020-01-01 00:00:00 and 2020-01-01 01:59:59.
  • the query result corresponding to the new time query identifier is obtained from the database according to the new time query identifier, and the new time query identifier and its query result are then stored correspondingly in the cache to update it, so that when the query result of the new time query identifier is requested at the next moment or later it can be obtained directly from the cache instead of from the database, thereby improving the cache hit rate, improving data query efficiency and shortening the interface response time.
  • the new time query identifier can be sent asynchronously to a message component, such as a message queue; by listening for messages and staggering the queries over time, the database is queried and the new time query identifier and its corresponding query result are stored in the cache, so that the cache is kept up to date.
  • Type 1: query platform a, department b, cumulative indicator v_1 between 2020-01-01 00:00:00 and 2020-01-01 00:59:59; at the next moment it is likely that platform a, department b, cumulative indicator v_1 between 2020-01-01 00:00:00 and 2020-01-01 01:59:59 will be needed, so a new time Key can be determined.
  • Type 2: query platform a, department b, between 2020-01-01 00:00:00 and 2020-01-01 11:59:59, for the cumulative value of 0-1 o'clock, the cumulative value of 0-2 o'clock, the cumulative value of 0-3 o'clock, ..., and the cumulative value of 0-12 o'clock; it is very likely that the cumulative value of 0-13 o'clock will need to be queried at the next moment, so a new time Key can be determined.
  • Type 3: query the cumulative data of platform a and all second-level departments under first-level department b between 2020-01-01 00:00:00 and 2020-01-01 00:59:59; at the next moment it is likely that the cumulative data of platform a and all second-level departments under first-level department b between 2020-01-01 00:00:00 and 2020-01-01 01:59:59 will be needed. For example, if there are 3 second-level departments, the number of return values to be obtained is 3, 3 Keys are determined, and each Key generates a new time Key, so 3 new time Keys are generated in total. Each new time Key and the query result obtained for it from the database are stored correspondingly in the cache, so that the next call from the caller can hit the cache without querying the database, which improves the cache hit rate. A sketch of this asynchronous prefetch follows.
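  • A sketch of the asynchronous prefetch of a new time Key, assuming an in-process queue standing in for the message component and illustrative key and worker names (none of these are mandated by the disclosure):

```python
from datetime import datetime, timedelta
import queue

prefetch_queue = queue.Queue()   # stand-in for the message queue

def enqueue_new_time_key(platform, dept, metric, start, end, granularity):
    """The new time Key keeps the same start time; its end time is end time + time granularity."""
    new_end = end + granularity
    key = f"{metric}_{platform}_{dept}_{start:%Y%m%d%H%M%S}_{new_end:%Y%m%d%H%M%S}"
    prefetch_queue.put(key)              # handled off the request path
    return key

def prefetch_worker(cache, database):
    """Consumer side: query the database for the predicted Key and warm the cache with it."""
    while True:
        key = prefetch_queue.get()
        cache.set(key, database.query(key))
        prefetch_queue.task_done()

key = enqueue_new_time_key("a", "b", "v_1",
                           datetime(2020, 1, 1, 0, 0, 0),
                           datetime(2020, 1, 1, 0, 59, 59),
                           timedelta(hours=1))
# key now covers 2020-01-01 00:00:00 .. 2020-01-01 01:59:59, ready in the cache for the next call.
```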
  • For example, a historical comparison also queries the cumulative order data of 2020.5.12 between time 0 and the 100th data point. In this case, in addition to querying the cumulative order data of 2020.5.12 from time 0 to the 100th point, the cumulative order data of 2020.5.12 from time 0 to the 101st point is generated asynchronously and stored in the cache, so that if the caller calls at the next moment, the cache can be hit, which improves the cache hit rate.
  • When, among the determined query identifiers, there are query identifiers whose results must be obtained from the database, then after those query results are obtained it is determined whether the number of returned values in each query result is greater than 1. If not, the result is a single value, and the query identifier and its query result are stored directly in the cache without disassembly. If the number of returned values is greater than 1, the query result and its query identifier are disassembled to obtain sub-query identifiers and the return value corresponding to each sub-query identifier.
  • The number of sub-query identifiers equals the number of return values in the query result. The sub-query identifiers and their return values are then stored in the cache to update it, so that the data stored in the cache is finer-grained; that is, a coarse-grained query identifier is disassembled into multiple fine-grained sub-query identifiers, which improves the cache hit rate, and the higher the cache hit rate, the better the interface performance. Because fine-grained data is stored in the cache, the cache can be shared even when different callers query different data ranges or dimension combinations. A sketch of this write-back disassembly follows.
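  • A sketch of that write-back disassembly, with a trivial dictionary standing in for the K-V cache and an illustrative sub-key naming scheme (not the disclosure's exact format):

```python
class DictCache:
    """Trivial stand-in for the K-V cache in this sketch."""
    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

def store_query_result(query_id, result, cache):
    """Cache a single-value result under its own identifier; split a multi-value result into
    one fine-grained sub-query identifier per returned value so narrower queries can hit it."""
    if len(result) <= 1:
        cache.set(query_id, result)
        return
    for sub_dimension, value in result.items():
        cache.set(f"{query_id}_{sub_dimension}", value)   # one fine-grained key per return value

cache = DictCache()
store_query_result("order_d_20210512_dept", {"b1": 10, "b2": 25, "b3": 7}, cache)
print(sorted(cache.data))   # three fine-grained keys ending in _b1, _b2 and _b3
```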
  • FIG. 4 is a schematic flow diagram of a data processing method according to an embodiment of the present disclosure.
  • The method includes: receiving an acquisition request for target data, where the acquisition request includes query input parameters; parsing the query input parameters to obtain the time attribute and the grouping dimension; asynchronously generating a new time Key according to the time attribute and storing it in the message queue, obtaining the new time Key from the message queue, querying the Value corresponding to the new time Key from the database, and storing the new time Key and its corresponding Value in the cache; judging whether the grouping dimension matches a configured grouping dimension, and if so, determining the number of return values to be obtained from the time attribute and the enumeration values of the grouping dimension, or if not, from the time attribute alone; judging whether the number of return values to be obtained is greater than 1, and if not, splicing a single Key; if so, judging whether the number of those return values already in the cache is not less than the preset threshold, and if yes, disassembling the query input parameters into multiple Keys, or if not, splicing a single Key without disassembly; judging whether each Key is in the cache, obtaining the corresponding Value from the cache if it is, and from the database if it is not; combining the Values obtained from the cache with the Values obtained from the database to obtain the target data and returning it; and, at the same time, judging whether the number of return values in each Value obtained from the database is greater than 1, storing the Value and its Key in the cache if not, or, if so, disassembling the Value and the Key according to the number of return values and storing them correspondingly in the cache.
  • In the embodiments of the present disclosure, each query identifier corresponding to the query input parameters is determined by parsing the query input parameters in the acquisition request, and by determining the storage location of each query identifier, the query result corresponding to each query identifier is obtained by querying the cache first and then the database.
  • The query input parameters can be disassembled, so that multiple query identifiers are determined from the query input parameters.
  • The query results obtained from the database and their corresponding query identifiers are disassembled and stored in the cache, so that the granularity of the data stored in the cache is small, which facilitates subsequent data queries and raises the cache hit rate; in addition, a new time query identifier is generated according to the time attribute, and the new time query identifier together with the corresponding query result obtained from the database is stored in the cache to improve the cache hit rate of subsequent data queries.
  • The data processing method provided by the embodiments of the present disclosure can improve the cache hit rate, improve data query efficiency, shorten interface response time, increase interface throughput, strengthen the interface's resistance to concurrent load, improve interface service performance, and further improve the user experience.
  • Accordingly, an embodiment of the present disclosure also provides a data processing device 500, including:
  • a receiving module 501, configured to receive an acquisition request for target data, where the acquisition request includes query input parameters;
  • a determining module 502, configured to parse the query input parameters and determine each query identifier corresponding to the query input parameters;
  • a judging module 503, configured to judge whether the query identifier exists in the cache, and if so, obtain the query result corresponding to the query identifier from the cache; if not, obtain the query result corresponding to the query identifier from the database;
  • a splicing module 504, configured to splice the query results of each query identifier to obtain the target data.
  • the determining module 502 is further configured to: parse the time attribute from the query input parameters; determine, according to the time attribute, the number of return values to be obtained corresponding to the query input parameters; and determine, according to the number of return values to be obtained, each query identifier corresponding to the query input parameters.
  • the determining module 502 is further configured to: parse the time attribute and dimension type from the query input parameters, the dimension type including a grouping dimension; determine, according to the time attribute and the grouping dimension, the number of return values to be obtained corresponding to the query input parameters; and determine, according to the number of return values to be obtained, each query identifier corresponding to the query input parameters.
  • the determining module 502 is further configured to: parse the time attribute and dimension type from the query input parameters, the dimension type including a restricted dimension that indicates enumeration values; determine, according to the time attribute and the enumeration values, the number of return values to be obtained; and determine, according to the number of return values to be obtained, each query identifier corresponding to the query input parameters.
  • the determining module 502 is further configured to: judge whether the grouping dimension matches a configured grouping dimension; if so, determine the number of return values to be obtained according to the time attribute and the enumeration values of the grouping dimension; if not, determine the number of return values to be obtained according to the time attribute.
  • the determining module 502 is further configured to: judge whether the number of return values to be obtained is greater than 1; if not, determine one query identifier corresponding to the query input parameters; if so, judge whether the number of those return values already in the cache is not less than the preset threshold, and if yes, disassemble the query input parameters and determine each query identifier corresponding to the query input parameters, or if not, determine one query identifier corresponding to the query input parameters without disassembly.
  • the determining module 502 is further configured to: parse the time attribute from the query input parameters, the time attribute including a start time, an end time and a time granularity, and asynchronously generate, according to the start time, the end time and the time granularity, a new time query identifier corresponding to the query input parameters; wherein the end time of the new time query identifier is the sum of the end time of the query input parameters and the time granularity.
  • the dimension type also includes a sorting dimension, and the sorting dimension indicates a sorting field and a sorting value range; the splicing module 504 is further configured to: sort the query results of each query identifier according to the sorting field, and obtain, according to the sorting value range, the query results corresponding to the sorting value range, so as to obtain the target data.
  • the device further includes a storage module configured to, after the query result corresponding to the query identifier is obtained from the database: judge whether the number of returned values in the query result is greater than 1; if not, store the query result and its corresponding query identifier correspondingly in the cache; if so, disassemble the query result and its corresponding query identifier according to the number of returned values in the query result to obtain sub-query identifiers and the return value corresponding to each sub-query identifier, and store the sub-query identifiers and their corresponding return values correspondingly in the cache.
  • Embodiments of the present disclosure also provide an electronic device, including: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the data processing method provided by the embodiments of the present disclosure.
  • Embodiments of the present disclosure also provide a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, the data processing method provided by the embodiments of the present disclosure is implemented.
  • FIG. 6 shows an exemplary system architecture 600 to which the data processing method or data processing device of the embodiments of the present disclosure can be applied.
  • a system architecture 600 may include terminal devices 601, 602 and 603, a network 604 and a server 605.
  • the network 604 serves as a medium for providing communication links between the terminal devices 601, 602, 603 and the server 605.
  • the network 604 may include various connection types, such as wired links, wireless communication links, or fiber optic cables, among others.
  • users can use the terminal devices 601, 602 and 603 to interact with the server 605 via the network 604 to receive or send messages and the like.
  • Various communication client applications can be installed on the terminal devices 601, 602, 603, such as search applications, shopping applications, web browser applications, instant messaging tools, email clients, social platform software, etc. (just for example).
  • the terminal devices 601, 602, 603 may be various electronic devices with display screens and supporting web browsing, including but not limited to smart phones, tablet computers, laptop computers, desktop computers and the like.
  • the server 605 may be a server that provides various services, such as a background management server that provides support for search applications used by users of the terminal devices 601, 602 and 603 (just an example).
  • the background management server can analyze and process received data such as the target data acquisition request, and feed back the processing result (target data, just an example) to the terminal device.
  • the data processing method provided by the embodiments of the present disclosure is generally executed by the server 605, and correspondingly, the data processing device is generally disposed in the server 605.
  • terminal devices, networks and servers in FIG. 6 are only illustrative. According to the implementation needs, there can be any number of terminal devices, networks and servers.
  • FIG. 7 shows a schematic structural diagram of a computer system 700 suitable for implementing a terminal device according to an embodiment of the present disclosure.
  • the terminal device shown in FIG. 7 is only an example, and should not limit the functions and application scope of the embodiments of the present disclosure.
  • a computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random-access memory (RAM) 703.
  • in the RAM 703, various programs and data required for the operation of the system 700 are also stored.
  • the CPU 701, ROM 702, and RAM 703 are connected to each other via a bus 704.
  • An input/output (I/O) interface 705 is also connected to the bus 704.
  • the following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, etc.; an output section 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker; a storage section 708 including a hard disk, etc.; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like.
  • the communication section 709 performs communication processing via a network such as the Internet.
  • a drive 710 is also connected to the I/O interface 705 as needed.
  • a removable medium 711 such as a magnetic disk, optical disk, magneto-optical disk, semiconductor memory, etc. is mounted on the drive 710 as necessary so that a computer program read therefrom is installed into the storage section 708 as necessary.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, where the computer program includes program codes for executing the methods shown in the flowcharts.
  • the computer program may be downloaded and installed from a network via the communication section 709 and/or installed from the removable medium 711.
  • When this computer program is executed by the central processing unit (CPU) 701, the above-described functions defined in the system of the present disclosure are performed.
  • the computer-readable medium shown in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • a computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block in the block diagrams or flowchart illustrations, and combinations of blocks in the block diagrams or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
  • the modules involved in the embodiments described in the present disclosure may be implemented by software or by hardware.
  • the described modules may also be set in a processor, for example, it may be described as: a processor includes a receiving module, a determining module, a judging module and a splicing module.
  • the names of these modules do not constitute a limitation on the module itself under certain circumstances, for example, the receiving module can also be described as "a module that receives an acquisition request for target data".
  • the present disclosure also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist independently without being assembled into the device.
  • the above-mentioned computer-readable medium carries one or more programs which, when executed by the device, cause the device to: receive an acquisition request for target data, where the acquisition request includes query input parameters; parse the query input parameters to determine each query identifier corresponding to the query input parameters; judge whether each query identifier exists in the cache, and if so, obtain the query result corresponding to the query identifier from the cache, or if not, obtain the query result corresponding to the query identifier from the database; and splice the query results of each query identifier to obtain the target data.
  • In this way, each query identifier corresponding to the query input parameters is determined by parsing the query input parameters in the acquisition request, and by judging the storage location of each query identifier, the query result corresponding to each query identifier is obtained by querying the cache first and then the database.
  • The query input parameters can be disassembled, so that multiple query identifiers are determined from the query input parameters.
  • The query results obtained from the database and their corresponding query identifiers are disassembled and stored in the cache, so that the granularity of the data stored in the cache is small, which facilitates subsequent data queries and raises the cache hit rate; in addition, a new time query identifier is generated according to the time attribute, and the new time query identifier together with the corresponding query result obtained from the database is stored in the cache to improve the cache hit rate of subsequent data queries.
  • The data processing method provided by the embodiments of the present disclosure can improve the cache hit rate, improve data query efficiency, shorten interface response time, increase interface throughput, strengthen the interface's resistance to concurrent load, improve interface service performance, and further improve the user experience.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided are a data processing method and apparatus, which relate to the technical field of big data. The method comprises: receiving an acquisition request for target data, wherein the acquisition request comprises a query input parameter (S101); parsing the query input parameter, so as to determine query identifiers corresponding to the query input parameter (S102); determining whether the query identifiers are present in a cache (S103); if so, acquiring, from the cache, query results corresponding to the query identifiers (S104); if not, acquiring, from a database, query results corresponding to the query identifiers (S105); and combining the query results of the query identifiers, so as to obtain the target data (S106). By means of the embodiments, the cache hit rate can be improved, the efficiency of data querying is improved, and the response time of a data service interface is shortened, thereby improving the user experience.

Description

Data processing method and apparatus
Cross-Reference to Related Applications
This application claims priority to Chinese invention patent application No. 202111005262.3, filed on August 30, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of big data, and in particular to a data processing method and device.
Background
In the Internet field, websites and various mobile applications query data through data service interfaces. As users' expectations for the experience keep rising, the requirements on data queries also keep rising: the interface is required to respond ever faster, and the scope of data queries is required to be ever wider.
When a user selects an arbitrary combination of dimensions to query data, the data service interface needs to aggregate the data through OLAP (Online Analytical Processing). Such an interface aggregates from detail-level data, which makes its response slow; when the interface is called frequently and the query covers a wide data range, data processing efficiency is low, query results cannot be returned quickly, and the user experience is poor.
Summary
In view of this, embodiments of the present disclosure provide a data processing method and device. By parsing the query input parameters in an acquisition request for target data, each query identifier corresponding to the query input parameters is obtained; the query result corresponding to each query identifier is first obtained from the cache, and if the corresponding query result does not exist in the cache it is obtained from the database; the target data is then obtained, which improves query efficiency, improves the interface response speed, and thus improves the user experience.
To achieve the above purpose, according to one aspect of the embodiments of the present disclosure, a data processing method is provided, including:
receiving an acquisition request for target data, where the acquisition request includes query input parameters;
parsing the query input parameters to determine each query identifier corresponding to the query input parameters;
judging whether the query identifier exists in the cache; if so, obtaining the query result corresponding to the query identifier from the cache; if not, obtaining the query result corresponding to the query identifier from the database;
splicing the query results of the respective query identifiers to obtain the target data.
Optionally, parsing the query input parameters to determine each query identifier corresponding to the query input parameters includes:
parsing the time attribute from the query input parameters;
determining, according to the time attribute, the number of return values to be acquired corresponding to the query input parameters;
determining, according to the number of return values to be acquired, each query identifier corresponding to the query input parameters.
Optionally, parsing the query input parameters to determine each query identifier corresponding to the query input parameters includes:
parsing the time attribute and the dimension type from the query input parameters, the dimension type including a grouping dimension;
determining, according to the time attribute and the grouping dimension, the number of return values to be acquired corresponding to the query input parameters;
determining, according to the number of return values to be acquired, each query identifier corresponding to the query input parameters.
Optionally, the dimension type further includes a sorting dimension, and the sorting dimension indicates a sorting field and a sorting value range;
splicing the query results of the respective query identifiers to obtain the target data includes:
sorting the query results of the respective query identifiers according to the sorting field, and obtaining, according to the sorting value range, the query results corresponding to the sorting value range, so as to obtain the target data.
Optionally, determining, according to the time attribute and the grouping dimension, the number of return values to be acquired corresponding to the query input parameters includes:
judging whether the grouping dimension matches a configured grouping dimension;
if so, determining the number of return values to be acquired according to the time attribute and the enumeration values of the grouping dimension; if not, determining the number of return values to be acquired according to the time attribute.
Optionally, parsing the query input parameters to determine the query identifiers corresponding to the query input parameters includes:
parsing a time attribute and a dimension type from the query input parameters, where the dimension type includes a restriction dimension and the restriction dimension indicates enumeration values;
determining, according to the time attribute and the enumeration values, the number of return values to be acquired for the query input parameters;
determining, according to the number of return values to be acquired, the query identifiers corresponding to the query input parameters.
Optionally, determining, according to the number of return values to be acquired, the query identifiers corresponding to the query input parameters includes:
judging whether the number of return values to be acquired is greater than 1;
if not, determining a single query identifier corresponding to the query input parameters;
if so, judging whether the number of return values to be acquired that are present in the cache is not less than a preset threshold; if so, disassembling the query input parameters and determining the query identifiers corresponding to the query input parameters; if not, determining a single query identifier corresponding to the query input parameters without disassembly.
Optionally, parsing the query input parameters to determine the query identifiers corresponding to the query input parameters further includes:
parsing a time attribute from the query input parameters, where the time attribute includes a start time, an end time and a time granularity;
asynchronously generating a new-time query identifier corresponding to the query input parameters according to the start time, the end time and the time granularity;
wherein the end time of the new-time query identifier is the sum of the end time of the query input parameters and the time granularity.
Optionally, after obtaining the query result corresponding to the query identifier from the database, the method further includes:
judging whether the number of return values in the query result is greater than 1;
if not, correspondingly storing the query result and the query identifier corresponding to the query result in the cache;
if so, disassembling the query result and the query identifier corresponding to the query result according to the number of return values in the query result to obtain sub-query identifiers and the return values corresponding to the sub-query identifiers, and correspondingly storing the sub-query identifiers and the return values corresponding to the sub-query identifiers in the cache.
Another aspect of the embodiments of the present disclosure provides a data processing apparatus, including:
a receiving module, configured to receive an acquisition request for target data, where the acquisition request includes query input parameters;
a determining module, configured to parse the query input parameters and determine the query identifiers corresponding to the query input parameters;
a judging module, configured to judge whether a query identifier exists in the cache, and if so, obtain the query result corresponding to the query identifier from the cache, and if not, obtain the query result corresponding to the query identifier from the database;
a splicing module, configured to splice the query results of the respective query identifiers to obtain the target data.
According to another aspect of the embodiments of the present disclosure, an electronic device is provided, including:
one or more processors;
a storage apparatus for storing one or more programs,
which, when executed by the one or more processors, cause the one or more processors to implement the data processing method provided by the present disclosure.
According to still another aspect of the embodiments of the present disclosure, a computer-readable medium is provided, on which a computer program is stored, and when the program is executed by a processor, the data processing method provided by the present disclosure is implemented.
One embodiment of the above disclosure has the following advantages or beneficial effects: by parsing the query input parameters in the acquisition request for the target data, the query identifiers corresponding to the query input parameters are obtained; the query result corresponding to each query identifier is first fetched from the cache, and only if no corresponding query result exists in the cache is it fetched from the database, after which the target data is obtained. This improves query efficiency and interface response speed, and thereby improves the user experience.
Further effects of the above non-conventional optional implementations will be described below in conjunction with the specific embodiments.
Description of the drawings
The accompanying drawings are provided for a better understanding of the present disclosure and do not constitute an improper limitation of the present disclosure. In the drawings:
Fig. 1 is a schematic diagram of the main flow of a data processing method according to an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of the main flow of another data processing method according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of the main flow of still another data processing method according to an embodiment of the present disclosure;
Fig. 4 is a schematic flowchart of a data processing method according to an embodiment of the present disclosure;
Fig. 5 is a schematic diagram of the main modules of a data processing apparatus according to an embodiment of the present disclosure;
Fig. 6 is a diagram of an exemplary system architecture to which the embodiments of the present disclosure can be applied;
Fig. 7 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present disclosure.
Detailed description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, including various details of the embodiments of the present disclosure to facilitate understanding; they should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
An existing data service interface is generally provided with a cache based on a Key-Value storage system. When a caller queries data through the data service interface, the interface generates a Key from the query input parameters or from the generated SQL, and stores the Key together with the returned Value in the cache so that the cache can be queried the next time. For example, caller A requests the daily order volume from 2021-05-01 to 2021-05-02 through the data service interface; based on this request, the cache stores Key = "order_2021-05-01_2021-05-02" with Value = {"2021-05-01": 100, "2021-05-02": 200}. When caller B then requests the daily order volume from 2021-05-01 to 2021-05-03, it cannot hit the Key "order_2021-05-01_2021-05-02" and has to query the database again. This approach results in coarse-grained data in the cache and a low cache hit rate: the granularity of the cached data is effectively decided by the interface caller, and different callers with different data ranges or data granularities cannot share the cache. The consequences are a low cache hit rate, low data query efficiency, long interface response times and a degraded user experience.
To address the above problems, the embodiments of the present disclosure provide a data processing method that can improve the cache hit rate, improve data query efficiency, shorten the interface response time and improve interface performance.
Fig. 1 is a schematic diagram of the main flow of a data processing method according to an embodiment of the present disclosure. As shown in Fig. 1, the data processing method, applied to a server, includes the following steps:
Step S101: receiving an acquisition request for target data, where the acquisition request includes query input parameters;
Step S102: parsing the query input parameters and determining the query identifiers corresponding to the query input parameters;
Step S103: judging whether a query identifier exists in the cache; if so, executing step S104; if not, executing step S105;
Step S104: obtaining the query result corresponding to the query identifier from the cache;
Step S105: obtaining the query result corresponding to the query identifier from the database;
Step S106: splicing the query results of the respective query identifiers to obtain the target data.
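For illustration only, the overall flow of steps S101 to S106 can be sketched as follows. This is a minimal sketch, not the claimed implementation: the cache is modeled as a dict-like object, the database as a plain dict, and build_query_keys and splice_results stand in for the key derivation and splicing logic described later; all of these names are assumptions.

```python
# Minimal sketch of steps S101-S106 (illustrative; helper names are assumptions).
def handle_request(query_params, cache, db, build_query_keys, splice_results):
    keys = build_query_keys(query_params)         # S102: derive query identifiers
    results, missing = {}, []
    for key in keys:                              # S103: check the cache per key
        value = cache.get(key)
        if value is not None:
            results[key] = value                  # S104: cache hit
        else:
            missing.append(key)
    for key in missing:                           # S105: fall back to the database
        results[key] = db[key]
    return splice_results(query_params, results)  # S106: splice into target data
```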
In the embodiments of the present disclosure, the caller sends an acquisition request for the target data to the service provider through the data service interface in order to obtain the target data. The caller is the party that invokes the data service interface, and the service provider is the invoked party that provides the target data. The service provider may be a server and the caller may be a client. The acquisition request includes query input parameters, and the service provider can determine the target data according to the query input parameters and return it to the caller.
In the embodiments of the present disclosure, a caller identifier, an identifier of the invoked interface and an identifier or indicator of the target data can be parsed from the query input parameters; a channel identifier can also be parsed out. The identifier of the invoked interface indicates which interface is called. The caller identifier is used to verify whether the caller has permission to obtain the target data, for example by judging whether the caller identifier is on a whitelist; if the verification passes, the acquisition request is served and the target data is queried and returned, and if it fails, the request is rejected. The identifier or indicator of the target data identifies the data requested by the acquisition request, such as an identifier of the accumulated order volume. The channel identifier identifies the channel through which the caller invokes the data service interface; for a marketing channel, for example, the marketing channel identifier can be parsed out. Optionally, the business content, i.e. the business scope of the target data to be obtained, can also be parsed out.
In one implementation of the embodiments of the present disclosure, step S102 further includes: parsing a time attribute from the query input parameters; determining, according to the time attribute, the number of return values to be acquired for the query input parameters; and determining, according to the number of return values to be acquired, the query identifiers corresponding to the query input parameters.
In the embodiments of the present disclosure, a time attribute can be parsed from the query input parameters; the time attribute includes a start time, an end time and a time granularity. The start time and the end time determine the time range of the query for the target data, and the time granularity can be a time interval such as 1 day, 1 hour, 10 minutes or 1 minute. The number of return values to be acquired can then be determined from the time attribute. For example, caller A requests the hourly order volume for 2021-05-01; the start time 2021-05-01 00:00:00, the end time 2021-05-01 23:59:59 and a time granularity of 1 hour are parsed from the query input parameters, so the number of return values to be acquired is 24.
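As a minimal illustration of this step, the expected number of return values can be derived from the parsed time attribute as sketched below; the function name and the rounding rule (one value per granularity step covering the whole range) are assumptions rather than text from the original.

```python
import math
from datetime import datetime, timedelta

# Illustrative helper: number of return values implied by start time,
# end time and time granularity parsed from the query input parameters.
def expected_value_count(start: datetime, end: datetime, granularity: timedelta) -> int:
    return math.ceil((end - start) / granularity)

count = expected_value_count(
    datetime(2021, 5, 1, 0, 0, 0),
    datetime(2021, 5, 1, 23, 59, 59),
    timedelta(hours=1),
)
print(count)  # 24 hourly values for 2021-05-01
```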
According to business requirements, when the caller uses business-related dimensions to obtain the target data, a dimension type can be parsed from the query input parameters; the dimension type includes at least one of a grouping dimension, a restriction dimension and a sorting dimension.
The grouping dimension indicates a grouping field, which specifies the dimension used for grouping. For example, the grouping dimension can be a platform dimension, a department dimension, a category dimension or any other dimension by which grouping can be performed: the platform dimension specifies that the target data is obtained grouped by platform, the department dimension that it is obtained grouped by department, and the category dimension that it is obtained grouped by category. Further parsing yields the enumeration values corresponding to the grouping dimension; for example, the enumeration values of the platform dimension include all platforms, the enumeration values of the second-level-department dimension are all second-level departments, and the enumeration values of the category dimension are all categories. The restriction dimension restricts the enumeration values; for example, a restriction dimension can limit the query to the target data of a particular platform, department or province. The sorting dimension specifies that the target data is sorted in the order of a certain indicator and can specify the range of the target data to be obtained after sorting; for example, the sorting dimension can specify that the top ten results by order volume are obtained.
In the embodiments of the present disclosure, the grouping dimension, the restriction dimension and the sorting dimension can be combined arbitrarily, which broadens the scope of data queries, improves interface performance and improves the user experience.
In another implementation of the embodiments of the present disclosure, as shown in Fig. 2, step S102 further includes:
Step S201: parsing a time attribute and a dimension type from the query input parameters, where the dimension type includes a grouping dimension;
Step S202: determining, according to the time attribute and the grouping dimension, the number of return values to be acquired for the query input parameters;
Step S203: determining, according to the number of return values to be acquired, the query identifiers corresponding to the query input parameters.
Optionally, step S202 further includes: judging whether the grouping dimension matches a configured grouping dimension; if so, determining the number of return values to be acquired according to the time attribute and the enumeration values of the grouping dimension; if not, determining the number of return values to be acquired according to the time attribute.
In the embodiments of the present disclosure, the query identifier can be a keyword, a Key in a Key-Value (K-V) storage system, or another query parameter that can be used to obtain a query result from the cache or the database. Further optionally, the query identifier is a Key: one or more Keys corresponding to the query input parameters can be determined from the query input parameters, and the query result is the Value.
The time attribute and the dimension type are parsed from the query input parameters; the time attribute includes a start time, an end time and a time granularity. When the dimension type is a grouping dimension, the grouping dimension indicates a grouping field. The grouping dimension is parsed according to the grouping field, and it is judged whether the grouping dimension matches a configured grouping dimension. If it does, i.e. the grouping dimension in the query input parameters is a configured grouping dimension, the enumeration values of that grouping dimension are obtained and the number of return values to be acquired is determined from the time attribute and the enumeration values of the grouping dimension; if it does not, i.e. that grouping dimension has not been configured, the number of return values to be acquired is determined from the time attribute alone. Judging whether the grouping dimension matches a configured grouping dimension determines whether the request can be disassembled along the grouping dimension. For example, suppose the grouping dimension parsed from the query input parameters is the second-level-department field. If a grouping dimension for second-level departments has been configured, the match succeeds, the enumeration values of the configured second-level-department dimension (including department b1, department b2 and department b3) are obtained, and the number of return values to be acquired can be determined from the time attribute and the enumeration values; if no grouping dimension for second-level departments has been configured, the number of return values to be acquired is determined from the time attribute. The number of return values to be acquired is the number of values the query input parameters ask for or expect. For example, suppose the following is parsed from the query input parameters: start time = 2021-05-01 00:00:00, end time = 2021-05-02 23:59:59, time granularity = 1 day, grouping field = second-level department; this queries the daily order volume of all second-level departments on 2021-05-01 and 2021-05-02. If there are two second-level departments a and b, the query covers the daily order volume of second-level department a on 2021-05-01 and 2021-05-02 and that of second-level department b on the same two days, so the number of return values to be acquired is 4, regardless of whether any order-volume value is empty.
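A sketch of this count determination follows, assuming the configured grouping dimensions are kept in a simple registry of enumeration values; the registry contents, function names and the two-department example used for the printout are illustrative assumptions.

```python
from typing import Optional

# Illustrative registry of configured grouping dimensions and their enumeration values.
CONFIGURED_GROUP_DIMENSIONS = {
    "second_level_department": ["a", "b"],
}

def count_return_values(time_point_count: int, group_field: Optional[str]) -> int:
    enum_values = CONFIGURED_GROUP_DIMENSIONS.get(group_field)
    if enum_values:  # the grouping dimension matches a configured one
        return time_point_count * len(enum_values)
    return time_point_count  # fall back to the time attribute alone

# Two daily points (2021-05-01, 2021-05-02) grouped by two departments a and b:
print(count_return_values(2, "second_level_department"))  # 4
print(count_return_values(2, None))                       # 2
```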
In still another implementation of the embodiments of the present disclosure, step S102 further includes:
parsing a time attribute and a dimension type from the query input parameters, where the dimension type includes a restriction dimension and the restriction dimension indicates enumeration values;
determining, according to the time attribute and the enumeration values, the number of return values to be acquired for the query input parameters;
determining, according to the number of return values to be acquired, the query identifiers corresponding to the query input parameters.
When a time attribute and a restriction dimension are parsed from the query input parameters, the number of return values to be acquired can be determined from the time attribute and the enumeration values. For example, suppose the following is parsed from the query input parameters: start time = 2021-05-01 00:00:00, end time = 2021-05-02 23:59:59, time granularity = 1 day, and the restriction dimension indicates the enumeration value second-level department a. The query then covers the daily order volume of second-level department a on 2021-05-01 and 2021-05-02, so the number of return values to be acquired is 2.
In the embodiments of the present disclosure, as shown in Fig. 3, determining, according to the number of return values to be acquired, the query identifiers corresponding to the query input parameters includes:
Step S301: judging whether the number of return values to be acquired is greater than 1; if not, executing step S302; if so, executing step S303;
Step S302: determining a single query identifier corresponding to the query input parameters;
Step S303: judging whether the number of return values to be acquired that are present in the cache is not less than a preset threshold; if so, executing step S304; if not, executing step S305;
Step S304: disassembling the query input parameters and determining the query identifiers corresponding to the query input parameters;
Step S305: determining a single query identifier corresponding to the query input parameters without disassembling the query input parameters.
After the number of return values to be acquired has been determined, it is judged whether that number is greater than 1. If the number of return values to be acquired is 1, the query is for a single data point and the returned query result is a single value, so the query input parameters do not need to be disassembled and one query identifier is determined from them. For example: platform = platform a, first-level department = department b, start time = 2020-01-01 00:00:00, end time = 2020-01-01 00:59:59, indicator = cumulative indicator v_1; that is, the cumulative indicator v_1 of platform a, department b between 2020-01-01 00:00:00 and 2020-01-01 00:59:59 is queried, a single value is returned, no disassembly is needed, and one query identifier is determined.
In the embodiments of the present disclosure, disassembly means splitting the query input parameters into multiple query identifiers and obtaining the query result corresponding to each query identifier from the cache and/or the database. Because querying data from the cache takes far less time than querying the database, splitting the request into multiple query identifiers and querying the cache first and the database second improves query efficiency.
In one implementation of the embodiments of the present disclosure, the cache can be a K-V cache such as a Redis cache, and the database can be a MySQL database; other caches or databases can also be used.
If the number of return values to be acquired is greater than 1, it is judged whether the number of those return values already present in the cache is not less than a preset threshold; if so, disassembly is performed, otherwise it is not. The preset threshold is determined from the number of return values to be acquired for the query input parameters and a coefficient threshold, where the coefficient threshold is a value between 0 and 1 (for example 0.6) estimated from experience and log analysis. Let the number of return values to be acquired for the query input parameters be N, the number of those return values present in the cache be M, and the coefficient threshold be r; if M ≥ r*N, disassembly can be performed, otherwise it is not, so as to keep the data query time short. Querying data from the cache is far faster than querying the database: if querying multiple values from the database takes time ts, then after disassembly some values hit the cache (taking time ts1, far less than ts), the remaining values are fetched from the database in one query taking time ts2, and splicing the retrieved values takes time ts3. If ts2 + ts3 < ts, disassembly is performed and worthwhile; otherwise it is not. In other words, the more values that can be fetched from the cache, the shorter the query time and the better the performance of the data service interface.
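A minimal sketch of this disassembly decision is given below, assuming the caller already knows N (the total number of return values to be acquired) and M (how many of them are present in the cache); the coefficient value 0.6 and the function name are assumptions.

```python
# Illustrative disassembly decision: split only when M >= r * N.
COEFFICIENT_THRESHOLD = 0.6  # r, estimated from experience and log analysis

def should_disassemble(total_values: int, cached_values: int,
                       r: float = COEFFICIENT_THRESHOLD) -> bool:
    if total_values <= 1:
        return False  # a single-value query is never disassembled
    return cached_values >= r * total_values

print(should_disassemble(24, 20))  # True: most hourly points are already cached
print(should_disassemble(24, 5))   # False: too few cache hits to be worthwhile
```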
In one implementation of the embodiments of the present disclosure, disassembling the query input parameters includes disassembling them according to the time attribute, i.e. according to the start time, the end time and the time granularity. This applies to queries for a series of time-trend data. For example, suppose the following is parsed from the query input parameters: platform = platform a, first-level department = department b, start time = 2020-01-01 00:00:00, end time = 2020-01-01 23:59:59, time interval = 1 hour, indicator = cumulative indicator v_1. That is, for platform a and department b between 2020-01-01 00:00:00 and 2020-01-01 23:59:59, the cumulative value at hour 0-1, the cumulative value at hour 0-2, the cumulative value at hour 0-3, ..., the cumulative value at hour 0-24 are queried; 24 values are returned, and sorted in time order these values form a continuous time trend. The acquisition request can therefore be disassembled according to the time granularity into 24 query requests, and 24 query identifiers can be determined to query the cumulative values at hours 0-1, 0-2, 0-3, ..., 0-24 respectively. If x of the 24 query identifiers are in the cache, the query results of those x identifiers are fetched from the cache, and the query results of the remaining 24-x identifiers are obtained by querying the database.
In another implementation of the embodiments of the present disclosure, disassembling the query input parameters includes disassembling them according to the time attribute and the dimension type. When the dimension type is a grouping dimension, the request can be disassembled according to the time granularity and the grouping dimension. For example, suppose the following is parsed from the query input parameters: platform = platform a, first-level department = department b, start time = 2020-01-01 00:00:00, end time = 2020-01-01 00:59:59, time granularity = 1 day, grouping dimension = second-level-department field, indicator = cumulative indicator v_1. That is, the cumulative data between 2020-01-01 00:00:00 and 2020-01-01 00:59:59 of all second-level departments under platform a and first-level department b is queried. If the second-level departments include b1, b2 and b3, the acquisition request is disassembled into 3 query requests, and 3 query identifiers can be determined, namely for the cumulative data of second-level department b1, of second-level department b2 and of second-level department b3.
In the embodiments of the present disclosure, determining the query identifiers corresponding to the query input parameters includes determining the form of the corresponding query identifier. After a query identifier has been determined, its form is determined from the query input parameters so that it matches the form used by the cache and/or the database and the query result can be obtained quickly. For example, if x of the 24 Keys are in the K-V cache, the form of those x Keys is converted into the form used by the K-V cache, for example Key = time abbreviation_time granularity_platform_first-level department_second-level department. With platform = platform a, first-level department = department b, start time = 2020-01-01 00:00:00, end time = 2020-01-01 00:59:59 and indicator = cumulative indicator v_1, the Key is "2020010101_h_a_b__", where "h" indicates that the time granularity is hours and 2020010101 denotes the first hour of 2020-01-01.
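For illustration, building a Key in this "time abbreviation_time granularity_platform_first-level department_second-level department" form might look like the sketch below. The exact field layout, padding and trailing separators for unset dimensions follow the "2020010101_h_a_b__" example only approximately and are assumptions.

```python
from datetime import datetime

# Illustrative Key construction; the field layout is an assumption.
def build_key(start: datetime, granularity: str, platform: str,
              level1_dept: str, level2_dept: str = "") -> str:
    if granularity == "h":
        # e.g. 2020010101 denotes the first hour of 2020-01-01
        time_part = start.strftime("%Y%m%d") + f"{start.hour + 1:02d}"
    else:
        time_part = start.strftime("%Y%m%d")
    return "_".join([time_part, granularity, platform, level1_dept, level2_dept])

print(build_key(datetime(2020, 1, 1, 0), "h", "a", "b"))  # 2020010101_h_a_b_
```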
After the query identifiers corresponding to the query input parameters have been determined, it is first judged whether each query identifier exists in the cache. If it does, the query result for that identifier is obtained from the cache; if not, it is obtained from the database. In this way it can be determined which query identifiers can be served from the cache and which must be served from the database: the identifiers that can be served from the cache are resolved first, and the remaining identifiers are then queried from the database in one pass. This greatly reduces the data query time, improves query efficiency, shortens the interface response time, increases interface throughput, improves the interface's resistance to concurrency and improves interface performance. For example, if it is determined that x of the 24 Keys are in the K-V cache, the Values corresponding to those x Keys are obtained from the K-V cache and the Values corresponding to the remaining 24-x Keys are obtained from the database.
For example, suppose the caller requests the order data from 2021-05-01 to 2021-05-03 with a time granularity of one day. The acquisition request is split into 3 requests: obtaining the orders of 2021-05-01, obtaining the orders of 2021-05-02 and obtaining the orders of 2021-05-03. The prior art stores a single key = order_d_2021-05-01_2021-05-03; if a caller later needs the daily order data from 2021-05-01 to 2021-05-04, the cache cannot be hit without disassembly and 4 days of order data have to be fetched from the database. After the embodiments of the present disclosure disassemble the acquisition request, 3 keys are stored in the cache, namely order_d_2021-05-01_2021-05-01, order_d_2021-05-02_2021-05-02 and order_d_2021-05-03_2021-05-03. When a caller requests the order data from 2021-05-01 to 2021-05-04, the cache is first checked for order_d_2021-05-01_2021-05-01, order_d_2021-05-02_2021-05-02, order_d_2021-05-03_2021-05-03 and order_d_2021-05-04_2021-05-04; 3 of these keys are hit, so only order_d_2021-05-04_2021-05-04 has to be fetched from the database.
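A sketch of this per-day lookup is given below. The key format follows the "order_d_<date>_<date>" keys from the example above, while the dict-based cache and database merely stand in for Redis and MySQL; the function names and write-back behavior are assumptions.

```python
from datetime import date, timedelta

# Illustrative per-day key derivation and cache-first lookup with write-back.
def daily_order_keys(start: date, end: date) -> list:
    days = (end - start).days + 1
    return [f"order_d_{start + timedelta(d)}_{start + timedelta(d)}" for d in range(days)]

def fetch_daily_orders(start: date, end: date, cache: dict, db: dict) -> dict:
    results = {}
    for key in daily_order_keys(start, end):
        if key in cache:                # e.g. the 3 keys cached by the earlier call
            results[key] = cache[key]
        else:                           # e.g. only order_d_2021-05-04_2021-05-04
            results[key] = db[key]
            cache[key] = results[key]   # write back so later callers can share it
    return results
```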
As another example, if first-level department a consists of second-level departments b1, b2 and b3, then a can be disassembled into b1, b2 and b3.
Prior art: the first time the caller requests the order volume of second-level department b1 on 2021-05-12, a key = order_d_20210512_b1 is generated and K = order_d_20210512_b1, V = {"order": 100} is stored in the cache. The next time the caller requests the order data of all second-level departments under first-level department a on 2021-05-12, no disassembly is performed; the cache is first searched for K = order_d_20210512_a, and since it is not found, the database is queried.
Embodiments of the present disclosure: the first time the caller requests the order volume of second-level department b1 on 2021-05-12, the key order_d_20210512_b1 is generated and K = order_d_20210512_b1, V = {"order": 100} is stored in the cache. The next time the caller requests the order data of all second-level departments under first-level department a on 2021-05-12, the request is disassembled into queries for the order data of second-level departments b1, b2 and b3 on 2021-05-12, i.e. queries for order_d_20210512_b1, order_d_20210512_b2 and order_d_20210512_b3. The cache is queried first: order_d_20210512_b1 hits the cache, while order_d_20210512_b2 and order_d_20210512_b3 are not found in the cache and are then queried from the database. Compared with the prior art, the embodiments of the present disclosure improve the cache hit rate and reduce the amount of data that has to be queried from the database.
After the query results corresponding to the respective query identifiers have been obtained, the query results are spliced: the query results obtained from the cache and those obtained from the database are combined and converted into a unified form to obtain the target data, which is returned to the caller.
In one implementation of the embodiments of the present disclosure, if a grouping dimension and a sorting dimension are parsed from the query input parameters, where the grouping dimension indicates a grouping field and the sorting dimension indicates a sorting field and a sorting value range, splicing the query results of the respective query identifiers to obtain the target data includes:
sorting the query results of the respective query identifiers according to the sorting field, and obtaining, according to the sorting value range, the query results corresponding to the sorting value range, so as to obtain the target data. The sorting value range is the range selected after sorting according to the sorting field.
For example, suppose the following is parsed from the query input parameters: start time = 2021-05-01 00:00:00, end time = 2021-05-02 23:59:59, time granularity = 1 day, grouping field = second-level department, sorting field = order volume, and sorting value range = the top 2 results by order volume. First the daily order volumes of second-level department a on 2021-05-01 and 2021-05-02 and of second-level department b on the same two days are queried, so the number of values to be returned is 4 and the acquisition request can be disassembled into 4 requests, for which 4 Keys can be determined. If 3 Keys are in the cache and 1 Key is only in the database, the query results of those 3 Keys are obtained from the cache and the query result of the remaining Key from the database; the query results obtained from the cache and the database are then converted into a unified form, sorted by order volume, and the top 2 query results by order volume are selected to obtain the target data, which is returned.
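The splicing, sorting and range selection in this example can be sketched as follows; the record layout, the per-key Value shape and the function name are assumptions made for illustration.

```python
# Illustrative splicing: merge per-key results, sort by the sorting field,
# then keep only the sorting value range (here the top N).
def splice_and_sort(results: dict, sort_field: str, top_n: int) -> list:
    rows = [dict(key=key, **value) for key, value in results.items()]
    rows.sort(key=lambda row: row.get(sort_field, 0), reverse=True)
    return rows[:top_n]

results = {
    "order_d_2021-05-01_2021-05-01_a": {"order": 100},
    "order_d_2021-05-02_2021-05-02_a": {"order": 300},
    "order_d_2021-05-01_2021-05-01_b": {"order": 250},
    "order_d_2021-05-02_2021-05-02_b": {"order": 50},
}
print(splice_and_sort(results, "order", top_n=2))
# top 2 rows by order volume: the 2021-05-02 row for a and the 2021-05-01 row for b
```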
If the current caller's acquisition request queries the target data of a certain dimension at a certain time granularity, the data at the next moment under that granularity is likely to be queried as well, for example in businesses that compare against data from a past period. A typical scenario is comparing real-time data with last year's data: if the real-time data uses a time granularity of 1 minute, a day's order volume can consist of 1440 data points, and last year's order data for the same period is also kept at a 1-minute granularity.
In the embodiments of the present disclosure, step S102 further includes:
parsing a time attribute from the query input parameters, where the time attribute includes a start time, an end time and a time granularity;
asynchronously generating a new-time query identifier corresponding to the query input parameters according to the start time, the end time and the time granularity;
wherein the end time of the new-time query identifier is the sum of the end time of the query input parameters and the time granularity.
After the start time, the end time and the time granularity have been parsed from the query input parameters, a new-time query identifier can be generated asynchronously. The new-time query identifier is the query identifier for the next moment following the query identifiers determined from the query input parameters, i.e. a predicted query identifier. The end time of the new-time query identifier is computed as the end time of the query input parameters plus the time granularity. For example, for a query of the cumulative indicator v_1 of platform a, department b between 2020-01-01 00:00:00 and 2020-01-01 00:59:59 with a time granularity of 1 hour, the end time of the new-time query identifier is 2020-01-01 01:59:59, i.e. the new-time query identifier queries the cumulative indicator v_1 between 2020-01-01 00:00:00 and 2020-01-01 01:59:59.
In the embodiments of the present disclosure, after the new-time query identifier has been generated asynchronously, the query result corresponding to it is obtained from the database, and the new-time query identifier and its query result are stored correspondingly in the cache to update the cache. When the query result of the new-time query identifier is requested at the next moment or later, it can then be obtained directly from the cache rather than from the database, which improves the cache hit rate, improves data query efficiency and shortens the interface response time.
Optionally, to avoid putting too much pressure on the database within a short time, the new-time query identifiers can be handed off asynchronously to a messaging component such as a message queue; the database is queried at staggered times by listening to the messages, and the new-time query identifiers and the corresponding query results are stored in the cache to update it.
For example, type 1: if the cumulative indicator v_1 of platform a, department b between 2020-01-01 00:00:00 and 2020-01-01 00:59:59 is queried, it is very likely that at the next moment the cumulative indicator v_1 of platform a, department b between 2020-01-01 00:00:00 and 2020-01-01 01:59:59 will be needed, so a new-time Key can be determined. Type 2: if, for platform a and department b between 2020-01-01 00:00:00 and 2020-01-01 11:59:59, the cumulative values at hours 0-1, 0-2, 0-3, ..., 0-12 are queried, it is very likely that the cumulative value at hour 0-13 will be needed at the next moment, so a new-time Key can be determined. Type 3: if the cumulative data of all second-level departments under platform a and first-level department b between 2020-01-01 00:00:00 and 2020-01-01 00:59:59 is queried, it is very likely that at the next moment the cumulative data of the same departments between 2020-01-01 00:00:00 and 2020-01-01 01:59:59 will be needed; if there are 3 second-level departments, the number of return values to be acquired is 3, 3 Keys can be determined, and each Key can generate one new-time Key, so 3 new-time Keys are generated in total. The new-time Keys and the query results obtained for them from the database are stored correspondingly in the cache, so that the next call from the caller can hit the cache without querying the database, which improves the cache hit rate.
As another example, suppose the current date is 2021-05-12 and the caller queries the cumulative order data between time 0 and the 100th point; comparing against history also queries the cumulative order data between time 0 and the 100th point of 2020-05-12. In addition to querying the cumulative order data between time 0 and the 100th point of 2020-05-12, the cumulative order data between time 0 and the 101st point of 2020-05-12 is generated asynchronously and stored in the cache, so that if the caller calls at the next moment, the cache can be hit, which improves the cache hit rate.
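As a minimal sketch of this pre-warming path, the predicted range can be pushed onto a queue and consumed at staggered times, assuming a dict-based cache and database and an assumed key layout; none of these names come from the original text, and Python's standard queue merely stands in for the messaging component.

```python
import queue
from datetime import datetime, timedelta

# Illustrative pre-warming: the "new time" range keeps the start time and
# extends the end time by one granularity step.
prewarm_queue = queue.Queue()

def enqueue_new_time_key(start: datetime, end: datetime, granularity: timedelta) -> None:
    prewarm_queue.put((start, end + granularity, granularity))

def prewarm_once(cache: dict, db: dict) -> None:
    # In practice this would run in a background consumer listening to the queue.
    start, new_end, granularity = prewarm_queue.get()
    key = f"{start:%Y%m%d%H%M%S}_{new_end:%Y%m%d%H%M%S}"  # assumed key layout
    if key not in cache:
        cache[key] = db.get(key)  # fetch the predicted range from the database
    prewarm_queue.task_done()
```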
In the embodiments of the present disclosure, after the query result corresponding to a query identifier has been obtained from the database, the method further includes:
judging whether the number of return values in the query result is greater than 1;
if not, storing the query result and the query identifier corresponding to the query result in the cache;
if so, disassembling the query result and the query identifier corresponding to the query result according to the number of return values in the query result, obtaining sub-query identifiers and the return values corresponding to the sub-query identifiers, and correspondingly storing the sub-query identifiers and the return values corresponding to the sub-query identifiers in the cache.
When the determined query identifiers include a query identifier whose query result must be obtained from the database, it is judged, after the query result has been obtained from the database, whether the number of return values in the query result is greater than 1. If not, the query is for a single value, so no disassembly is performed and the query identifier and the corresponding query result are stored directly in the cache. If the number of return values in the query result is greater than 1, the query result and the corresponding query identifier are disassembled according to the number of return values to obtain sub-query identifiers and the return values corresponding to them, where the number of sub-query identifiers equals the number of return values in the query result; the sub-query identifiers and their corresponding return values are then stored in the cache to update it. The data stored in the cache thus has a fine granularity: a coarse-grained query identifier is disassembled into multiple fine-grained sub-query identifiers, which increases the cache hit rate, and the higher the cache hit rate, the better the interface performance. Because fine-grained data is stored in the cache, the cache can also be shared when different callers query different data ranges or dimension combinations.
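A sketch of writing a multi-value database result back to the cache as fine-grained sub-keys follows; the sub-key layout reuses the daily-key example from earlier, and the function name and the dict-shaped Value are assumptions.

```python
# Illustrative write-back: one sub-key per return value when the result
# contains more than one value, otherwise cache under the original identifier.
def store_result(cache: dict, key: str, values: dict) -> None:
    if len(values) <= 1:
        cache[key] = values  # single-value result: keep the original query identifier
        return
    for sub_key, value in values.items():
        cache[sub_key] = value  # fine-grained sub-keys that other callers can share

cache = {}
store_result(cache, "order_d_2021-05-01_2021-05-03",
             {"order_d_2021-05-01_2021-05-01": 100,
              "order_d_2021-05-02_2021-05-02": 200,
              "order_d_2021-05-03_2021-05-03": 150})
print(sorted(cache))  # three fine-grained daily keys, no coarse-grained key
```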
Fig. 4 is a schematic flowchart of a data processing method according to an embodiment of the present disclosure. The method includes: receiving an acquisition request for target data, where the acquisition request includes query input parameters; parsing the query input parameters to obtain the time attribute and the grouping dimension; asynchronously generating a new-time Key according to the time attribute and storing it in a message queue, then obtaining the new-time Key from the message queue, querying the Value corresponding to the new-time Key from the database, and storing the new-time Key and the corresponding Value in the cache; judging whether the grouping dimension matches a configured grouping dimension, and if so, determining the number of return values to be acquired according to the time attribute and the enumeration values of the grouping dimension, otherwise determining it according to the time attribute; judging whether the number of return values to be acquired is greater than 1, and if not, assembling a single Key; if so, judging whether the number of return values to be acquired that are present in the cache is not less than a preset threshold, and if so, disassembling the query input parameters according to the number of return values to be acquired and assembling multiple Keys, otherwise not disassembling and assembling a single Key; judging whether each Key is in the cache, obtaining the corresponding Value from the cache if it is and from the database if it is not, and splicing the Values obtained from the cache and the database to obtain the target data and return it; meanwhile, judging whether the number of return values in each Value obtained from the database is greater than 1, and if not, storing the Value and the corresponding Key in the cache, and if so, disassembling the Value and the Key according to the number of return values in the Value and storing the results correspondingly in the cache.
The data processing method provided by the embodiments of the present disclosure parses the query input parameters in the acquisition request to determine the query identifiers corresponding to the query input parameters, and, by judging the storage location of each query identifier, obtains the query result corresponding to each identifier by querying the cache first and the database second. When the query input parameters can be disassembled, multiple query identifiers are determined from them; when a query identifier is not in the cache, the query result obtained from the database and the corresponding query identifier are disassembled and stored in the cache, so that the data stored in the cache has a fine granularity and subsequent data queries achieve a high cache hit rate. In addition, a new-time query identifier is generated according to the time attribute, and the new-time query identifier and the corresponding query result obtained from the database are stored in the cache, improving the cache hit rate of subsequent data queries. The data processing method provided by the embodiments of the present disclosure can improve the cache hit rate, improve data query efficiency, shorten the interface response time, increase interface throughput, improve the interface's resistance to concurrency, improve interface service performance, and thereby improve the user experience.
As shown in Fig. 5, an embodiment of the present disclosure further provides a data processing apparatus 500, including:
a receiving module 501, configured to receive an acquisition request for target data, where the acquisition request includes query input parameters;
a determining module 502, configured to parse the query input parameters and determine the query identifiers corresponding to the query input parameters;
a judging module 503, configured to judge whether a query identifier exists in the cache, and if so, obtain the query result corresponding to the query identifier from the cache, and if not, obtain the query result corresponding to the query identifier from the database;
a splicing module 504, configured to splice the query results of the respective query identifiers to obtain the target data.
在本公开的实施例的一种实施方式中,确定模块502,进一步用于:从所述查询入参中解析出时间属性;根据所述时间属性,确定所述查询入参对应的待获取的返回值的个数;根据所述待获取的返回值的个 数,确定与所述查询入参对应的各个查询标识。In an implementation manner of the embodiments of the present disclosure, the determination module 502 is further configured to: parse out the time attribute from the query input parameter; The number of return values; according to the number of return values to be obtained, each query identifier corresponding to the query input parameters is determined.
In an implementation of the embodiments of the present disclosure, the determining module 502 is further configured to: parse a time attribute and a dimension type from the query input parameters, where the dimension type includes a grouping dimension;
determine, according to the time attribute and the grouping dimension, the number of return values to be acquired corresponding to the query input parameters; and
determine, according to the number of return values to be acquired, the respective query identifiers corresponding to the query input parameters.
In another implementation of the embodiments of the present disclosure, the determining module 502 is further configured to: parse a time attribute and a dimension type from the query input parameters, where the dimension type includes a restricted dimension, and the restricted dimension indicates enumeration values;
determine, according to the time attribute and the enumeration values, the number of return values to be acquired corresponding to the query input parameters; and
determine, according to the number of return values to be acquired, the respective query identifiers corresponding to the query input parameters.
In an embodiment of the present disclosure, the determining module 502 is further configured to: judge whether the grouping dimension matches a configured grouping dimension; if so, determine the number of return values to be acquired according to the time attribute and the enumeration values of the grouping dimension; and if not, determine the number of return values to be acquired according to the time attribute.
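The branch on the configured grouping dimension can be sketched as below; the configuration table and dimension names are purely illustrative assumptions:

```python
from typing import Optional

# Assumed configuration: grouping dimension name -> its enumeration values.
CONFIGURED_GROUP_DIMS = {"region": ["north", "south", "east", "west"]}

def count_with_grouping(time_buckets: int, group_dim: Optional[str]) -> int:
    """If the grouping dimension matches a configured one, multiply the number of
    time buckets by the number of enumeration values; otherwise the time
    attribute alone determines the count."""
    if group_dim in CONFIGURED_GROUP_DIMS:
        return time_buckets * len(CONFIGURED_GROUP_DIMS[group_dim])
    return time_buckets

print(count_with_grouping(7, "region"))   # 7 days x 4 regions -> 28
print(count_with_grouping(7, "channel"))  # unconfigured dimension -> 7
```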
In an embodiment of the present disclosure, the determining module 502 is further configured to: judge whether the number of return values to be acquired is greater than 1; if not, determine one query identifier corresponding to the query input parameters; and if so, judge whether the number of return values to be acquired that are already in the cache is not less than a preset threshold, and if it is, disassemble the query input parameters and determine the respective query identifiers corresponding to the query input parameters, or if it is not, determine one query identifier corresponding to the query input parameters without disassembly.
In an embodiment of the present disclosure, the determining module 502 is further configured to: parse a time attribute from the query input parameters, where the time attribute includes a start time, an end time and a time granularity; and asynchronously generate, according to the start time, the end time and the time granularity, a new-time query identifier corresponding to the query input parameters, where the end time of the new-time query identifier is the sum of the end time of the query input parameters and the time granularity.
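As a sketch only, generating the new-time query identifier with a daily granularity and a pipe-delimited Key format (both illustrative choices, not specified by the disclosure) might look like this:

```python
from datetime import date, timedelta

def new_time_key(metric: str, start: date, end: date, granularity_days: int) -> str:
    """Same start time, end time advanced by one granularity step, so the next
    period's likely query can be pre-warmed into the cache."""
    new_end = end + timedelta(days=granularity_days)
    return f"{metric}|{start.isoformat()}|{new_end.isoformat()}"

key = new_time_key("gmv", date(2022, 5, 1), date(2022, 5, 7), 1)
print(key)  # gmv|2022-05-01|2022-05-08
# In the described flow this Key is pushed to a message queue; a consumer later
# queries the database for its Value and writes the pair into the cache.
```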
In an embodiment of the present disclosure, the dimension type further includes a sorting dimension, and the sorting dimension indicates a sorting field and a sorting value range. The splicing module 504 is further configured to: sort the query results of the respective query identifiers according to the sorting field, and obtain, according to the sorting value range, the query results corresponding to the sorting value range, so as to obtain the target data.
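For illustration only, applying a sorting dimension to the spliced results, where the sorting value range is read as rank positions, could be sketched as:

```python
def apply_sort_dimension(results, sort_field, value_range):
    """Sort the spliced query results by the sorting field (descending) and keep
    the slice indicated by the sorting value range, e.g. (1, 10) for a top-10 list."""
    ordered = sorted(results, key=lambda row: row[sort_field], reverse=True)
    lo, hi = value_range
    return ordered[lo - 1:hi]

rows = [{"sku": "A", "sales": 30}, {"sku": "B", "sales": 90}, {"sku": "C", "sales": 50}]
print(apply_sort_dimension(rows, "sales", (1, 2)))  # top 2 by sales: B then C
```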
In an embodiment of the present disclosure, the apparatus further includes a storage module configured to, after the query result corresponding to the query identifier is obtained from the database: judge whether the number of return values in the query result is greater than 1; if not, store the query result and the query identifier corresponding to the query result in the cache correspondingly; and if so, disassemble, according to the number of return values in the query result, the query result and the query identifier corresponding to the query result to obtain sub-query identifiers and the return values corresponding to the sub-query identifiers, and store each sub-query identifier and its corresponding return value in the cache correspondingly.
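A minimal sketch of this write-back step, assuming a daily result keyed by date string and the same pipe-delimited Key format used above (all illustrative):

```python
def write_back(cache: dict, key: str, result: dict) -> None:
    """Store a database result in the cache; a multi-value result is disassembled
    into one sub-query identifier per return value for finer-grained later hits."""
    if len(result) <= 1:
        cache[key] = result
        return
    metric = key.split("|")[0]
    for day, value in result.items():        # e.g. {"2022-05-01": 17, ...}
        sub_key = f"{metric}|{day}|{day}"    # sub-query identifier for a single day
        cache[sub_key] = value

store = {}
write_back(store, "gmv|2022-05-01|2022-05-03",
           {"2022-05-01": 17, "2022-05-02": 23, "2022-05-03": 19})
print(sorted(store))  # one cache entry per day
```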
An embodiment of the present disclosure further provides an electronic device, including: one or more processors; and a storage apparatus configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the data processing method provided by the embodiments of the present disclosure.
An embodiment of the present disclosure further provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements the data processing method provided by the embodiments of the present disclosure.
FIG. 6 shows an exemplary system architecture 600 to which the data processing method or the data processing apparatus of the embodiments of the present disclosure can be applied.
As shown in FIG. 6, the system architecture 600 may include terminal devices 601, 602 and 603, a network 604 and a server 605. The network 604 serves as a medium for providing communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user may use the terminal devices 601, 602, 603 to interact with the server 605 via the network 604 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 601, 602, 603, such as search applications, shopping applications, web browser applications, instant messaging tools, email clients and social platform software (by way of example only).
The terminal devices 601, 602, 603 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, desktop computers and the like.
The server 605 may be a server that provides various services, for example a background management server (by way of example only) that supports the search applications used on the terminal devices 601, 602, 603. The background management server may analyze and otherwise process received data such as a target data acquisition request, and feed the processing result (the target data, by way of example only) back to the terminal device.
It should be noted that the data processing method provided by the embodiments of the present disclosure is generally executed by the server 605, and accordingly the data processing apparatus is generally disposed in the server 605.
It should be understood that the numbers of terminal devices, networks and servers in FIG. 6 are merely illustrative. There may be any number of terminal devices, networks and servers according to implementation requirements.
Referring now to FIG. 7, it shows a schematic structural diagram of a computer system 700 suitable for implementing a terminal device according to an embodiment of the present disclosure. The terminal device shown in FIG. 7 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 7, the computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. Various programs and data required for the operation of the system 700 are also stored in the RAM 703. The CPU 701, the ROM 702 and the RAM 703 are connected to one another via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse and the like; an output section 707 including a cathode-ray tube (CRT), a liquid-crystal display (LCD) and the like, as well as a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read therefrom is installed into the storage section 708 as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 709, and/or installed from the removable medium 711. When the computer program is executed by the central processing unit (CPU) 701, the above-described functions defined in the system of the present disclosure are performed.
It should be noted that the computer-readable medium shown in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program which can be used by or in combination with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the architectures, functions and operations of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules involved in the embodiments described in the present disclosure may be implemented by software or by hardware. The described modules may also be provided in a processor; for example, a processor may be described as including a receiving module, a determining module, a judging module and a splicing module. The names of these modules do not, in some cases, constitute a limitation on the modules themselves; for example, the receiving module may also be described as "a module that receives an acquisition request for target data".
As another aspect, the present disclosure further provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist independently without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: receive an acquisition request for target data, where the acquisition request includes query input parameters; parse the query input parameters to determine the respective query identifiers corresponding to the query input parameters; judge whether a query identifier exists in the cache, and if so, obtain the query result corresponding to the query identifier from the cache, or if not, obtain the query result corresponding to the query identifier from the database; and splice the query results of the respective query identifiers to obtain the target data.
According to the technical solutions of the embodiments of the present disclosure, the query input parameters in the acquisition request are parsed to determine the respective query identifiers corresponding to the query input parameters, the storage location of each query identifier is judged, and the query result corresponding to each query identifier is obtained by querying the cache first and the database second. When the query input parameters can be disassembled, multiple query identifiers are determined according to the query input parameters; when a query identifier is not in the cache, the query result obtained from the database and the corresponding query identifier are disassembled and stored in the cache, so that the data stored in the cache has a finer granularity and the cache hit rate of subsequent data queries is higher. In addition, a new-time query identifier is generated according to the time attribute, and the new-time query identifier and the corresponding query result obtained from the database are stored in the cache, further improving the cache hit rate of subsequent data queries. The data processing method provided by the embodiments of the present disclosure can improve the cache hit rate, improve the efficiency of data queries, shorten the interface response time, increase the interface throughput, improve the interface's resistance to concurrency, and improve the interface service performance, thereby improving user experience.
The specific implementations described above do not limit the protection scope of the present disclosure. It should be clear to those skilled in the art that various modifications, combinations, sub-combinations and substitutions may occur depending on design requirements and other factors. Any modifications, equivalent replacements, improvements and the like made within the spirit and principles of the present disclosure shall be included in the protection scope of the present disclosure.

Claims (12)

  1. A data processing method, characterized by comprising:
    receiving an acquisition request for target data, wherein the acquisition request includes query input parameters;
    parsing the query input parameters to determine respective query identifiers corresponding to the query input parameters;
    judging whether a query identifier exists in a cache, and if so, obtaining a query result corresponding to the query identifier from the cache, or if not, obtaining the query result corresponding to the query identifier from a database; and
    splicing the query results of the respective query identifiers to obtain the target data.
  2. The method according to claim 1, characterized in that parsing the query input parameters to determine the respective query identifiers corresponding to the query input parameters comprises:
    parsing a time attribute from the query input parameters;
    determining, according to the time attribute, the number of return values to be acquired corresponding to the query input parameters; and
    determining, according to the number of return values to be acquired, the respective query identifiers corresponding to the query input parameters.
  3. The method according to claim 1, characterized in that parsing the query input parameters to determine the respective query identifiers corresponding to the query input parameters comprises:
    parsing a time attribute and a dimension type from the query input parameters, wherein the dimension type includes a grouping dimension;
    determining, according to the time attribute and the grouping dimension, the number of return values to be acquired corresponding to the query input parameters; and
    determining, according to the number of return values to be acquired, the respective query identifiers corresponding to the query input parameters.
  4. The method according to claim 3, characterized in that the dimension type further includes a sorting dimension, and the sorting dimension indicates a sorting field and a sorting value range; and
    splicing the query results of the respective query identifiers to obtain the target data comprises:
    sorting the query results of the respective query identifiers according to the sorting field, and obtaining, according to the sorting value range, the query results corresponding to the sorting value range, so as to obtain the target data.
  5. The method according to claim 3, characterized in that determining, according to the time attribute and the grouping dimension, the number of return values to be acquired corresponding to the query input parameters comprises:
    judging whether the grouping dimension matches a configured grouping dimension; and
    if so, determining the number of return values to be acquired according to the time attribute and enumeration values of the grouping dimension; if not, determining the number of return values to be acquired according to the time attribute.
  6. The method according to claim 1, characterized in that parsing the query input parameters to determine the respective query identifiers corresponding to the query input parameters comprises:
    parsing a time attribute and a dimension type from the query input parameters, wherein the dimension type includes a restricted dimension, and the restricted dimension indicates enumeration values;
    determining, according to the time attribute and the enumeration values, the number of return values to be acquired corresponding to the query input parameters; and
    determining, according to the number of return values to be acquired, the respective query identifiers corresponding to the query input parameters.
  7. The method according to any one of claims 2 to 6, characterized in that determining, according to the number of return values to be acquired, the respective query identifiers corresponding to the query input parameters comprises:
    judging whether the number of return values to be acquired is greater than 1;
    if not, determining one query identifier corresponding to the query input parameters; and
    if so, judging whether the number of return values to be acquired that are in the cache is not less than a preset threshold, and if it is, disassembling the query input parameters and determining the respective query identifiers corresponding to the query input parameters, or if it is not, determining one query identifier corresponding to the query input parameters without disassembly.
  8. The method according to claim 1, characterized in that parsing the query input parameters to determine the respective query identifiers corresponding to the query input parameters further comprises:
    parsing a time attribute from the query input parameters, wherein the time attribute includes a start time, an end time and a time granularity; and
    asynchronously generating, according to the start time, the end time and the time granularity, a new-time query identifier corresponding to the query input parameters,
    wherein the end time of the new-time query identifier is the sum of the end time of the query input parameters and the time granularity.
  9. The method according to claim 1, characterized by further comprising, after obtaining the query result corresponding to the query identifier from the database:
    judging whether the number of return values in the query result is greater than 1;
    if not, storing the query result and the query identifier corresponding to the query result in the cache correspondingly; and
    if so, disassembling, according to the number of return values in the query result, the query result and the query identifier corresponding to the query result to obtain sub-query identifiers and the return values corresponding to the sub-query identifiers, and storing each sub-query identifier and its corresponding return value in the cache correspondingly.
  10. A data processing apparatus, characterized by comprising:
    a receiving module, configured to receive an acquisition request for target data, wherein the acquisition request includes query input parameters;
    a determining module, configured to parse the query input parameters and determine respective query identifiers corresponding to the query input parameters;
    a judging module, configured to judge whether a query identifier exists in a cache, and if so, obtain a query result corresponding to the query identifier from the cache, or if not, obtain the query result corresponding to the query identifier from a database; and
    a splicing module, configured to splice the query results of the respective query identifiers to obtain the target data.
  11. An electronic device, characterized by comprising:
    one or more processors; and
    a storage apparatus configured to store one or more programs,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 9.
  12. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 9.
PCT/CN2022/093272 2021-08-30 2022-05-17 Data processing method and apparatus WO2023029592A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111005262.3 2021-08-30
CN202111005262.3A CN113641713A (en) 2021-08-30 2021-08-30 Data processing method and device

Publications (1)

Publication Number Publication Date
WO2023029592A1 true WO2023029592A1 (en) 2023-03-09

Family

ID=78424351

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/093272 WO2023029592A1 (en) 2021-08-30 2022-05-17 Data processing method and apparatus

Country Status (2)

Country Link
CN (1) CN113641713A (en)
WO (1) WO2023029592A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113641713A (en) * 2021-08-30 2021-11-12 北京沃东天骏信息技术有限公司 Data processing method and device
CN115277128B (en) * 2022-07-13 2024-02-23 上海砾阳软件有限公司 Illegal request processing method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512222A (en) * 2015-11-30 2016-04-20 中国建设银行股份有限公司 Data query method and system, and data reading method and system
CN110019350A (en) * 2017-07-28 2019-07-16 北京京东尚科信息技术有限公司 Data query method and apparatus based on configuration information
US20210089530A1 (en) * 2019-09-20 2021-03-25 Thoughtspot, Inc. Machine Language Query Management for Low-Latency Database Analysis System
CN113254480A (en) * 2020-02-13 2021-08-13 中国移动通信集团广东有限公司 Data query method and device
CN113641713A (en) * 2021-08-30 2021-11-12 北京沃东天骏信息技术有限公司 Data processing method and device

Also Published As

Publication number Publication date
CN113641713A (en) 2021-11-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22862734

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE