CN115114319A - Method, device and equipment for querying data based on data wide table - Google Patents


Info

Publication number
CN115114319A
Authority
CN
China
Prior art keywords
data
field
tables
wide table
business
Legal status
Pending
Application number
CN202210679262.XA
Other languages
Chinese (zh)
Inventor
刘鑫
Current Assignee
Beijing Shareit Information Technology Co Ltd
Original Assignee
Beijing Shareit Information Technology Co Ltd
Application filed by Beijing Shareit Information Technology Co Ltd
Priority to CN202210679262.XA
Publication of CN115114319A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a method, a device, equipment and a storage medium for querying data based on a data wide table, which can be applied in the technical field of data processing. The method comprises the following steps: receiving a query instruction from a user; querying the data wide table according to a field in the query instruction to obtain query data; and outputting the query data. Because the data wide table is obtained by aggregating data from a plurality of single business data tables, querying the wide table allows the required business data to be retrieved in real time and improves query efficiency.

Description

Method, device and equipment for querying data based on data wide table
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for querying data based on a data wide table.
Background
With the rapid development of the internet, e-commerce built on internet technology has entered a period of rapid business growth. Implementing these services generates a plurality of tables that record and store business data.
Meanwhile, as business volume grows, more and more business tables must be stored, making their storage design increasingly complex. With such complex storage, a user querying data in the back end may have to search multiple databases and multiple data tables one by one, so the end-to-end response time of a data query becomes too long.
The prior art therefore lacks a suitable data query method.
Disclosure of Invention
The present disclosure provides a method, an apparatus, a device and a storage medium for querying data based on a data wide table, so that the required business data can be queried in real time and query efficiency is improved.
In a first aspect, the present disclosure provides a method for querying data based on a data wide table, including: receiving a query instruction from a user; querying a data wide table according to fields in the query instruction to obtain query data, where the data wide table is obtained by aggregating data in a plurality of single business data tables; and outputting the query data.
In some possible embodiments, querying the data wide table according to a field in the query instruction to obtain query data includes: looking up, in the data wide table, the index corresponding to the field in the query instruction; and following the index to the query data.
In some possible embodiments, before the query instruction of the user is received, the method further includes: acquiring a plurality of single business data tables; and aggregating the data in these tables to obtain the data wide table.
In some possible embodiments, obtaining the data wide table by aggregating data in a plurality of single business data tables includes: determining a first field according to the business information in the single business data tables, where the first field is a field displayed in the data wide table; and aggregating the data in the single business data tables according to the first field to obtain the data wide table.
In some possible embodiments, aggregating data in the single business data tables according to the first field to obtain the data wide table includes: parsing a second field in each of the single business data tables, where the second field indicates the association between data in the different tables; and, within the data range corresponding to the first field, using the second field to aggregate the associated data in the single business data tables into the data wide table.
In some possible embodiments, using the second field to aggregate associated data in the single business data tables within the data range corresponding to the first field includes: processing data in the single business data tables with multiple threads, within the data range corresponding to the first field, to obtain intermediate data from those tables; obtaining the associated data in the intermediate data by using the second field; and aggregating the associated data to obtain the data wide table.
In some possible embodiments, data changes in the single business data tables are detected; when the data in any single table changes, the data associated with that table in the data wide table is updated.
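The incremental update described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: the wide table is an in-memory dict keyed by the serial order number, and the table names, column names and `COLUMN_MAP` are all hypothetical.

```python
from typing import Any, Dict

# Hypothetical wide table keyed by the serial order number (the second field).
wide_table: Dict[str, Dict[str, Any]] = {
    "T001": {"serial_no": "T001", "amount": 100, "pay_status": "PAID", "channel_order_no": "C9"},
}

# Hypothetical mapping: (source single table, source column) -> wide-table column.
COLUMN_MAP = {
    ("trade", "amount"): "amount",
    ("payment", "status"): "pay_status",
    ("channel", "channel_order_no"): "channel_order_no",
}

def apply_change(table: str, row: Dict[str, Any]) -> None:
    """Propagate one detected change in a single table to the associated wide-table row."""
    key = row["serial_no"]
    target = wide_table.setdefault(key, {"serial_no": key})
    for (src_table, src_col), wide_col in COLUMN_MAP.items():
        # Only columns of the changed table that the wide table keeps are copied.
        if src_table == table and src_col in row:
            target[wide_col] = row[src_col]

# A payment-status change in the payment table updates only that column.
apply_change("payment", {"serial_no": "T001", "status": "REFUNDED"})
print(wide_table["T001"]["pay_status"])  # REFUNDED
```

Keying the wide table on the association field means a change in any one table touches exactly one wide-table row, without rescanning the other tables.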
In a second aspect, the present disclosure provides an apparatus for querying data based on a data wide table. The apparatus may be a chip or a system on chip in a terminal device, or a functional module in the terminal device that implements the first aspect or any of its possible implementations. The data query apparatus can implement the functions executed by the terminal in the first aspect or any of its possible implementations; these functions may be implemented by hardware executing corresponding software, and the hardware or software includes one or more modules corresponding to the functions. The data query apparatus comprises: an acquisition module, configured to receive a query instruction from a user; a processing module, configured to query the data wide table according to the fields in the query instruction to obtain query data, where the data wide table is obtained by aggregating data in a plurality of single business data tables; and an output module, configured to output the query data.
In some possible embodiments, the processing module is further configured to: look up, in the data wide table, the index corresponding to each field in the query instruction; and follow the index to the query data.
In some possible embodiments, the acquisition module is further configured to: acquire a plurality of single business data tables before the query instruction of the user is received; and aggregate the data in these tables to obtain the data wide table.
In some possible embodiments, the acquisition module is further configured to: determine a first field according to the business information in the single business data tables, where the first field is a field displayed in the data wide table; and aggregate the data in these tables according to the first field to obtain the data wide table.
In some possible embodiments, the acquisition module is further configured to: parse a second field in each of the single business data tables, where the second field indicates the association between data in the different tables; and, within the data range corresponding to the first field, use the second field to aggregate the associated data in these tables into the data wide table.
In some possible embodiments, the acquisition module is further configured to: process data in the single business data tables with multiple threads, within the data range corresponding to the first field, to obtain intermediate data from those tables, where the intermediate data is the data corresponding to the fields of the data wide table; obtain the associated data in the intermediate data by using the second field; and aggregate the associated data to obtain the data wide table.
In some possible embodiments, the acquisition module is further configured to: after the associated data in the single business data tables has been aggregated into the data wide table by using the second field, determine the index of the associated data in the data wide table according to the data volume of the associated data.
In some possible embodiments, the acquisition module is further configured to: detect data changes in the single business data tables; and, when the data in any single table changes, update the data associated with that table in the data wide table.
In a third aspect, the present disclosure provides a terminal, including: a processor; and a memory for storing processor-executable instructions; where the processor is configured to execute the executable instructions to implement the method described in the first aspect or any of its possible implementations.
In a fourth aspect, the present disclosure provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, are capable of implementing the method as described in the first aspect and any one of its possible embodiments.
Compared with the prior art, the technical scheme provided by the disclosure has the following beneficial effects:
In the present disclosure, a query instruction of a user is received, the data wide table is queried according to a field in the query instruction, and the resulting query data is output. Because the data wide table is obtained by aggregating data in a plurality of single business data tables, querying the wide table allows the required business data to be retrieved in real time and improves query efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the scope of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow chart illustrating an implementation of a method for querying data in an embodiment of the present disclosure;
FIG. 2 is a flow chart illustrating another implementation of the method for querying data in the embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a data query device in an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In order to explain the technical solution of the present disclosure, the following description is given by way of specific examples.
With the rapid development of the internet, e-commerce built on internet technology has entered a period of rapid business growth. Common e-commerce services include consumer online shopping, online transactions between merchants, and online electronic payment. Implementing these services generates a plurality of tables that record and store business data. Taking consumer online shopping as an example, the service generates a transaction table recording transaction information, a payment table recording payment information, a channel table recording the service channel, and so on. Together, the data recorded in these tables (referred to below as order tables) reflects the complete information of an online shopping transaction. When the data volume of the order tables is small, all of them are usually stored in one relational database, and the complete business information is obtained by running a relational (join) query directly against the database in Structured Query Language (SQL).
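The single-database case can be illustrated with Python's built-in `sqlite3` module. The table and column names (`trade`, `payment`, `channel`, `serial_no`) are hypothetical; the sketch only shows that, while all order tables sit in one database, a single SQL join on the shared order number returns the complete picture of each order.

```python
import sqlite3

# Toy schema: three order tables in ONE relational database, linked by serial_no.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trade   (serial_no TEXT PRIMARY KEY, amount INTEGER);
CREATE TABLE payment (serial_no TEXT PRIMARY KEY, status TEXT);
CREATE TABLE channel (serial_no TEXT PRIMARY KEY, channel_order_no TEXT);
INSERT INTO trade   VALUES ('T001', 100), ('T002', 250);
INSERT INTO payment VALUES ('T001', 'PAID'), ('T002', 'PENDING');
INSERT INTO channel VALUES ('T001', 'C9'), ('T002', 'C10');
""")

# One relational (join) query assembles all information about each order.
rows = conn.execute("""
SELECT t.serial_no, t.amount, p.status, c.channel_order_no
FROM trade t
JOIN payment p ON p.serial_no = t.serial_no
JOIN channel c ON c.serial_no = t.serial_no
ORDER BY t.serial_no
""").fetchall()
print(rows)  # [('T001', 100, 'PAID', 'C9'), ('T002', 250, 'PENDING', 'C10')]
```

The join only works because every table is reachable through the same connection, which is exactly what sharding across databases and instances takes away.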
As business volume grows, more and more business tables must be stored, and their storage design becomes increasingly complex. For example, to avoid the long query times caused by storing massive data in a single database, business tables are often sharded across multiple databases and multiple tables. However, with such a complex storage design (e.g., database and table sharding), relational queries can no longer be performed directly in SQL. In addition, business queries are highly concurrent, and the relational database management system MySQL cannot sustain such highly concurrent query requests, so query responses take too long and requests time out.
Therefore, no suitable data query method currently exists.
To solve the above problems, an embodiment of the present disclosure provides a method for querying data based on a data wide table, so that the required business data can be queried in real time through the data wide table and query efficiency is improved.
FIG. 1 is a schematic flowchart of an implementation of a data query method in an embodiment of the present disclosure. Referring to FIG. 1, the data query method may include S101 to S103.
S101, receiving a query instruction of a user.
It should be understood that, when executing S101, the terminal device may receive an instruction to query, in real time, information stored in the data wide table.
The query instruction may be an instruction that the terminal device obtains when the user operates it. It usually contains fields, such as words and symbols, that indicate the information to be queried and that the terminal device can recognize. For example, the query instruction may include the field "2022.05", which the terminal device recognizes as a request for order information from May 2022.
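A minimal sketch of how a terminal might recognize the fields in a query instruction. The disclosure does not specify the instruction syntax, so the rules here are hypothetical: a `YYYY.MM` token is read as an order month and a `status=VALUE` token as a payment status; the filter names are placeholders.

```python
import re
from typing import Dict

# Hypothetical rule: a token of the form YYYY.MM denotes an order month.
MONTH_TOKEN = re.compile(r"^(\d{4})\.(0[1-9]|1[0-2])$")

def parse_instruction(instruction: str) -> Dict[str, str]:
    """Split a query instruction into fields and map recognized ones to filters."""
    filters: Dict[str, str] = {}
    for token in instruction.split():
        match = MONTH_TOKEN.match(token)
        if match:
            # "2022.05" -> orders from May 2022
            filters["order_month"] = f"{match.group(1)}-{match.group(2)}"
        elif token.startswith("status="):
            filters["pay_status"] = token.split("=", 1)[1]
        # Unrecognized tokens are ignored in this sketch.
    return filters

print(parse_instruction("2022.05 status=PAID"))
# {'order_month': '2022-05', 'pay_status': 'PAID'}
```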
S102, querying the data wide table according to the field in the query instruction to obtain query data.
It should be understood that the terminal device obtains the user's query instruction after executing S101. When executing S102, the terminal device may identify the fields in the query instruction and query the data wide table as those fields indicate to obtain query data.
The data wide table is obtained by aggregating data in a plurality of single business data tables. It may be a database table containing a plurality of fields, each corresponding to at least one data value. The data values may come from multiple single business data tables, which may be stored in the same database or in different databases. By aggregating these data values, the data wide table associates the metrics, dimensions and attributes related to a business entity.
In some possible embodiments, S102 may include: looking up, in the data wide table, the index corresponding to each field in the query instruction; and following the index to the query data.
It should be understood that, when executing S102, the terminal device may identify the fields in the query instruction, look up the corresponding index as the fields indicate, use the index to find the data values stored at the specified positions in the data wide table, and thereby obtain the query data.
The index provides pointers to data values stored at specified positions in the data wide table. The index is used to find the specific value corresponding to a field in the instruction, and the pointer then leads to the storage position holding that value, so the data values in the data wide table can be accessed quickly.
In other words, by looking up the corresponding index in the data wide table and following it to the queried data, the terminal device replaces the default full-table scan with a single index lookup that locates the storage position of a specific value. This greatly reduces the amount of data scanned and markedly increases query speed.
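The contrast between a full-table scan and an index lookup can be sketched as follows. This is only an analogy: the wide table is a Python list whose positions stand in for storage locations, and the index is a hash map from field value to positions, whereas real database engines commonly use B-trees or similar structures. The column names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Toy wide table; list positions play the role of storage locations.
wide_rows: List[Dict[str, Any]] = [
    {"serial_no": "T001", "order_month": "2022-05", "amount": 100},
    {"serial_no": "T002", "order_month": "2022-04", "amount": 250},
    {"serial_no": "T003", "order_month": "2022-05", "amount": 75},
]

def full_scan(field: str, value: Any) -> List[Dict[str, Any]]:
    """Default approach: examine every row."""
    return [row for row in wide_rows if row[field] == value]

def build_index(field: str) -> Dict[Any, List[int]]:
    """Map each value of `field` to the positions ("pointers") of rows holding it."""
    index: Dict[Any, List[int]] = defaultdict(list)
    for pos, row in enumerate(wide_rows):
        index[row[field]].append(pos)
    return index

month_index = build_index("order_month")

def indexed_lookup(value: Any) -> List[Dict[str, Any]]:
    """One lookup in the index, then follow the pointers straight to the rows."""
    return [wide_rows[pos] for pos in month_index.get(value, [])]

assert indexed_lookup("2022-05") == full_scan("order_month", "2022-05")
print([row["serial_no"] for row in indexed_lookup("2022-05")])  # ['T001', 'T003']
```

Both functions return the same rows; the difference is that the indexed path touches only the matching positions instead of every row.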
S103, outputting the query data.
It should be understood that the terminal device obtains the query data after executing S102. When executing S103, the terminal device may output the query data obtained in S102.
The manner of outputting the query data may be set according to actual requirements and is not specifically limited by the present disclosure. For example, when the user needs to see the query data on the page where the query instruction was entered, the query data may be displayed on that page; or, when the user needs to save the query data, it may be output to a document for storage.
In the above embodiment, by executing S101 to S103, the terminal device receives the user's query instruction, queries the data wide table according to a field in the query instruction, and obtains and outputs the query data. Because the data wide table is obtained by aggregating data in a plurality of single business data tables, querying the wide table allows the required business data to be retrieved in real time and improves query efficiency.
In some possible implementations, FIG. 2 is a schematic flowchart of another implementation of the data query method in the embodiment of the present disclosure. Referring to FIG. 2, the method may further include S201 before S101.
S201, acquiring a plurality of single business data tables and aggregating the data in them to obtain a data wide table.
It should be understood that the terminal device may perform S201 before S101. The terminal device may acquire the single data tables generated by a plurality of services, aggregate the data in these tables, and obtain the data wide table from the aggregated data.
The manner of acquiring the single data tables generated by the services may be set according to actual requirements and is not specifically limited by the present disclosure. For example, the terminal device may acquire the tables by accessing the corresponding databases, or by listening to the data flow on the service side. When aggregating data in the single business data tables, the possibility that these tables are sharded across databases and tables must be fully considered, and data may need to be acquired across databases.
In some possible embodiments, S201 may further include: determining a first field according to the business information in the single business data tables, where the first field is a field displayed in the data wide table; and aggregating the data in these tables according to the first field to obtain the data wide table.
It should be understood that, after acquiring the single data tables generated by the services, the terminal device may obtain the business information they contain, select from this business information to determine the first field, and aggregate the data in the single business data tables according to the first field to obtain the data wide table.
The first field is a field displayed in the data wide table. Specifically, after obtaining the data stored in the data wide table according to the query instruction, the terminal device outputs that data, and to present it in an orderly way, the data may be grouped by the first field. Therefore, when aggregating the single data tables generated by the services, the terminal device may determine the first field in advance and aggregate only the data related to it into the data wide table. Because only data related to the first field is collected, useless information is not collected and storage space is greatly saved.
For example, the core service of company A is a payment service, and each business domain in the payment chain contains a plurality of order tables, such as a transaction table, a payment table and a channel table. Because the order tables hold a large volume of data, company A stores the order tables of each business domain across four databases, each on a different database instance. Further, to reduce the number of records in a single table, and thus shorten single-table query time and increase database throughput, each database is split into 64 sub-tables, so that data is distributed evenly across many tables without affecting queries. Company A provides merchants with a back-end order query page on which a merchant queries data in the data wide table, and the page displays the data grouped by the first fields. The first fields presented to the merchant (transaction amount, payment status, third-party channel order number, etc.) come from several order tables: the transaction amount from the transaction table, the payment status from the payment table, and the third-party channel order number from the channel table.
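To see why a single SQL join stops working in this layout, consider how one order might be placed among 4 databases × 64 sub-tables = 256 physical tables. The disclosure does not state company A's sharding rule; the sketch below assumes, hypothetically, a CRC32 hash of the order number, and the database and table names it produces are illustrative.

```python
import zlib

DB_COUNT = 4        # four databases per business domain (from the example)
TABLES_PER_DB = 64  # 64 sub-tables per database (from the example)

def locate_shard(order_no: str, domain: str = "trade") -> str:
    """Return a hypothetical "database.table" location for one order.

    Assumption: a stable CRC32 hash of the order number selects one of the
    DB_COUNT * TABLES_PER_DB = 256 physical tables.
    """
    slot = zlib.crc32(order_no.encode("utf-8")) % (DB_COUNT * TABLES_PER_DB)
    db_no, table_no = divmod(slot, TABLES_PER_DB)
    return f"{domain}_db_{db_no}.{domain}_order_{table_no:02d}"

# The same order number always maps to the same physical table, but in the
# example each business domain sits on its own database instances, so the
# trade and payment records of one order cannot be joined over one connection.
print(locate_shard("T001"), locate_shard("T001", domain="payment"))
```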
Table 1 lists the data of the single business data tables in this embodiment of the present disclosure.
TABLE 1
[Table 1 is reproduced as an image in the original publication and is not available in this text.]
As the table shows, the first fields (presentation fields) of the data wide table can be aggregated from the information in the three single data tables in Table 1. In practice, the data wide table can be designed according to business requirements. For example, suppose there are three tables A, B and C, each with 10 fields; the data wide table could then be designed with 30 first fields. However, if the terminal device uses the data wide table to serve a specific business requirement that needs only 3 fields from table A, 4 from table B and 3 from table C, the data wide table need only have 10 first fields, which saves space.
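The field-selection arithmetic in this example (30 candidate fields versus 3 + 4 + 3 = 10 required fields) can be written out as a small sketch. The table names A, B and C come from the example; the field names and the required selection are placeholders.

```python
from typing import Dict, List

# Hypothetical source tables A, B, C with 10 fields each.
SOURCE_FIELDS: Dict[str, List[str]] = {
    "A": [f"a{i}" for i in range(10)],
    "B": [f"b{i}" for i in range(10)],
    "C": [f"c{i}" for i in range(10)],
}

# Fields one business requirement actually displays (illustrative choice).
REQUIRED: Dict[str, List[str]] = {
    "A": ["a0", "a1", "a2"],
    "B": ["b0", "b1", "b2", "b3"],
    "C": ["c0", "c1", "c2"],
}

def design_wide_columns(required: Dict[str, List[str]]) -> List[str]:
    """Keep only the required first (presentation) fields, checking they exist upstream."""
    columns: List[str] = []
    for table, fields in required.items():
        missing = set(fields) - set(SOURCE_FIELDS[table])
        if missing:
            raise KeyError(f"table {table} has no fields {sorted(missing)}")
        columns.extend(f"{table}.{field}" for field in fields)
    return columns

all_columns = sum(len(fields) for fields in SOURCE_FIELDS.values())
chosen = design_wide_columns(REQUIRED)
print(all_columns, len(chosen))  # 30 10
```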
Table 1 also gives other information about the transaction table, the payment table and the channel table, such as each table's database name, table name, the database instance of its database, its unique index and its serial order number. The transaction table, the payment table and the channel table come from different databases with different database instances (instance A, instance B and instance C). A database instance is a program that serves as the channel for accessing the database; every operation the terminal device performs on data in the database, including data definition, data query, data maintenance and database operation control, is carried out through the database instance. Therefore, in the embodiment of the present disclosure, aggregating data from a plurality of single business data tables may involve data from different database instances.
In some possible embodiments, S201 may further include: parsing a second field in each of the single business data tables; and, within the data range corresponding to the first field, using the second field to aggregate the associated data in these tables into the data wide table. Through the second field the terminal device gathers associated data and stores it together, yielding the data wide table; and because it collects only data related to the first field, storage space is saved.
It should be understood that, when executing S201, the terminal device may parse out the second field in the single business data tables, use the second field to find the single business data tables associated with it, and finally obtain the data corresponding to the first field from all of the found tables and aggregate it.
The second field indicates the association between data in the single business data tables. Specifically, the second field may be a field shared by several single business data tables; parsing it reveals the association between the services and, in turn, between the data in their tables.
For example, as shown in Table 1, assume that an existing business process generates three single business data tables according to its business logic. The tables receive orders in the sequence transaction table, payment table, channel table, and their association ratio is transaction table : payment table : channel table = 1:1:1, i.e., one transaction record corresponds to one payment record and one channel record. Because, by design, the unique order number generated by the source table (the transaction table) is passed to the downstream services and stored in the downstream service tables, all three tables contain the [serial order number] (which serves as the second field). Therefore, when using the data wide table, the terminal device can obtain all the single business data tables and all the data of one transaction by reading the [serial order number] in each table; all data of the same transaction is, by definition, associated data. In addition, if the association between two single business data tables is 1:n, the table on the n side can be modeled as a Nested type in Elasticsearch, and a Painless script can be written to perform bulk upsert operations on the nested field, so that all data in the single business data tables can ultimately be queried together.
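The association step can be sketched as an in-memory join on the serial order number. The field names and the extra 1:n `refunds` table are hypothetical; the nested list only mirrors the idea of an Elasticsearch nested field and does not call Elasticsearch.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

def build_wide_rows(trades: List[Row], payments: List[Row],
                    channels: List[Row], refunds: List[Row]) -> Dict[str, Row]:
    """Associate records by serial_no (the second field).

    trade : payment : channel is 1:1:1, so their columns are flattened into one
    row; refunds is a hypothetical 1:n table kept as a nested list per order.
    """
    wide: Dict[str, Row] = {
        t["serial_no"]: {"serial_no": t["serial_no"], "amount": t["amount"], "refunds": []}
        for t in trades
    }
    for p in payments:
        if p["serial_no"] in wide:
            wide[p["serial_no"]]["pay_status"] = p["status"]
    for c in channels:
        if c["serial_no"] in wide:
            wide[c["serial_no"]]["channel_order_no"] = c["channel_order_no"]
    for r in refunds:
        if r["serial_no"] in wide:
            wide[r["serial_no"]]["refunds"].append(
                {"refund_no": r["refund_no"], "amount": r["amount"]})
    return wide

wide = build_wide_rows(
    trades=[{"serial_no": "T001", "amount": 100}],
    payments=[{"serial_no": "T001", "status": "PAID"}],
    channels=[{"serial_no": "T001", "channel_order_no": "C9"}],
    refunds=[{"serial_no": "T001", "refund_no": "R1", "amount": 30},
             {"serial_no": "T001", "refund_no": "R2", "amount": 20}],
)
print(len(wide["T001"]["refunds"]))  # 2
```

Flattening the 1:1:1 side and nesting the 1:n side keeps one wide row per order while still letting every child record be queried with it.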
In some possible embodiments, S201 may further include: processing data in the single business data tables with multiple threads, within the data range corresponding to the first field, to obtain intermediate data from those tables; obtaining the associated data in the intermediate data by using the second field; and aggregating the associated data to obtain the data wide table.
It should be understood that, to save storage space, the terminal device may process the data in the plurality of business data tables according to the data range corresponding to the first field. Using multithreading, the terminal device may obtain intermediate data (i.e., data read directly from the business data tables) from the plurality of business data tables, then use the second field to obtain the associated data in the intermediate data (i.e., data belonging to the same business transaction), and finally summarize the associated data to obtain the data wide table. Because the first field is a presentation field of the data wide table, summarizing only the associated data within its range avoids storing redundant data. Meanwhile, because the terminal device uses multithreading, processing speed can be greatly improved.
Multithreading refers to a Central Processing Unit (CPU) executing multiple threads concurrently. Specifically, the terminal device needs to process billions or even trillions of records every day, and runs multiple threads simultaneously to summarize the data into the data wide table. In addition, because different business domains use different database instances and the databases and tables are sharded, middleware adapted to the different databases needs to be selected to complete the whole processing flow. For example, parsing middleware first parses the data, message middleware then transmits it, and finally the data is synchronized into storage.
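The three steps of S201 described above (multithreaded extraction by range, association by the second field, summarization) can be sketched as follows. Assumptions: the first field's data range is modeled as a half-open date range on a hypothetical `created_date` column, the second field as a hypothetical `serial_order_no` column, and in-memory lists stand in for the business data tables. In CPython, threads mainly help when each worker waits on I/O (for example, database reads); CPU-bound work would need processes instead.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date
from typing import Any

Row = dict[str, Any]


def extract_in_range(name: str, rows: list[Row], start: date, end: date,
                     range_field: str) -> tuple[str, list[Row]]:
    """Keep only rows whose range field falls in [start, end)."""
    return name, [r for r in rows if start <= r[range_field] < end]


def extract_and_associate(tables: dict[str, list[Row]], start: date, end: date,
                          range_field: str = "created_date",
                          key: str = "serial_order_no",
                          max_workers: int = 4) -> dict[str, dict[str, Row]]:
    # Step 1: extract intermediate data from each table on its own worker thread.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(extract_in_range, name, rows, start, end, range_field)
                   for name, rows in tables.items()]
        intermediate = [f.result() for f in futures]
    # Steps 2-3: group the intermediate rows by the second field (order number),
    # yielding one summarized entry per business transaction.
    grouped: dict[str, dict[str, Row]] = {}
    for name, rows in intermediate:
        for row in rows:
            grouped.setdefault(row[key], {})[name] = row
    return grouped
```

Rows outside the range never reach the grouping step, which is how restricting to the first field's range avoids storing redundant data.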
For example, to keep data synchronization latency low, the terminal device may use the middleware Canal to listen to the binlog: by configuring the target data tables and the data routing rules that the Canal instance listens to, the binlog is parsed into JSON-format data and written into the configured Kafka topic partitions. The data routing rule may be as follows: when the canal.mq.partitionHash parameter configured for the multiple data tables specifies trade_order_no for all of them, the data of these tables is hashed by trade_order_no during routing. That is, records with the same trade_order_no are routed to the same partition of the topic and processed by the same consumer thread.
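The routing property described above (equal trade_order_no, same partition) can be illustrated with a small sketch. It is not Canal's actual hash function: `zlib.crc32` is an assumption chosen only because it is deterministic across processes, and the event dictionaries are hypothetical.

```python
import zlib
from typing import Any


def route_partition(trade_order_no: str, num_partitions: int) -> int:
    """Map an order number to a partition with a deterministic hash.

    crc32 is used because Python's built-in hash() of str is randomized per
    process; Canal's own hash function differs, only the property matters:
    equal keys always land in the same partition.
    """
    return zlib.crc32(trade_order_no.encode("utf-8")) % num_partitions


def route_events(events: list[dict[str, Any]],
                 num_partitions: int) -> dict[int, list[dict[str, Any]]]:
    """Append each change event to its partition, preserving arrival order."""
    partitions: dict[int, list[dict[str, Any]]] = {p: [] for p in range(num_partitions)}
    for event in events:
        partitions[route_partition(event["trade_order_no"], num_partitions)].append(event)
    return partitions
```

Because all changes to one order, from any table, share a partition and Kafka preserves order within a partition, a single consumer thread sees them in sequence and can merge them into the wide-table document without cross-thread coordination.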
Further, Canal is a binlog parsing middleware that parses binary binlog files into JSON-format data. The databases and data tables a business cares about can be configured with regular expressions, and Canal can send the parsed binlog data to the corresponding message queue according to the routing rules set in its configuration file. Canal can therefore handle data from sharded databases and tables. Alternatively, Maxwell or Flink CDC may be selected as the parsing middleware according to business requirements.
Further, as message middleware, Kafka offers high availability, high throughput, low latency, and high concurrency. The number of partitions of a Kafka topic can be set according to the data volume per unit time of the current business and the business's tolerance for data loss. The number of Kafka consumers is generally kept equal to the number of topic partitions to maintain highly concurrent, high-performance consumption. Alternatively, RocketMQ or RabbitMQ may be selected as the message middleware according to business requirements.
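A common rule of thumb for sizing partitions, and the reason consumer count tracks partition count, can be sketched as follows. The throughput figures are inputs you would measure for your own cluster; the sizing formula is a widely used heuristic rather than something stated in this disclosure, and the round-robin assignment is a simplified stand-in for Kafka's real partition assignors.

```python
import math


def suggest_partitions(peak_msgs_per_sec: float, producer_rate_per_partition: float,
                       consumer_rate_per_partition: float, headroom: float = 1.0) -> int:
    """Rule-of-thumb partition count: enough partitions that neither producing
    nor consuming a single partition becomes the bottleneck at peak load."""
    needed = max(peak_msgs_per_sec / producer_rate_per_partition,
                 peak_msgs_per_sec / consumer_rate_per_partition) * headroom
    return max(1, math.ceil(needed))


def assign_round_robin(num_partitions: int, num_consumers: int) -> dict[int, list[int]]:
    """Each partition goes to exactly one consumer in the group, so consumers
    beyond the partition count receive nothing and sit idle."""
    assignment: dict[int, list[int]] = {c: [] for c in range(num_consumers)}
    for p in range(num_partitions):
        assignment[p % num_consumers].append(p)
    return assignment
```

For example, a peak of 20,000 messages/s with 5,000 msg/s per partition on the producer side and 2,000 msg/s per partition on the consumer side suggests 10 partitions. Running more than 10 consumers in one group would then leave the extra consumers idle, which is why the two counts are usually kept equal.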
In some possible embodiments, when the terminal device executes S201, summarizing the associated data to obtain the data wide table may consist of processing the data in a unified manner and then storing it in Elasticsearch. In this process, data of different categories needs to be mapped into Elasticsearch, so a unified data processing framework needs to be designed.
Illustratively, the logic that the terminal device performs in S201 can be unified into a framework as follows.
1. Filter out data with isDdl = true and data whose dmlType is DELETE. Data with isDdl = true represents DDL operations on business tables, such as changing field names or types; it is not business data and is not suitable for building the data wide table. A dmlType of DELETE represents a database DELETE operation, and such data can likewise be discarded instead of stored.
2. Data obtained from Kafka comes from different databases and different tables. The naming convention for sharded tables is database/table name + underscore + number, e.g., tb_trade_order_0. In principle, all shards of the same table share the same data structure, so the numeric suffix can be ignored during data mapping and the shards treated as one table. That is, the suffix (e.g., _0) is matched and removed with a regular expression so that data from tables with the same base name is processed by the same strategy logic, which solves the problem of data being scattered by database and table sharding.
3. Filter fields. For example, business data tables often contain fields with no business meaning or fields that do not need to be displayed externally; such fields can be filtered out directly in S201 to save storage space.
4. Parse the year-month (yyyyMM) portion of the transaction order number to determine which Elasticsearch index to write to.
5. Convert JSON field names from underscore style (snake_case) to camelCase, and add a prefix to each field.
6. Write the result set returned by the unified data processing interface into Elasticsearch in batches. Batch writing reduces the number of I/O round trips with the storage side and improves write performance.
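Steps 1 to 6 can be sketched as a single pipeline. This is an illustration under assumptions, not the disclosure's framework: the message shape (`table`, `isDdl`, `dmlType`, `data`) loosely follows Canal's flat JSON messages, and `DROP_FIELDS`, `PREFIX`, and `INDEX_BASE` are hypothetical per-table settings. Step 6 only groups documents into batches here; in practice each batch would be sent with the Elasticsearch bulk API (for example, via `helpers.bulk` in the official Python client).

```python
import re
from typing import Any, Iterable, Iterator

SHARD_SUFFIX = re.compile(r"_\d+$")              # matches "_0", "_12", ...
DROP_FIELDS = {"tb_trade_order": {"row_version"}}  # hypothetical: fields to discard
PREFIX = {"tb_trade_order": "trade"}               # hypothetical: field prefix per table
INDEX_BASE = "wt_trade_order"


def logical_table(name: str) -> str:
    """Step 2: tb_trade_order_3 -> tb_trade_order."""
    return SHARD_SUFFIX.sub("", name)


def to_camel(name: str) -> str:
    """Step 5: trade_pay_amount -> tradePayAmount."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def target_index(order_no: str) -> str:
    """Step 4: assumes the order number starts with yyyyMMdd."""
    return f"{INDEX_BASE}-{order_no[:6]}"


def transform(messages: Iterable[dict[str, Any]]) -> Iterator[tuple[str, dict[str, Any]]]:
    for msg in messages:
        if msg.get("isDdl") or msg.get("dmlType") == "DELETE":
            continue                                 # step 1: drop DDL and DELETE
        table = logical_table(msg["table"])          # step 2: merge shards
        drop = DROP_FIELDS.get(table, set())
        prefix = PREFIX.get(table, table)
        for row in msg["data"]:
            kept = {k: v for k, v in row.items() if k not in drop}   # step 3
            index = target_index(row["trade_order_no"])              # step 4
            doc = {to_camel(f"{prefix}_{k}"): v for k, v in kept.items()}  # step 5
            yield index, doc


def batches(items: Iterable[Any], size: int) -> Iterator[list[Any]]:
    """Step 6: group documents so each bulk request carries `size` of them."""
    batch: list[Any] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Keeping each step a small pure function makes it easy to test independently and to swap in table-specific logic where a business needs it.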
The foregoing is one possible framework in the embodiments of the present disclosure. In a specific implementation, the framework may be designed according to the actual business requirements, which is not specifically limited in the embodiments of the present disclosure. In addition, when designing such a framework, common processing logic can be extracted, and business-specific processing flows can be abstracted into an interface. Then, for different businesses, the interface implementation can be switched according to the actual business scenario.
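Abstracting business-specific logic behind an interface can be sketched with an abstract base class and a per-table registry. The class names, the `tb_payment` table, and its cents-to-yuan conversion are hypothetical examples of "special service logic", chosen only to show the switching mechanism.

```python
from abc import ABC, abstractmethod
from typing import Any

Row = dict[str, Any]


class RowHandler(ABC):
    """Interface for business-specific processing of one source row."""

    @abstractmethod
    def handle(self, row: Row) -> Row: ...


class DefaultHandler(RowHandler):
    """Common logic shared by all tables: pass the row through unchanged."""

    def handle(self, row: Row) -> Row:
        return dict(row)


class PaymentHandler(RowHandler):
    """Hypothetical special case: amounts stored in cents become yuan."""

    def handle(self, row: Row) -> Row:
        out = dict(row)
        out["amount"] = out.pop("amount_cents") / 100
        return out


# Registry: switching behavior for a business means registering another handler.
HANDLERS: dict[str, RowHandler] = {"tb_payment": PaymentHandler()}
DEFAULT = DefaultHandler()


def process(table: str, row: Row) -> Row:
    return HANDLERS.get(table, DEFAULT).handle(row)
```

New businesses plug in by adding an entry to `HANDLERS`, while the shared pipeline code stays untouched.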
In some possible embodiments, S201 may further include: after the associated data in the plurality of business data tables is summarized using the second field to obtain the data wide table, determining the index of the associated data in the data wide table according to the data volume of the associated data.
It should be understood that the terminal device uses the second field to summarize the associated data in the plurality of business data tables and stores the summarized data in Elasticsearch to obtain the data wide table. The terminal device can also plan the Elasticsearch indices according to the volume of stored data. With indices in place, when the data wide table is used, the terminal device first queries the corresponding index and then follows it to the queried data. This replaces the default full-table scan with a single lookup of where a specific value is stored in the index, which greatly reduces the query workload and can significantly increase query speed.
Here, an index is a collection of related documents, and Elasticsearch stores data in the form of JSON documents. Each document links a set of keys (names of fields or attributes) to their corresponding values (strings, numbers, Boolean values, dates, arrays of values, geographic locations, or other types of data). Elasticsearch uses a data structure called an inverted index, which is designed to allow very fast full-text search. An inverted index lists every unique term that appears in any document and identifies all of the documents each term occurs in. During indexing, Elasticsearch stores the documents and builds the inverted index, so that document data can be searched in near real time.
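The idea of an inverted index, and why a lookup beats a full scan, can be shown with a toy version. Tokenization here is a plain lowercase whitespace split, a simplification of Elasticsearch's analyzers, and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each unique term to the set of document ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {"1": "card payment success", "2": "wallet payment failed", "3": "card refund"}
    idx = build_inverted_index(docs)
    print(search(idx, "payment card"))
```

A query touches only the posting sets of its own terms instead of reading every document, which is the workload reduction described above.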
For example, the terminal device may plan the splitting granularity and the number of shards of the data wide table indices according to the business data volume. Correspondingly, the terminal device can also divide the indices into hot, warm, and cold tiers according to users' access patterns. The hot tier stores index documents that are frequently accessed and still need to be modified after creation; the warm tier stores read-only index documents that are accessed infrequently; the cold tier stores read-only index documents that only occasionally need to be accessed.
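The tier rules above can be expressed as a small decision function. The read-rate threshold separating "infrequent" from "occasional" access is a hypothetical tuning knob; the disclosure does not quantify it.

```python
def choose_tier(still_written: bool, reads_per_day: float,
                warm_min_reads_per_day: float = 1.0) -> str:
    """Map an index's access pattern to a storage tier.

    hot:  documents are still being modified after creation
    warm: read-only, accessed regularly but infrequently
    cold: read-only, accessed only occasionally
    The read-rate threshold is a hypothetical value, not from the source.
    """
    if still_written:
        return "hot"
    if reads_per_day >= warm_min_reads_per_day:
        return "warm"
    return "cold"
```

In recent Elasticsearch versions, built-in data tiers (hot, warm, and cold node roles) and index lifecycle management can automate the resulting moves; the function above only captures the classification rule.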
Furthermore, the terminal device can manage operations such as index creation, index alias setting, and index migration across hot/warm/cold nodes through scheduled tasks that call the Application Programming Interface (API), so as to save machine cost and improve query performance.
Illustratively, the design rule for existing transaction order numbers includes a year-month-day (yyyyMMdd) section, for example, the transaction order number 20220505Txxxx482953. When splitting indices by time, the indices can therefore be divided by month: the record with transaction order number 20220505Txxxx482953 is written into the index wt_trade_order-202205 (one index per month). Additionally, assuming the current date is 2022-05-01, a scheduled task creates wt_trade_order-202206 on the hot nodes and moves wt_trade_order-202105 from the hot nodes to the warm nodes.
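The monthly index naming and the scheduled-task plan in this example can be sketched as follows. The 12-month hot-to-warm threshold is inferred from the example (on 2022-05-01, wt_trade_order-202105 moves to warm) and is an assumption; the function only computes which indices to act on. Actually creating or moving an index would be done through Elasticsearch APIs, for example by updating shard allocation filtering settings, which this sketch does not call.

```python
from datetime import date

INDEX_PREFIX = "wt_trade_order"


def index_for_order(order_no: str) -> str:
    """Derive the monthly index from the yyyyMM prefix of the order number."""
    yyyymm = order_no[:6]
    if len(yyyymm) != 6 or not yyyymm.isdigit():
        raise ValueError(f"order number does not start with yyyyMM: {order_no!r}")
    return f"{INDEX_PREFIX}-{yyyymm}"


def shift_month(year: int, month: int, delta: int) -> tuple[int, int]:
    """Add `delta` months (may be negative) to a (year, month) pair."""
    total = year * 12 + (month - 1) + delta
    return total // 12, total % 12 + 1


def monthly_plan(today: date, warm_after_months: int = 12) -> dict[str, str]:
    """Indices the scheduled task should create on hot and move hot -> warm."""
    ny, nm = shift_month(today.year, today.month, 1)
    oy, om = shift_month(today.year, today.month, -warm_after_months)
    return {
        "create_on_hot": f"{INDEX_PREFIX}-{ny:04d}{nm:02d}",
        "move_hot_to_warm": f"{INDEX_PREFIX}-{oy:04d}{om:02d}",
    }
```

Creating next month's index ahead of time means the first write of a new month never waits on index creation.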
In some possible embodiments, the method further includes S202. S202 may be executed before S101, S102, S103, and S201 described above, may be executed simultaneously with any one of S101, S102, S103, and S201 described above, and may be executed after S101, S102, S103, and S201 described above.
S202, detecting data change information in a plurality of service data list tables; and when detecting that the data in any single table is changed, updating the data associated with any single table in the data wide table.
It should be understood that the terminal device may obtain data change information in a plurality of service data list tables in real time, and store the changed data in the data wide table through the component, that is, when detecting that the data in any one of the list tables changes, the terminal device may update the data in the data wide table associated with any one of the list tables.
For example, as mentioned above, the terminal device may use the middleware Canal to listen to the binlog: by configuring the target data tables and the data routing rules that the Canal instance listens to, the binlog is parsed into JSON-format data and written into the configured Kafka topic partitions. Here, the binlog is the binary log, which records all changes to the database and is saved on disk in binary form. By listening to the binlog, Canal obtains the data change information of the data tables in the database and writes the changed data into the Kafka topic partitions; the data in the data wide table is then updated through the data synchronization processing framework.
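Applying a change event to the wide table can be sketched as an upsert of only the changed table's columns. Assumptions: events have already passed through the unified framework (DDL and DELETE filtered, shard suffixes stripped), the `table`/`data` event shape loosely mirrors Canal's flat JSON messages, and a Python dict stands in for Elasticsearch, where the equivalent would be a partial-document update with upsert.

```python
from typing import Any


def apply_change(wide: dict[str, dict[str, Any]], event: dict[str, Any],
                 key: str = "trade_order_no") -> None:
    """Upsert the changed rows of one INSERT/UPDATE event into the wide table.

    Columns are stored as "<table>_<column>", so a change in one source table
    only overwrites that table's columns and leaves the others intact.
    """
    table = event["table"]
    for row in event["data"]:
        doc = wide.setdefault(row[key], {key: row[key]})
        for column, value in row.items():
            if column != key:
                doc[f"{table}_{column}"] = value
```

Because updates are keyed by the order number and routed to one partition per order, replaying events in partition order leaves each document reflecting the latest state of every source table.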
In the above embodiment, by executing S201 to S202, the terminal device can generate the data wide table from a plurality of business data tables and update the data in the data wide table in real time, thereby enabling real-time queries through the data wide table and improving query efficiency.
Based on the same inventive concept, the embodiments of the present disclosure further provide a device for performing data query based on the data wide table, where the device for performing data query may be a chip or a system on a chip in the terminal device, and may also be a functional module in the terminal device for implementing the methods described in the above embodiments. The data query apparatus may implement the functions executed by the terminal device in the foregoing embodiments, and these functions may be implemented by executing corresponding software through hardware. These hardware or software include one or more functionally corresponding modules.
Fig. 3 is a schematic structural diagram of an apparatus for querying data in an embodiment of the present disclosure, and referring to fig. 3, the apparatus 300 for querying data includes: an obtaining module 301, configured to receive a query instruction of a user; a processing module 302, configured to query a data wide table according to a field in the query instruction to obtain query data, where the data wide table is obtained by summarizing data in a plurality of service data list tables; and an output module 303, configured to output the query data.
In some possible embodiments, the processing module 302 is further configured to: according to the fields in the query instruction, querying indexes corresponding to the fields in the data wide table; links to query data according to the index.
In some possible embodiments, the obtaining module 301 is further configured to: before receiving a query instruction of a user, acquiring a plurality of service data list tables; and summarizing the data in the plurality of business data list tables to obtain a data wide table.
In some possible embodiments, the obtaining module 301 is further configured to: determining a first field according to the service information in the plurality of service data list tables, wherein the first field is a field used for inquiring in the data wide table; and summarizing the data in the plurality of business data list tables according to the first field to obtain a data wide table.
In some possible embodiments, the obtaining module 301 is further configured to: analyzing a second field in each business data form in the multiple business data form tables, wherein the second field is used for indicating the incidence relation among data in the multiple business data form tables; and summarizing the associated data in the plurality of business data list tables by using the second field according to the data range corresponding to the first field to obtain a data wide table.
In some possible embodiments, the obtaining module 301 is further configured to: processing data in the plurality of business data list tables by a multithreading technology according to the data range corresponding to the first field to obtain intermediate data from the plurality of business data list tables, wherein the intermediate data is data corresponding to the field of the data wide table; acquiring associated data in the intermediate data by using the second field; and summarizing the associated data to obtain a data wide table.
In some possible embodiments, the obtaining module 301 is further configured to: after the associated data in the plurality of business data tables is summarized using the second field to obtain the data wide table, determine the index of the associated data in the data wide table according to the data volume of the associated data.
In some possible embodiments, the obtaining module 301 is further configured to: detecting data change information in a plurality of service data list tables; and when detecting that the data in any single table is changed, updating the data information associated with any single table in the data wide table.
It should be noted that, for the specific implementation process of the obtaining module 301, the processing module 302, and the output module 303, reference may be made to the detailed description of the embodiment in fig. 1 and fig. 2, and for brevity of the description, no further description is given here.
Based on the same inventive concept, the embodiments of the present disclosure provide a terminal, which may be the terminal described in one or more embodiments above. Fig. 4 is a schematic structural diagram of a terminal in an embodiment of the present disclosure, and referring to fig. 4, a terminal 400 may employ general computer hardware, and includes a processor 401 and a memory 402.
In some possible implementations, the at least one processor may constitute any physical device having circuitry to perform logical operations on one or more inputs. For example, at least one processor may include one or more Integrated Circuits (ICs) including an Application Specific Integrated Circuit (ASIC), a microchip, a microcontroller, a microprocessor, all or a portion of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or other circuitry suitable for executing instructions or performing logical operations. The instructions executed by the at least one processor may be preloaded into a memory integrated with or embedded in the controller, for example, or may be stored in a separate memory. The memory may include Random Access Memory (RAM), Read Only Memory (ROM), hard disk, optical disk, magnetic media, flash memory, other permanent, fixed, or volatile memory, or any other mechanism capable of storing instructions. In some embodiments, the at least one processor may comprise more than one processor. Each processor may have a similar structure, or the processors may have different configurations that are electrically connected or disconnected from each other. For example, the processor may be a separate circuit or integrated in a single circuit. When more than one processor is used, the processors may be configured to operate independently or cooperatively. The processors may be coupled electrically, magnetically, optically, acoustically, mechanically, or by other means that allow them to interact.
According to an embodiment of the present invention, a computer-readable storage medium is further provided, on which computer instructions are stored; when executed by a processor, the instructions perform the steps of the data query method based on the data wide table. The memory 402 may include computer storage media in the form of volatile and/or non-volatile memory, such as read-only memory and/or random access memory. The memory 402 may store an operating system, application programs, other program modules, executable code, program data, user data, and the like.
Further, the memory 402 stores computer-executable instructions for implementing the functions of the respective modules in fig. 3. The functions/implementation processes of the modules in fig. 3 can be implemented by the processor 401 in fig. 4 calling the computer executing instructions stored in the memory 402, and the specific implementation processes and functions are referred to the above related embodiments.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (18)

1. A method for querying data based on a data wide table is characterized by comprising the following steps:
receiving a query instruction of a user;
querying a data wide table according to the fields in the query instruction to obtain query data, wherein the data wide table is obtained by summarizing data in a plurality of business data single tables;
and outputting the query data.
2. The method of claim 1, wherein the querying a data wide table according to a field in the query instruction to obtain query data comprises:
according to the fields in the query instruction, querying indexes corresponding to the fields in the data wide table;
and linking to the query data according to the index.
3. The method of claim 1, wherein prior to receiving the user's query instruction, the method further comprises:
acquiring a plurality of service data list tables;
and acquiring the data wide table by summarizing the data in the plurality of business data single tables.
4. The method of claim 3, wherein the obtaining the data wide table by aggregating data in the plurality of business data sheet tables comprises:
determining a first field according to the service information in the plurality of service data list tables, wherein the first field is a field used for displaying in the data wide table;
and summarizing the data in the plurality of business data list tables according to the first field to obtain the data wide table.
5. The method of claim 4, wherein aggregating data in the plurality of business datasheet tables to obtain the data-wide table according to the first field comprises:
analyzing a second field in each business data form in the plurality of business data form tables, wherein the second field is used for indicating the incidence relation among data in the plurality of business data form tables;
and summarizing the associated data in the plurality of business data list tables by using the second field according to the data range corresponding to the first field to obtain the data wide table.
6. The method according to claim 5, wherein the aggregating the associated data in the plurality of service data list tables using the second field according to the data range corresponding to the first field to obtain the data wide table comprises:
processing the data in the plurality of business data list tables by a multithreading technology according to the data range corresponding to the first field to obtain intermediate data from the plurality of business data list tables;
acquiring associated data in the intermediate data by using the second field;
and summarizing the associated data to obtain the data wide table.
7. The method of claim 5, wherein after the using the second field, aggregating associated data in the plurality of business datasheet tables to obtain the data wide table, the method further comprises:
determining the index of the associated data in the data wide table according to the data volume of the associated data.
8. The method of claim 1, further comprising:
detecting data change information in the plurality of business data sheet tables;
when the change of the data in any single table is detected, the data associated with the single table in the data wide table is updated.
9. An apparatus for querying data based on a data wide table, comprising:
the acquisition module is used for receiving a query instruction of a user;
the processing module is used for querying a data wide table according to the fields in the query instruction to obtain query data, wherein the data wide table is obtained by summarizing data in a plurality of business data single tables;
and the output module is used for outputting the query data.
10. The apparatus of claim 9, wherein the processing module is further configured to: according to the fields in the query instruction, querying indexes corresponding to the fields in the data wide table; linking to the query data according to the index.
11. The apparatus of claim 9, wherein the obtaining module is further configured to: before the query instruction of the user is received, acquiring a plurality of service data list tables; and summarizing the data in the plurality of business data list tables to obtain the data wide table.
12. The apparatus of claim 11, wherein the obtaining module is further configured to: determining a first field according to service information in the plurality of service data list tables, wherein the first field is a field used for query in the data wide table; and summarizing the data in the plurality of business data list tables according to the first field to obtain the data wide table.
13. The apparatus of claim 12, wherein the obtaining module is further configured to: analyzing a second field in each business data form in the plurality of business data form tables, wherein the second field is used for indicating the incidence relation among data in the plurality of business data form tables; and summarizing the associated data in the plurality of business data list tables by using the second field according to the data range corresponding to the first field to obtain the data wide table.
14. The apparatus of claim 13, wherein the obtaining module is further configured to: processing the data in the plurality of business data list tables by a multithreading technology according to the data range corresponding to the first field to obtain intermediate data from the plurality of business data list tables, wherein the intermediate data is data corresponding to the field of the data wide table; acquiring associated data in the intermediate data by using the second field; and summarizing the associated data to obtain the data wide table.
15. The apparatus of claim 13, wherein the obtaining module is further configured to: after the associated data in the service data list tables are summarized by using the second field to obtain the data wide table, determining the index of the associated data in the data wide table according to the data volume of the associated data.
16. The apparatus of claim 9, wherein the obtaining module is further configured to: detecting data change information in the plurality of business data sheet tables; and when detecting that the data in any single table changes, updating the data information associated with any single table in the data wide table.
17. A terminal, comprising:
a memory for storing processor-executable instructions;
a processor; wherein the processor is configured to implement the method of any one of claims 1 to 8 when executing the executable instructions.
18. A computer-readable storage medium, characterized in that the readable storage medium stores an executable program, wherein the executable program, when executed by a processor, implements the method of any one of claims 1 to 8.
CN202210679262.XA 2022-06-15 2022-06-15 Method, device and equipment for querying data based on data wide table Pending CN115114319A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210679262.XA CN115114319A (en) 2022-06-15 2022-06-15 Method, device and equipment for querying data based on data wide table

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210679262.XA CN115114319A (en) 2022-06-15 2022-06-15 Method, device and equipment for querying data based on data wide table

Publications (1)

Publication Number Publication Date
CN115114319A true CN115114319A (en) 2022-09-27

Family

ID=83328126

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210679262.XA Pending CN115114319A (en) 2022-06-15 2022-06-15 Method, device and equipment for querying data based on data wide table

Country Status (1)

Country Link
CN (1) CN115114319A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116610714A (en) * 2023-07-14 2023-08-18 北京数巅科技有限公司 Data query method, device, computer equipment and storage medium
CN116821245A (en) * 2023-07-05 2023-09-29 贝壳找房(北京)科技有限公司 Data aggregation synchronization method and storage medium in distributed scene
CN116955417A (en) * 2023-09-19 2023-10-27 武汉大数据产业发展有限公司 Optimization method and device for multi-table combined retrieval of data and electronic equipment
CN117390040A (en) * 2023-12-11 2024-01-12 深圳大道云科技有限公司 Service request processing method, device and storage medium based on real-time wide table

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116821245A (en) * 2023-07-05 2023-09-29 贝壳找房(北京)科技有限公司 Data aggregation synchronization method and storage medium in distributed scene
CN116610714A (en) * 2023-07-14 2023-08-18 北京数巅科技有限公司 Data query method, device, computer equipment and storage medium
CN116610714B (en) * 2023-07-14 2023-10-31 北京数巅科技有限公司 Data query method, device, computer equipment and storage medium
CN116955417A (en) * 2023-09-19 2023-10-27 武汉大数据产业发展有限公司 Optimization method and device for multi-table combined retrieval of data and electronic equipment
CN117390040A (en) * 2023-12-11 2024-01-12 深圳大道云科技有限公司 Service request processing method, device and storage medium based on real-time wide table
CN117390040B (en) * 2023-12-11 2024-03-29 深圳大道云科技有限公司 Service request processing method, device and storage medium based on real-time wide table

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination