CN117331967A - Data query method, device, equipment and medium - Google Patents

Data query method, device, equipment and medium

Info

Publication number
CN117331967A
Authority
CN
China
Prior art keywords
data
file
database
report
generation time
Legal status
Pending
Application number
CN202311297492.0A
Other languages
Chinese (zh)
Inventor
赵建新
张靖波
陈思江
李毅超
Current Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Application filed by China Construction Bank Corp and CCB Finetech Co Ltd
Priority to CN202311297492.0A
Publication of CN117331967A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of data query, and in particular to a data query method, device, equipment and medium, intended to solve the problem of slow data report queries in the prior art. The method includes: in response to a report query operation by a user, determining the query conditions of the report query, where the query conditions include a generation time and at least one element required for generating the report; determining, from the file directory of a database, at least one target file matching the generation time, where the file directories of the files in the database each include the generation time of the corresponding file; and determining, from the at least one target file, at least one piece of target data matching the at least one element, and generating a report based on the at least one piece of target data.

Description

Data query method, device, equipment and medium
Technical Field
The present invention relates to the field of data query technologies, and in particular, to a data query method, device, equipment, and medium.
Background
With the development of information technology, business data is growing exponentially. Traditional report data processing queries a relational database using multi-table joins, that is, by associating data across multiple tables to retrieve information. However, when the data volume is very large, these join queries take a long time and severely degrade system performance. As business data keeps growing, the batch processing capability of the database increasingly limits overall system performance, and traditional databases cannot process massive data efficiently.
Disclosure of Invention
The embodiments of the present application provide a data query method, device, equipment and medium to solve the problem of slow data report queries in the prior art.
In a first aspect, an embodiment of the present application provides a method for querying data, where the method includes:
in response to a report query operation by a user, determining the query conditions of the report query, where the query conditions include a generation time and at least one element required for generating the report;
determining, from the file directory of a database, at least one target file matching the generation time, where the file name of each file in the database includes the generation time of that file, and each file contains one column of data from a partition table of the HIVE database, all with the same generation time;
and determining, from the at least one target file, at least one piece of target data matching the at least one element, and generating a report based on the at least one piece of target data.
Based on the above scheme, each column of data with the same generation time in a partition table of the HIVE database is stored as one file, and the file directory of each file in the database includes the generation time of that file. When a data report is queried, at least one target file whose file name in the file directory matches the generation time in the query conditions is obtained from the database, and the data in the at least one target file is filtered by the query elements to obtain the target data and generate the report. Instead of searching the entire data set, the method only needs to locate the target files by generation time, which narrows the search range and speeds up report queries.
In one possible implementation, the data included in each file in the database is obtained by processing the original data by the HIVE database.
In one possible implementation, the method further includes processing the raw data as follows: receiving a raw data file, where all data in the raw data file has the same generation date; preprocessing the raw data file to obtain preprocessed data, where the preprocessing includes verification and cleaning; and determining at least one data processing rule associated with a data attribute, and processing the preprocessed data according to the at least one data processing rule to obtain processed data.
Based on this scheme, verification and cleaning of the raw data file are added to the processing performed by the HIVE database, which improves the accuracy of the business data.
In one possible implementation, processing the preprocessed data according to the at least one data processing rule includes:
for each data processing rule, determining the at least one piece of data required by that rule;
and processing that data according to the rule.
Based on this scheme, the data is processed by the HIVE database and the processed data is stored in the database. When a report is queried, the corresponding data only needs to be read from the database, with no computation required at query time, which speeds up report data retrieval.
In one possible implementation, the method further includes: storing the processed data in a partition table created by the HIVE database, where all processed data in a partition table derives from raw data with the same generation time, and different partition tables correspond to different generation times;
and sending the data in each column of the partition table to the database and storing the data of each column as a separate file, where the file directory of the file includes the generation time corresponding to the column value of the partition table.
With this scheme, the data model in the HIVE database is a partition table, so a large data set can be divided into smaller data sets according to business requirements, improving the data processing capacity of the database.
In a second aspect, an embodiment of the present application provides a data query device, including:
a first determining unit, configured to determine, in response to a report query operation by a user, the query conditions of the report query, where the query conditions include a generation time and at least one element required for generating the report;
a second determining unit, configured to determine at least one target file matching the generation time from the file directory of a database, where the file name of each file in the database includes the generation time of that file, and each file contains one column of data from a partition table of the HIVE database, all with the same generation time;
and further configured to determine, from the at least one target file, at least one piece of target data matching the at least one element, and to generate a report based on the at least one piece of target data.
In one possible implementation, the data included in each file in the database is obtained by processing the original data by the HIVE database.
In one possible implementation, the second determining unit is further configured to process the raw data as follows: receiving a raw data file, where all data in the raw data file has the same generation date;
preprocessing the raw data file to obtain preprocessed data, where the preprocessing includes verification and cleaning;
and determining at least one data processing rule associated with a data attribute, and processing the preprocessed data according to the at least one data processing rule to obtain processed data.
In one possible implementation, when processing the preprocessed data according to the at least one data processing rule, the second determining unit is specifically configured to:
for each data processing rule, determine the at least one piece of data required by that rule;
and process that data according to the rule.
In one possible implementation, the second determining unit is further configured to:
store the processed data in a partition table created by the HIVE database, where all processed data in a partition table derives from raw data with the same generation time, and different partition tables correspond to different generation times;
and send the data in each column of the partition table to the database and store the data of each column as a separate file, where the file directory of the file includes the generation time corresponding to the column value of the partition table.
In a third aspect, the present application provides an electronic device, comprising:
a memory for storing program instructions;
a processor, configured to invoke the program instructions stored in the memory and to execute, according to the obtained program instructions, the steps of the method according to any implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program, the computer program including program instructions which, when executed by a computer, cause the computer to perform the method according to any implementation of the first aspect.
In a fifth aspect, the present application provides a computer program product, including computer program code which, when run on a computer, causes the computer to perform the method according to any implementation of the first aspect.
In addition, for the technical effects of any implementation of the second to fifth aspects, refer to the technical effects of the first aspect and its implementations; they are not repeated here.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 2 is a flowchart of a data query method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a file directory according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a data report according to an embodiment of the present application;
FIG. 5 is a flowchart of data processing by the HIVE database according to an embodiment of the present application;
FIG. 6 is a flowchart of data processing and integration according to an embodiment of the present application;
FIG. 7 is a process flow diagram of a data processing module according to an embodiment of the present application;
FIG. 8 is a block diagram of a data query device according to an embodiment of the present application;
FIG. 9 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure. Embodiments and features of embodiments in this application may be combined with each other arbitrarily without conflict. Also, while a logical order of illustration is depicted in the flowchart, in some cases the steps shown or described may be performed in a different order than presented.
The terms "first" and "second" in the description, claims, and drawings of the present application are used to distinguish between different objects, not to describe a particular order. Furthermore, the term "include" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that includes a list of steps or elements is not limited to the listed steps or elements, but may include other steps or elements not listed or inherent to such process, method, article, or apparatus. The term "plurality" in the present application means at least two, for example, two, three, or more; the embodiments of the present application are not limited in this respect.
In the technical solution of the present application, the collection, transmission, and use of data all comply with the relevant national laws and regulations.
In order to facilitate understanding of the solutions proposed in the present application, the terms referred to in the present application are explained below.
HIVE: a Hadoop-based data warehouse tool used to extract, transform, and load data. It provides a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop, and is well suited to statistical analysis in data warehouses.
Partitioning: one of the ways HIVE stores data, in which the value of a partition column is used as a directory, and each such directory is a partition. When a query filters on the partition column, only the data under the directories matching the column values is scanned and irrelevant partitions are skipped, which locates data quickly and improves query efficiency.
ODS layer: the original data layer, which is closest to the data in the data sources.
DW layer: the data detail layer, which performs cleaning, dimension degeneration, desensitization, and similar operations on ODS layer data. It holds all systematic, complete, clean, and consistent data.
DM layer: the data mart layer, which provides data for the various statistical reports.
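The partition filtering described under "Partitioning" can be illustrated with a minimal Python sketch. It only mimics the idea; in practice HIVE prunes partitions itself when a query filters on a partition column. The table location, the partition column name `dt`, and the data file names below are hypothetical; the `column=value` directory naming follows HIVE's partition directory convention.

```python
from pathlib import PurePosixPath

def prune_partitions(paths, column, wanted_values):
    """Keep only data files whose partition directory `column=value` matches."""
    selected = []
    for p in paths:
        for part in PurePosixPath(p).parts:
            # HIVE names partition directories "<column>=<value>".
            name, sep, value = part.partition("=")
            if sep and name == column and value in wanted_values:
                selected.append(p)
                break  # one matching partition directory is enough
    return selected

# Hypothetical table with one partition per generation date.
files = [
    "/warehouse/deposit/dt=20230101/000000_0",
    "/warehouse/deposit/dt=20230102/000000_0",
    "/warehouse/deposit/dt=20230103/000000_0",
]
print(prune_partitions(files, "dt", {"20230102", "20230103"}))
```

Only the two matching partitions are returned; the 20230101 directory is never opened, which is the source of the speed-up the definition describes.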
Before the data query method provided in the embodiments of the present application is described, its technical background is explained below for ease of understanding.
In a relational database, a join query retrieves information by associating data in multiple tables: a JOIN clause in the query joins two or more tables. The join condition is based on an association key between the tables, that is, a column used to match data between them. In some scenarios, an optional condition is also set to further filter the results. The query returns all matching rows from the joined tables that satisfy the join condition.
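To make the join query described above concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, columns, and rows are hypothetical and only illustrate a JOIN on an association key with an optional WHERE filter; a production report database would differ.

```python
import sqlite3

# In-memory database with two hypothetical tables linked by customer_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE deposit (customer_id INTEGER, dt TEXT, amount INTEGER);
    INSERT INTO customer VALUES (1, 'A'), (2, 'B');
    INSERT INTO deposit VALUES (1, '20230102', 30000), (1, '20230103', 10000),
                               (2, '20230103', 5000);
""")

# Join on the association key (customer_id), then filter with a WHERE condition.
rows = conn.execute("""
    SELECT c.name, d.dt, d.amount
    FROM customer AS c
    JOIN deposit AS d ON c.customer_id = d.customer_id
    WHERE c.name = ?
    ORDER BY d.dt
""", ("A",)).fetchall()
print(rows)  # [('A', '20230102', 30000), ('A', '20230103', 10000)]
```

With millions of rows per table, the database must match keys across both tables on every report query, which is the cost the following paragraph refers to.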
However, when the data volume is very large, such join queries take a long time and significantly degrade system performance. In addition, report result data is computed within the traditional database. As information technology develops, business data grows ever faster, and traditional databases can no longer process massive data efficiently.
To address the above problems, embodiments of the present application provide a data query method, device, equipment and medium in which the file directory of each file in the database includes the generation time of that file. When a data report is queried, at least one target file whose file name matches the generation time in the query conditions is obtained from the database, and the data in the at least one target file is filtered by the query elements to obtain the target data and generate the report. Instead of searching the entire data set, the method only needs to locate the target files by generation time, which narrows the search range and speeds up report queries.
The method for querying data provided in the embodiments of the present application may be executed by an electronic device, and in some embodiments, the electronic device may be a terminal device. The terminal device may be a display device having a display function. The display device may include: smart televisions, cell phones, tablet computers, and the like. In other scenarios, the electronic device may be implemented by one or more servers, which may be local servers or cloud servers. The structure and application scenario of an electronic device are described below by taking the electronic device as a server as an example.
Referring to FIG. 1, the server 100 may be a physical server or a virtual server. It may be a single server or a cluster of multiple servers, and either can implement the data query method provided in the present application.
In FIG. 1, the server 100 is connected to the display device 200 and may perform the data query method. In some cases, the server 100 may receive a data query request sent by the display device 200 and execute the data query method, or display the query result through the display device 200.
It should be noted that the structure shown in fig. 1 is merely an example, and the embodiment of the present application is not limited thereto.
An embodiment of the present application provides a data query method, whose flow is shown in FIG. 2. The flow may be performed by an electronic device, such as a display device or the server 100, and includes the following steps:
201: In response to a report query operation by a user, determine the query conditions of the report query.
In some scenarios, the query conditions include a generation time and at least one element required for generating the report. For example, if the user wants to query customer A's deposit amounts over the last three days, the generation time is the last three days and the query elements are customer A and the deposit amount.
In one possible implementation, each day is treated as one time value, so "the last three days" corresponds to three generation times. For example, if the current date is May 10, 2023, the generation times are May 7, May 8, and May 9, 2023.
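A minimal sketch of turning a relative period such as "the last three days" into concrete generation times, assuming (as in the example above) that the current day is excluded and that dates are written as YYYYMMDD strings, the format used in the file directory of FIG. 3. The function name `generation_times` is hypothetical.

```python
from datetime import date, timedelta

def generation_times(today, n_days):
    """Return the n_days dates before `today`, oldest first, as YYYYMMDD strings."""
    # k runs n_days, n_days - 1, ..., 1, so the current day itself is excluded.
    return [(today - timedelta(days=k)).strftime("%Y%m%d")
            for k in range(n_days, 0, -1)]

print(generation_times(date(2023, 5, 10), 3))  # ['20230507', '20230508', '20230509']
```

Using `timedelta` rather than string arithmetic handles month and year boundaries correctly.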
The data corresponding to the query element may be raw data, or may be data that can be obtained after processing the raw data.
202: Determine, from the file directory of a database, at least one target file matching the generation time.
In some embodiments, the file name of each file in the database includes the generation time of that file. In some embodiments, the database may be an Oracle database. FIG. 3 shows an example file directory of a database (HDFS file system), in which 20230101, 20230102 and 20230103 are the generation times of the files. In some scenarios, each file contains one column of data from a partition table of the HIVE database, all with the same generation time, and the file directory includes the generation time of the file.
Once the query conditions are determined, at least one target file matching the generation time in the query conditions can be determined. For example, if the generation time in the query conditions is the last two days and the current date is January 4, 2023, the last two days are January 3 and January 2, 2023. Two matching target files can then be determined, whose file directories are /home/ap/nastmp/20230103/20230103 and /home/ap/nastmp/20230102/20230102, respectively.
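Matching target files can be sketched as a filter over the file directory. This assumes, as in the example paths above, that each file's name is exactly its generation time; `match_target_files` and the in-memory `catalog` list are hypothetical stand-ins for reading the database's file directory.

```python
from pathlib import PurePosixPath

def match_target_files(file_dirs, gen_times):
    """Select files whose name (last path component) is one of the generation times."""
    wanted = set(gen_times)  # set lookup keeps the scan linear in the catalog size
    return [p for p in file_dirs if PurePosixPath(p).name in wanted]

catalog = [
    "/home/ap/nastmp/20230101/20230101",
    "/home/ap/nastmp/20230102/20230102",
    "/home/ap/nastmp/20230103/20230103",
]
print(match_target_files(catalog, ["20230103", "20230102"]))
```

Only directory entries are inspected at this stage; no file contents are read until step 203.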
203: Determine, from the at least one target file, at least one piece of target data matching the at least one element, and generate a report based on the at least one piece of target data.
For example, if the report query asks for customer A's deposit amounts over the last two days, then once the two target files are determined, customer A's deposit amounts for those two days can be found in them.
For example, if customer A deposited 10,000 yuan on January 3, 2023 and 30,000 yuan on January 2, 2023, a report can be generated from this deposit data, as shown in FIG. 4.
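Step 203 can be sketched as filtering the records of each target file by the query elements. The record layout (a `customer` field and a `deposit` field) and customer B's row are hypothetical; the amounts for customer A follow the example above.

```python
# Hypothetical contents of the two target files, keyed by generation time.
target_files = {
    "20230102": [{"customer": "A", "deposit": 30000},
                 {"customer": "B", "deposit": 5000}],
    "20230103": [{"customer": "A", "deposit": 10000}],
}

def build_report(files, customer, field):
    """Collect `field` for `customer` from each target file as (date, value) rows."""
    report = []
    for gen_time in sorted(files):  # YYYYMMDD strings sort chronologically
        for record in files[gen_time]:
            if record["customer"] == customer:
                report.append((gen_time, record[field]))
    return report

for gen_time, amount in build_report(target_files, "A", "deposit"):
    print(gen_time, amount)
```

No join or aggregation is needed at query time because the files already hold processed values.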
In some embodiments, the data in each file in the database is obtained by the HIVE database processing the raw data. Specifically, the raw data may be processed as follows, as shown in FIG. 5:
501: Receive a raw data file, where all data in the raw data file has the same generation date.
502: Preprocess the raw data file to obtain preprocessed data.
The preprocessing includes verification and cleaning.
503: Determine at least one data processing rule associated with a data attribute, and process the preprocessed data according to the at least one data processing rule to obtain processed data.
Specifically, processing the preprocessed data according to the at least one data processing rule may be implemented as follows: for each data processing rule, determine the at least one piece of data required by that rule, and then process that data according to the rule.
For example, when deposit data is received, the deposit increment, the year-over-year growth rate, and similar metrics may be calculated from it. The deposit increment and the year-over-year growth rate each correspond to a different data processing rule. The data required by each rule is processed according to that rule to obtain the processed data, which in this example is the deposit increment and the year-over-year growth rate.
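A minimal sketch of rule-driven processing, where each rule declares the data it requires and how to compute its result. The rule names, field names (`balance`, `prev_balance`, `balance_last_year`), and the reading of the second metric as a year-over-year growth rate are assumptions for illustration, not the rules used in the application.

```python
# Each hypothetical rule: (names of the fields it requires, function computing the result).
RULES = {
    "deposit_increment": (("balance", "prev_balance"),
                          lambda b, p: b - p),
    "yoy_growth_rate": (("balance", "balance_last_year"),
                        lambda b, y: (b - y) / y if y else None),  # avoid divide-by-zero
}

def apply_rules(record, rules=RULES):
    """Run every rule on one preprocessed record and return the derived values."""
    out = {}
    for name, (fields, fn) in rules.items():
        args = [record[f] for f in fields]  # the data required by this rule
        out[name] = fn(*args)
    return out

print(apply_rules({"balance": 120000, "prev_balance": 100000,
                   "balance_last_year": 80000}))
# {'deposit_increment': 20000, 'yoy_growth_rate': 0.5}
```

Keeping rules as data means a new report metric can be added without changing the processing loop.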
In some embodiments, the processed data may be stored in a partition table created by the HIVE database. All processed data in a partition table derives from raw data with the same generation time, and different partition tables correspond to different generation times. In some embodiments, the HIVE database includes multiple partition tables, each of which stores processed data. Within the same column of the same partition table, the raw data corresponding to every processed record has the same generation time. In some scenarios, the column value of the partition table may be the generation time, that is, the generation time of the raw data corresponding to the processed data. For example, when the column value of a partition table is 20230102, the data in that partition table is obtained by processing the raw data of January 2, 2023. In some embodiments, the column value of the partition table may instead be the generation time of the processed data.
Further, the data in each column of the partition table may be sent to the database separately and stored as a separate file. The file directory of each file includes the generation time corresponding to the column value of the partition table, and the file name also includes the generation time. As shown in FIG. 3, 20230101, 20230102 and 20230103 are the corresponding column values of the partition table.
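Exporting partition data as one file per generation time can be sketched as follows. It assumes each row carries its partition value in a `dt` field and writes files under `<base>/<dt>/<dt>`, mirroring the `/home/ap/nastmp/20230103/20230103` example paths; the `|` delimiter, the field names, and the temporary base directory are hypothetical.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path

def export_partitions(rows, base_dir):
    """Write each partition's rows to <base_dir>/<dt>/<dt>; return the written paths."""
    by_dt = defaultdict(list)
    for row in rows:
        by_dt[row["dt"]].append(row)  # group rows by partition (generation time)
    written = []
    for dt, part_rows in sorted(by_dt.items()):
        path = Path(base_dir) / dt / dt  # directory and file name both carry the date
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("w", newline="") as f:
            writer = csv.writer(f, delimiter="|")
            for r in part_rows:
                writer.writerow([r["customer"], r["deposit"]])
        written.append(path)
    return written

rows = [
    {"dt": "20230102", "customer": "A", "deposit": 30000},
    {"dt": "20230103", "customer": "A", "deposit": 10000},
]
with tempfile.TemporaryDirectory() as tmp:
    for p in export_partitions(rows, tmp):
        print(p.relative_to(tmp), p.read_text().strip())
```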
In the present application, the HIVE database serves as the DW layer of the data warehouse. After the raw data transmitted from the ODS layer is received, it undergoes simple data cleaning and is then loaded into the HIVE database. HIVE is built for processing big data and is well suited to report data that can tolerate relatively high latency. After HIVE finishes processing and integrating the business data, the result data is loaded into the DM layer. Using the HIVE database to process high-volume business data allows batch tasks to complete more efficiently.
In the embodiments of the present application, the processing and integration of report data is completed by loading, transforming, and processing the data in the HIVE database. Referring to FIG. 6, the overall data processing flow is as follows:
601: The original data (ODS) layer transmits the raw application data to the data detail (DW) layer as a .dat file. The .dat file includes the attribute information and the acquisition location of the file corresponding to the raw data.
602: Verify the record count and the file itself. After verification passes, the data file is lightly cleaned and adjusted and then loaded into the HIVE database.
Specifically, the DW layer may load the raw data file from the acquisition location and verify the file's attribute information. For example, the attribute information may include the number of records and the file name.
603: The HIVE database executes the data processing rules, completes the processing and integration of report data, and stores the result data in the result table of the corresponding report.
604: Export the report result data to a .dat file and load the .dat file into the data mart (DM) layer.
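The verification in step 602 can be sketched as comparing a data file against its declared attribute information. Here the expected file name and record count are passed in directly; how they are read from the .dat attribute information is not specified above, so that part, the one-record-per-line assumption, and the file name `deposit_20230103.dat` are hypothetical.

```python
import tempfile
from pathlib import Path

def verify_data_file(path, expected_name, expected_records):
    """Compare a data file with its declared file name and record count."""
    p = Path(path)
    if p.name != expected_name:
        return False, f"file name {p.name!r} != {expected_name!r}"
    with p.open() as f:
        # Assumption: one record per non-blank line.
        count = sum(1 for line in f if line.strip())
    if count != expected_records:
        return False, f"record count {count} != {expected_records}"
    return True, "ok"

with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "deposit_20230103.dat"
    data.write_text("A|10000\nB|5000\n")
    print(verify_data_file(data, "deposit_20230103.dat", 2))  # (True, 'ok')
    print(verify_data_file(data, "deposit_20230103.dat", 3))  # (False, 'record count 2 != 3')
```

Rejecting a truncated or misnamed file before loading keeps incomplete data out of the HIVE tables.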
In the embodiments of the present application, the data model in the HIVE database is a partition table. Each partition of a partition table corresponds to a separate folder on the HDFS file system, which holds all the data files of that partition. Partitions in HIVE are thus directories that divide a large data set into smaller ones according to business requirements, improving the data processing capacity of the database.
In the present application, a data processing module in the HIVE database can check the file processing rules and process the raw business data according to those rules. The specific flow is shown in FIG. 7:
1: Create an entity table in the HIVE database according to the table creation statement; the entity table receives the raw data.
2: Check whether the file has arrived. If not, the flow ends; otherwise, check whether the data file sizes are consistent.
3: If the data file sizes are consistent, clean the raw data and load it into the corresponding HIVE data table; otherwise, the flow ends.
4: Check whether processing logic exists and complies with the specification. If so, process the data; otherwise, the flow ends. In some scenarios, the processing logic may be executed in the HIVE database and the results loaded into the partition table.
5: Determine whether the data processing succeeded. If so, export the data to the DM layer database for displaying the business report; otherwise, the flow ends.
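The five-step flow of FIG. 7 can be summarized as a sequence of checks that ends at the first failure. In this sketch each step is a placeholder callable returning True or False; the real checks (table creation, file arrival, size comparison, cleaning and loading, logic validation, processing) are not specified in detail above, so the step bodies and the function name `run_processing_flow` are hypothetical.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]

def run_processing_flow(steps: List[Step]) -> str:
    """Run the checks in order; the flow ends at the first one that fails."""
    for name, check in steps:
        if not check():
            return f"ended at: {name}"
    # Every check passed: processed data goes to the DM layer for report display.
    return "exported to DM layer for report display"

# Placeholder steps; here the processing-logic check is made to fail.
steps = [
    ("create entity table", lambda: True),
    ("file arrived", lambda: True),
    ("file size consistent; clean and load into HIVE table", lambda: True),
    ("processing logic present and compliant", lambda: False),
    ("processing succeeded", lambda: True),
]
print(run_processing_flow(steps))  # ended at: processing logic present and compliant
```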
Against the backdrop of rapidly growing business data, the present application builds an efficient and accurate HIVE-based data query method by replacing the DW layer with the HIVE database and combining it with the data model and data processing module designs described above. This removes the bottleneck of traditional databases' poor big-data processing performance, reduces system load, and improves system performance and report processing efficiency.
Referring to FIG. 8, based on the same inventive concept, an embodiment of the present application provides a data query device 800. The device 800 can perform any step of the above data query method; to avoid repetition, the details are not described again here. The device 800 includes a first determining unit 801 and a second determining unit 802.
A first determining unit 801, configured to determine a query condition of a report query in response to a report query operation performed by a user, where the query condition includes a generation time and at least one element required for generating the report;
a second determining unit 802, configured to determine, from a file directory of a database, at least one target file matching the generation time, where the file name of each file in the database includes the generation time corresponding to that file, each file contains one column of data from a partition table of the HIVE database, and all data in that column have the same generation time;
the second determining unit 802 being further configured to determine, from the at least one target file, at least one target data item matching the at least one element, and to generate a report based on the at least one target data item.
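The query performed by units 801 and 802 can be illustrated with a short Python sketch. It assumes, hypothetically, that file names follow the pattern `<name>_<YYYYMMDD>.csv`, that each file is a CSV with a header row, and that the file directory is modeled as an in-memory dict from file name to content; `QueryCondition`, `find_target_files` and `build_report` are illustrative names only.

```python
import csv
import io
import re
from dataclasses import dataclass

# Hypothetical naming convention: the generation time is embedded as "_YYYYMMDD." in the name.
DATE_IN_NAME = re.compile(r"_(\d{8})\.")


@dataclass(frozen=True)
class QueryCondition:
    generation_time: str       # e.g. "20231009"
    elements: tuple[str, ...]  # fields required to generate the report


def find_target_files(directory: dict[str, str], generation_time: str) -> list[str]:
    """Return the names of files whose embedded date equals the requested generation time."""
    targets = []
    for name in sorted(directory):
        match = DATE_IN_NAME.search(name)
        if match and match.group(1) == generation_time:
            targets.append(name)
    return targets


def build_report(directory: dict[str, str], cond: QueryCondition) -> list[dict[str, str]]:
    """Read only the target files and keep only the requested elements from each row."""
    report = []
    for name in find_target_files(directory, cond.generation_time):
        for row in csv.DictReader(io.StringIO(directory[name])):
            report.append({e: row[e] for e in cond.elements if e in row})
    return report


if __name__ == "__main__":
    directory = {
        "deposit_20231008.csv": "acct,balance,branch\nA1,100,BJ\n",
        "deposit_20231009.csv": "acct,balance,branch\nA1,120,BJ\nA2,80,SH\n",
    }
    print(build_report(directory, QueryCondition("20231009", ("acct", "balance"))))
```

Matching on the file name means files from other dates are never opened, which is the benefit of embedding the generation time in the name.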
In one possible implementation, the data included in each file in the database is obtained by the HIVE database processing the original data.
In a possible implementation, the second determining unit 802 is further configured to process the original data as follows: receive an original data file, in which all data have the same generation date;
preprocess the original data file to obtain preprocessed data, where the preprocessing includes verification and cleaning;
determine at least one data processing rule associated with the data attribute, and process the preprocessed data according to the at least one data processing rule to obtain processed data.
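The three processing steps above (receive, preprocess, apply attribute-specific rules) might be sketched in Python as follows. The field name `gen_date`, the verification rule (required fields must be non-empty), the cleaning rule (trim whitespace) and the idea of keying rules by a data-attribute string such as `"deposit"` are all assumptions made for illustration.

```python
from typing import Callable

Record = dict[str, str]
Rule = Callable[[Record], Record]


def receive(rows: list[Record], date_field: str = "gen_date") -> str:
    """Confirm that all records in one original data file share a single generation date."""
    dates = {r[date_field] for r in rows}
    if len(dates) != 1:
        raise ValueError(f"expected exactly one generation date, found {sorted(dates)}")
    return dates.pop()


def preprocess(rows: list[Record], required: tuple[str, ...]) -> list[Record]:
    """Verification (required fields non-empty) followed by cleaning (trim whitespace)."""
    cleaned = []
    for r in rows:
        if all(r.get(f, "").strip() for f in required):
            cleaned.append({k: v.strip() for k, v in r.items()})
    return cleaned


def process(rows: list[Record], rules_by_attr: dict[str, list[Rule]], attr: str) -> list[Record]:
    """Apply, in order, every rule registered for the file's data attribute."""
    processed = []
    for r in rows:
        for rule in rules_by_attr.get(attr, []):
            r = rule(r)
        processed.append(r)
    return processed


if __name__ == "__main__":
    raw = [
        {"gen_date": "20231009", "acct": " A1 ", "ccy": "cny"},
        {"gen_date": "20231009", "acct": "", "ccy": "usd"},
    ]
    gen_date = receive(raw)
    rules = {"deposit": [lambda r: {**r, "ccy": r["ccy"].upper()}]}
    print(gen_date, process(preprocess(raw, ("acct",)), rules, "deposit"))
```

Keying the rules by data attribute lets files of different kinds share one pipeline while each kind receives only the rules associated with it.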
In a possible implementation, when processing the preprocessed data according to the at least one data processing rule, the second determining unit 802 is specifically configured to:
determine, according to each data processing rule, the at least one data item required for the data processing;
process the at least one data item according to that data processing rule.
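One way to read "determine the data required by each rule, then process it" is to have each rule declare its inputs explicitly. In this hypothetical Python sketch, `ProcessingRule` names its output field, the input fields it needs, and a function that computes the output; the `interest` and `total` fields in the usage example are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ProcessingRule:
    """Hypothetical rule: the field it produces, the data it needs, and how to compute it."""
    output: str
    inputs: tuple[str, ...]
    compute: Callable[..., object]


def apply_rules(record: dict[str, object], rules: list[ProcessingRule]) -> dict[str, object]:
    result = dict(record)
    for rule in rules:
        # Step 1: determine the data this rule requires and make sure it is available.
        missing = [name for name in rule.inputs if name not in result]
        if missing:
            raise KeyError(f"rule {rule.output!r} is missing inputs {missing}")
        args = [result[name] for name in rule.inputs]
        # Step 2: process those data according to the rule.
        result[rule.output] = rule.compute(*args)
    return result


if __name__ == "__main__":
    rules = [
        ProcessingRule("interest", ("balance", "rate"), lambda b, r: b * r),
        ProcessingRule("total", ("balance", "interest"), lambda b, i: b + i),
    ]
    print(apply_rules({"balance": 100, "rate": 0.5}, rules))
```

Because rules run in list order, a later rule (here `total`) can consume the output of an earlier one (`interest`), and a rule whose inputs are missing fails loudly instead of producing a partial result.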
In a possible implementation, the second determining unit 802 is further configured to: store the processed data in a partition table created in the HIVE database, where the original data corresponding to the processed data in one partition table share the same generation time, and different partition tables correspond to different generation times;
send the data included in each column of the partition table to the database separately and store the data of each column as one file, where the file directory of the file includes the generation time corresponding to the column value of the partition table.
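The storage and export steps can be sketched as follows, assuming that the exported unit is the set of rows sharing one generation time (the partition-column value) and that the export target is a directory named after that time. The `gen_date` field, the `/export` root and the `<table>_<time>.csv` file name are hypothetical, and the "files" are returned as an in-memory dict rather than written to disk.

```python
import csv
import io
from collections import defaultdict
from pathlib import PurePosixPath

Row = dict[str, str]


def store_partitioned(rows: list[Row], time_field: str = "gen_date") -> dict[str, list[Row]]:
    """Place processed rows into partitions keyed by generation time (one partition per time)."""
    partitions: dict[str, list[Row]] = defaultdict(list)
    for row in rows:
        partitions[row[time_field]].append(row)
    return dict(partitions)


def export_partitions(partitions: dict[str, list[Row]], export_root: str, table: str) -> dict[str, str]:
    """Serialize each partition as one CSV file whose path carries the generation time."""
    files: dict[str, str] = {}
    for gen_time, rows in sorted(partitions.items()):
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        path = PurePosixPath(export_root) / gen_time / f"{table}_{gen_time}.csv"
        files[str(path)] = buf.getvalue()
    return files


if __name__ == "__main__":
    processed = [
        {"gen_date": "20231008", "acct": "A1"},
        {"gen_date": "20231009", "acct": "A2"},
        {"gen_date": "20231009", "acct": "A3"},
    ]
    for path in export_partitions(store_partitioned(processed), "/export", "deposit"):
        print(path)
```

Putting the generation time into both the directory and the file name is what allows the query side to select target files by name alone.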
Based on the same inventive concept, an embodiment of the present application provides an electronic device that can implement the functions of the data query device discussed above. Referring to fig. 9, the device includes a memory 901 and a processor 902.
A memory 901 for storing program instructions;
and a processor 902, configured to call the program instructions stored in the memory, and execute any step of the data query method according to the obtained program instructions.
In the present embodiment, the processor 902 is the control center of the electronic device. It connects the various parts of the electronic device through various interfaces and lines, and performs the functions of the electronic device and processes data by running or executing the software programs and/or modules stored in the memory 901 and calling the data stored in the memory 901. Optionally, the processor 902 may include one or more processing units. The processor 902 may be a control component such as a processor, microprocessor or controller, for example a general-purpose central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
The memory 901 may be used to store software programs and modules, and the processor 902 performs various functional applications and data processing by running the software programs and modules stored in the memory 901. The memory 901 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required for at least one function, and the like, and the data storage area may store data created according to business processes, and the like. As a non-volatile computer-readable storage medium, the memory 901 may be used to store non-volatile software programs, non-volatile computer-executable programs and modules. The memory 901 may include at least one type of storage medium, for example flash memory, a hard disk, a multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk, an optical disk, and the like. The memory 901 may also be any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 901 in the embodiments of the present application may also be a circuit or any other device capable of implementing a storage function, for storing program instructions and/or data.
Based on the same inventive concept, embodiments of the present application provide a computer-readable storage medium storing computer program code which, when run on a computer, causes the computer to perform any of the data query methods discussed above. Since the computer-readable storage medium solves the problem on a principle similar to that of the data query method, its implementation may refer to the implementation of the method, and repeated details are omitted.
Based on the same inventive concept, embodiments of the present application also provide a computer program product comprising computer program code which, when run on a computer, causes the computer to perform any of the data query methods discussed above. Since the computer program product solves the problem on a principle similar to that of the data query method, its implementation may refer to the implementation of the method, and repeated details are omitted.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (13)

1. A method for querying data, comprising:
in response to a report query operation by a user, determining a query condition of the report query, wherein the query condition comprises a generation time and at least one element required for generating the report;
determining, from a file directory of a database, at least one target file matching the generation time; wherein the file name of each file in the database comprises the generation time corresponding to that file; each file comprises one column of data in a partition table of the HIVE database, and the data in that column have the same generation time;
and respectively determining at least one target data matched with the at least one element from the at least one target file, and generating a report based on the at least one target data.
2. The method of claim 1, wherein the data included in each file in the database is obtained from the HIVE database after processing the original data.
3. The method of claim 2, wherein the method further comprises:
processing the raw data by:
receiving an original data file, wherein all data in the original data file have the same generation date;
preprocessing the original data file to obtain preprocessed data, wherein the preprocessing comprises verification and cleaning;
and determining at least one data processing rule associated with the data attribute, and processing the preprocessed data according to the at least one data processing rule to obtain processed data.
4. The method of claim 3, wherein performing data processing on the preprocessed data according to the at least one data processing rule comprises:
determining, according to each data processing rule, at least one data item required for the data processing;
and processing the at least one data item according to each data processing rule.
5. The method of claim 3 or 4, wherein the method further comprises:
storing the processed data in a partition table created by a HIVE database; wherein the original data corresponding to the processed data in one partition table have the same generation time, and different partition tables correspond to different generation times;
and respectively sending the data included in each column of the partition table to the database, and respectively storing the data included in each column of the partition table as a file, wherein a file directory of the file comprises the generation time corresponding to the column value of the partition table.
6. A data query device, comprising:
a first determining unit configured to, in response to a report query operation by a user, determine a query condition of the report query, wherein the query condition comprises a generation time and at least one element required for generating the report;
a second determining unit configured to determine, from a file directory of a database, at least one target file matching the generation time; wherein the file name of each file in the database comprises the generation time corresponding to that file; each file comprises one column of data in a partition table of the HIVE database, and the data in that column have the same generation time;
the second determining unit being further configured to respectively determine, from the at least one target file, at least one target data item matching the at least one element, and to generate a report based on the at least one target data item.
7. The apparatus of claim 6, wherein the data included in each file in the database is obtained from a HIVE database after processing the raw data.
8. The apparatus of claim 7, wherein the second determining unit is further for:
process the raw data by:
receiving an original data file, wherein all data in the original data file have the same generation date;
preprocessing the original data file to obtain preprocessed data, wherein the preprocessing comprises verification and cleaning;
and determining at least one data processing rule associated with the data attribute, and processing the preprocessed data according to the at least one data processing rule to obtain processed data.
9. The apparatus of claim 8, wherein, when performing data processing on the preprocessed data according to the at least one data processing rule, the second determining unit is configured to:
determine, according to each data processing rule, at least one data item required for the data processing;
and process the at least one data item according to each data processing rule.
10. The apparatus according to claim 8 or 9, wherein the second determining unit is further configured to:
store the processed data in a partition table created by a HIVE database; wherein the original data corresponding to the processed data in one partition table have the same generation time, and different partition tables correspond to different generation times;
and respectively sending the data included in each column of the partition table to the database, and respectively storing the data included in each column of the partition table as a file, wherein the file directory of the file comprises the generation time corresponding to the column value of the partition table.
11. An electronic device, comprising:
a memory for storing program instructions;
a processor for invoking program instructions stored in the memory and for performing the steps comprised in the method according to any of claims 1-5 in accordance with the obtained program instructions.
12. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method of any of claims 1-5.
13. A computer program product, the computer program product comprising: computer program code which, when run on a computer, causes the computer to perform the method of any of the preceding claims 1-5.
CN202311297492.0A 2023-10-09 2023-10-09 Data query method, device, equipment and medium Pending CN117331967A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311297492.0A CN117331967A (en) 2023-10-09 2023-10-09 Data query method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN117331967A true CN117331967A (en) 2024-01-02

Family

ID=89289875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311297492.0A Pending CN117331967A (en) 2023-10-09 2023-10-09 Data query method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN117331967A (en)

Similar Documents

Publication Publication Date Title
CN106528787B (en) query method and device based on multidimensional analysis of mass data
US9135647B2 (en) Methods and systems for flexible and scalable databases
CN111198961B (en) Commodity searching method, commodity searching device and commodity searching server
CN113918605A (en) Data query method, device, equipment and computer storage medium
CN114741368A (en) Log data statistical method based on artificial intelligence and related equipment
CN112527824B (en) Paging query method, paging query device, electronic equipment and computer-readable storage medium
CN115905630A (en) Graph database query method, device, equipment and storage medium
EP3779720B1 (en) Transaction processing method and system, and server
CN108874873B (en) Data query method, device, storage medium and processor
CN111159213A (en) Data query method, device, system and storage medium
CN112434056A (en) Method and device for inquiring detailed data
CN117331967A (en) Data query method, device, equipment and medium
CN112835932B (en) Batch processing method and device for business table and nonvolatile storage medium
CN112464049B (en) Method, device and equipment for downloading number detail list
CN114564621A (en) Method, device and equipment for associating data and readable storage medium
CN112925834B (en) Data importing method and device
CN114564501A (en) Database data storage and query methods, devices, equipment and medium
CN112434057A (en) Data query method and device
CN111611056A (en) Data processing method and device, computer equipment and storage medium
CN115510204B (en) Intelligent water service data resource catalog management method and device
CN115686939B (en) Data backup method, device, computer equipment and storage medium
CN115544096B (en) Data query method and device, computer equipment and storage medium
US20240329925A1 (en) Data processing method, apparatus, electronic device, and storage medium
CN115994160A (en) Service data query method and device, electronic equipment and storage medium
CN116501735A (en) Multi-entity report generation method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination