CN115357628A - Data report generation method and device, computer equipment and storage medium - Google Patents

Publication number
CN115357628A
Authority
CN
China
Prior art keywords
data
queried
preset
report
database
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211114029.3A
Other languages
Chinese (zh)
Inventor
陈维涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN202211114029.3A
Publication of CN115357628A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G06F16/248 Presentation of query results
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

The invention relates to the technical field of big data and provides a data report generation method and device, computer equipment and a storage medium. The method comprises: acquiring, according to received request information for report generation, the data source of the data to be queried by the request information, and detecting whether the database corresponding to the data source is a preset source database; when the database corresponding to the data source is detected to be the preset source database, synchronizing the data to be queried from the preset source database to a service node corresponding to a preset engine for storage; and matching, according to the data volume of the data to be queried, a query mode in the preset engine as a target query mode, exporting the data to be queried from the service node corresponding to the preset engine through the target query mode, and generating the report corresponding to the data to be queried. Synchronizing the data to be queried to the preset engine improves query performance on massive data and thereby the efficiency of generating reports from massive data.

Description

Data report generation method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of big data technologies, and in particular, to a method and an apparatus for generating a data report, a computer device, and a storage medium.
Background
In enterprises, as the volume of business data grows, the data is usually presented in the form of reports. A good business report reflects the performance and index trends of recent months and plays a key role in subsequent strategic decisions. In the prior art, the business data used to generate a report is generally stored in a relational database, and the report is generated by querying that database. As the quantity of business data increases, however, the storage structure becomes more complex, and queries over large amounts of data perform poorly. Report generation efficiency drops, and a long-running generation process consumes a large amount of resources. How to improve report generation efficiency has therefore become an urgent problem to be solved.
Disclosure of Invention
Therefore, it is necessary to provide a method, an apparatus, a computer device and a storage medium for generating a data report to solve the problem of low report generation efficiency.
In a first aspect, a method for generating a data report is provided, where the method includes:
acquiring, according to received request information for report generation, a data source of the data to be queried by the request information, and detecting whether a database corresponding to the data source is a preset source database;
when it is detected that the database corresponding to the data source is the preset source database, synchronizing the data to be queried from the preset source database to a service node corresponding to a preset engine for storage;
and matching, according to the data volume of the data to be queried by the request information, a query mode in the preset engine as a target query mode, and exporting the data to be queried from the service node corresponding to the preset engine through the target query mode to generate a report corresponding to the data to be queried.
In a second aspect, an apparatus for generating a data report is provided, the apparatus comprising:
a detection module, configured to acquire, according to received request information for report generation, a data source of the data to be queried by the request information, and to detect whether a database corresponding to the data source is a preset source database;
a synchronization module, configured to synchronize the data to be queried from the preset source database to a service node corresponding to a preset engine for storage when it is detected that the database corresponding to the data source is the preset source database;
and a report generation module, configured to match, according to the data volume of the data to be queried by the request information, a query mode in the preset engine as a target query mode, to export the data to be queried from the service node corresponding to the preset engine through the target query mode, and to generate a report corresponding to the data to be queried.
In a third aspect, an embodiment of the present invention provides a computer device, where the computer device includes a processor, a memory, and a computer program stored in the memory and executable on the processor, and the processor, when executing the computer program, implements the data report generation method according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the data report generation method according to the first aspect is implemented.
Compared with the prior art, the invention has the following beneficial effects:
the method acquires, according to received request information for report generation, the data source of the data to be queried by the request information, and detects whether the database corresponding to the data source is a preset source database; when the database is detected to be the preset source database, the data to be queried is synchronized from the preset source database to a service node corresponding to a preset engine for storage; a query mode in the preset engine is matched as a target query mode according to the data volume of the data to be queried, and the data to be queried is exported from the service node corresponding to the preset engine through the target query mode to generate the corresponding report. Synchronizing the data to be queried to the preset engine improves query performance on massive data and thereby the efficiency of generating reports from massive data.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive labor.
Fig. 1 is a schematic diagram of an application environment of a data report generating method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a data report generating method according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a data report generating method according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of a data report generating method according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a data report generating apparatus according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
Furthermore, in the description of the present specification and the appended claims, the terms "first," "second," "third," and the like are used only to distinguish between descriptions and are not to be understood as indicating or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present invention. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
It should be understood that, the sequence numbers of the steps in the following embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
An embodiment of the present invention provides a method for generating a data report, which can be applied to an application environment as shown in fig. 1, where a client communicates with a server. The client includes, but is not limited to, a palmtop computer, a desktop computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA), and other computer devices. The server can be implemented as an independent server or as a server cluster formed by a plurality of servers.
Fig. 2 is a schematic flow chart of a data report generating method according to an embodiment of the present invention, where the data report generating method can be applied to the server in fig. 1, and the server is connected to a corresponding client. As shown in fig. 2, the data report generation method may include the following steps.
S201: according to the received request information for report generation, the data source of the data to be queried by the request information is obtained, and whether the database corresponding to the data source is a preset source database is detected.
In step S201, the request information includes the data source of the data to be queried; the data source is obtained from the request information, and it is detected whether the database corresponding to the data source is a preset source database.
In this embodiment, the received request information for generating the report includes the data source of the data to be queried, a database name, and the like. The request information may be instruction information triggered by a user to generate the report, or instruction information generated automatically by a client at a preset time or when a preset condition is met. The storage address of the data to be queried is obtained from the data source, and from it whether the database corresponding to the data to be queried is a preset source database is detected. The detection can be performed by checking the API (application programming interface), SDK (software development kit) interface, shell script, Python script and the like that correspond to the data source.
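The check in step S201 can be sketched as follows. This is a minimal illustration only: the `ReportRequest` type, the `PRESET_SOURCE_DATABASES` registry and the database names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass

# Hypothetical registry of databases that count as the preset source database.
PRESET_SOURCE_DATABASES = {"warehouse_db"}


@dataclass
class ReportRequest:
    """Request information for report generation (fields are assumptions)."""
    data_source: str    # where the data to be queried lives, e.g. "warehouse_db.orders"
    database_name: str  # database named in the request


def is_preset_source(request: ReportRequest) -> bool:
    """Return True when the requested database is a preset source database."""
    return request.database_name in PRESET_SOURCE_DATABASES
```

Only requests for which this check succeeds would proceed to the synchronization step S202.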
The preset source database is a Hadoop-based data warehouse tool. It can map a structured data file to a database table and provides a simple SQL query function, converting SQL statements into MapReduce tasks for execution. Simple MapReduce statistics can thus be produced quickly with SQL-like statements, without developing a dedicated MapReduce application, which makes the tool well suited to statistical analysis in a data warehouse. The source database is preset in the server, that is, the server allocates an independent storage space for storing the data.
S202: when it is detected that the database corresponding to the data source is the preset source database, the data to be queried is synchronized from the preset source database to the service node corresponding to the preset engine for storage.
In step S202, when it is detected that the database corresponding to the data source is the preset source database, the data to be queried is synchronized from the preset source database to the service node corresponding to the preset engine for storage, where the preset engine is a distributed cluster search engine that includes a plurality of service nodes.
In this embodiment, the preset engine is a distributed cluster that includes a plurality of service nodes. Unlike other databases, it supports fast searching of information: it can store, search and analyze a large amount of data in a short time, and is therefore mostly used in complex search scenarios. The data in the preset source database is synchronized to the service node corresponding to the preset engine for storage.
It should be noted that the preset engine cluster uses a master-slave model and does not depend on external components (such as Zookeeper or HDFS). The master-slave model simplifies the system design: the master node maintains the cluster information and uses a heartbeat mechanism to confirm whether cluster members are still in the cluster. If the master node goes down, an election is required; clusters generally use an odd number of nodes so that split brain can be avoided when the master node is elected. The preset engine cluster therefore has at least a master node and a data node, and if cluster node resources are sufficient, the redundant nodes can be set as coordinating nodes. To ensure availability in a distributed environment and to enlarge the storage space, the preset engine splits the data into segments (shards). Each segment can store several copies according to the initial index settings, the total segment number is equal to the product of the node number and the segment number, and the segments are then distributed evenly across the cluster nodes.
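The link between an odd node count and split-brain avoidance can be illustrated with a majority-quorum rule. The patent does not state which election rule the preset engine uses, so the majority rule below is an assumption, and the function names are hypothetical.

```python
from typing import List


def quorum(master_eligible_nodes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def partitions_able_to_elect(partition_sizes: List[int]) -> List[int]:
    """Given the sizes of network partitions, return those that can elect a master."""
    needed = quorum(sum(partition_sizes))
    return [size for size in partition_sizes if size >= needed]
```

Under this rule at most one partition can ever hold a majority, so two masters cannot be elected at once. A 3-node cluster split 2/1 still elects a master on the larger side, while a 4-node cluster split 2/2 elects none, and both sizes tolerate the loss of only one node, which is one reason odd sizes are usually preferred.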
Optionally, when it is detected that the database corresponding to the data source is the preset source database, synchronizing the data to be queried from the preset source database to the service node corresponding to the preset engine for storage includes:
creating a mapping table between the preset source database and the service node corresponding to the preset engine;
and synchronizing the data to be queried from the preset source database to the service node corresponding to the preset engine for storage according to the mapping table.
In this embodiment, a mapping table between the preset source database and the service node corresponding to the preset engine is created. The mapping table specifies the service node address and port of the preset engine cluster together with the corresponding index information and document information, so as to establish a data transmission channel between the preset source database and the preset engine cluster.
It should be noted that the mapping table may further specify that the primary key field in the source database is mapped to the id identifier in the preset engine cluster, where the primary key field is the field that uniquely identifies each row of the table. The id identifier in the preset engine cluster is a character string. Combined with _index (where the document is stored) and _type (the type of object the document represents), it identifies a specific document in the preset engine cluster; that is, a document in the cluster is uniquely determined by its id identifier, and different documents have different id identifiers. Once the mapping table has been created, the data synchronization channel to the preset engine cluster is available.
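A minimal sketch of such a mapping table, and of turning source rows into documents whose id comes from the primary key, might look as follows. The host address, index name, field names and the shape of the mapping dictionary are all hypothetical; the patent only states which kinds of information the mapping table carries.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical mapping table: where to write and which column becomes the id.
MAPPING_TABLE: Dict[str, Any] = {
    "hosts": ["http://node-1:9200"],  # service node address and port (assumed)
    "index": "report_orders",         # target index in the preset engine (assumed)
    "primary_key": "order_id",        # source column mapped to the document id
}


def rows_to_documents(
    rows: Iterable[Dict[str, Any]], mapping: Dict[str, Any]
) -> Iterator[Dict[str, Any]]:
    """Convert source rows into index actions keyed by the primary key."""
    for row in rows:
        yield {
            "_index": mapping["index"],
            "_id": str(row[mapping["primary_key"]]),  # the id identifier is a string
            "_source": row,
        }
```

Because the id is derived from the primary key, synchronizing the same row twice overwrites the same document instead of creating a duplicate.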
S203: according to the data volume of the data to be queried by the request information, a query mode in the preset engine is matched as a target query mode, and the data to be queried is exported from the corresponding service node in the preset engine through the target query mode to generate a report corresponding to the data to be queried.
In step S203, different data volumes of the data to be queried correspond to different query modes in the preset engine. Querying with the mode suited to the data volume obtains the data to be queried more quickly, and the data report corresponding to the data to be queried is then generated.
In this embodiment, after the target query mode is determined, the query request for the data to be queried is written in JSON format and the query is then executed. The preset engine is accessed through a RESTful API using JSON and supports match query, multi_match query, term query, range query, bool query, and the like. The preset engine does not need to organize the data by column and row; it stores entire documents. A document corresponds to a data entry, or a row, in a traditional database. The engine stores these documents and searches their content by building an index: it creates an inverted index, which is a sorted dictionary of terms that map to documents. Scalability comes from the fact that the documents can be distributed across different shards. The preset engine organizes data for each index within a master node that includes a plurality of data nodes. These data nodes help to spread the data across multiple hardware devices, and, because of the replica nodes, the preset engine also provides resiliency.
It should be noted that a query executed by the preset engine includes DSL (Domain Specific Language) query statements and the like, which may be part of the query statements in the logical query information, and the query function can be executed independently in an ES (Elasticsearch) cluster. The query result merging policy may be obtained from the logical query information. The policy covers the intersection, union, and difference of the query results, corresponding to keywords such as MUST, SHOULD, and MUST_NOT in the logical query information.
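As an illustration of a query request written in JSON, the sketch below builds a search body from match, term and range clauses combined in a bool query. The field names and values are hypothetical; the clause names (`match`, `term`, `range`, `bool`, `must`, `must_not`) follow the Elasticsearch query DSL that the text refers to.

```python
import json
from typing import Any, Dict, Iterable

# Hypothetical fields; each clause uses one of the query types listed in the text.
match_clause = {"match": {"product_name": "term life"}}
term_clause = {"term": {"status": "cancelled"}}
range_clause = {"range": {"created_at": {"gte": "2022-09-01", "lt": "2022-10-01"}}}


def build_search_body(
    must: Iterable[Dict[str, Any]],
    must_not: Iterable[Dict[str, Any]] = (),
    size: int = 100,
) -> Dict[str, Any]:
    """Combine clauses into a bool query: must = intersection, must_not = difference."""
    return {
        "size": size,
        "query": {"bool": {"must": list(must), "must_not": list(must_not)}},
    }


body = build_search_body([match_clause, range_clause], must_not=[term_clause])
payload = json.dumps(body)  # JSON text that would be sent over the RESTful API
```

The `payload` string is what a client would post to the engine's search endpoint; sending it is left out here because the endpoint and client library are not specified in the source.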
When a query is performed on the data to be queried, after a JSON-format query statement is received, a parser converts the conditional JSON string in the statement into logical query information, namely a DSL query statement containing the ES query building object QueryBuilder. When the query is executed, this DSL query statement is converted into physical query information consisting of a plurality of DSL query statements. The query building objects include a Boolean query building object, a value query building object, a range query building object, an IN query building object, and the like.
The parser performs MAP (mapping) processing on the JSON-format query statement to obtain the element-value pairs in the statement and the logical relationships between them. The first element is taken as the ROOT element, and the query building object of the ES system is generated from the element-value pairs and the logical relationships. The parser then checks whether a logical index name exists and, if so, constructs a flexible alias Boolean query building object carrying the logical index information according to the logical relationships. The logical relationships include one or more of AND, OR and NOT.
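The conversion from a conditional JSON structure to a Boolean query can be sketched with a small recursive function. The input format (`and` / `or` / `not` keys and `field` / `op` / `value` leaves) is a hypothetical stand-in for the patent's JSON-format query statement, and plain dictionaries stand in for the QueryBuilder objects.

```python
from typing import Any, Dict

# Logical relationship -> bool-query clause name in the Elasticsearch DSL.
LOGIC_TO_CLAUSE = {"and": "must", "or": "should", "not": "must_not"}


def to_dsl(node: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively convert a condition tree into a DSL query dictionary."""
    for logic, clause in LOGIC_TO_CLAUSE.items():
        if logic in node:
            # Boolean building object: children are converted first.
            return {"bool": {clause: [to_dsl(child) for child in node[logic]]}}
    field, op, value = node["field"], node["op"], node["value"]
    if op == "eq":      # value query
        return {"term": {field: value}}
    if op == "in":      # IN query
        return {"terms": {field: list(value)}}
    if op == "range":   # range query, value is a (low, high) pair
        low, high = value
        return {"range": {field: {"gte": low, "lte": high}}}
    raise ValueError(f"unsupported operator: {op}")
```

The outermost element of the input plays the role of the ROOT element; nested `and`, `or` and `not` nodes become nested bool queries.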
It should be noted that, when an index is queried, an index manager may be configured. The index manager is used to configure the cluster policy, the partition policy, the association (JOIN) field names, the logical index name aliases, and the like. Cluster policy example: if the data is divided by date, the indexes of odd days are placed in cluster A and the indexes of even days in cluster B. Partition policy example: if an index is generated every week, only the indexes of the last 8 weeks are queried. Association (JOIN) field name example: the pin field of index A is associated with the user_pin field of index B. The fields attribute is an array, typically with the first field as the association attribute and the remaining fields as additional query attributes.
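The cluster and partition policies in the example can be sketched as two small functions. The index naming scheme, the cluster names, and the reading of "odd day" as an odd day of the month are assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import List


def cluster_for(day: date) -> str:
    """Cluster policy: odd days of the month go to cluster A, even days to cluster B."""
    return "cluster_a" if day.day % 2 == 1 else "cluster_b"


def weekly_indices(today: date, weeks: int = 8, prefix: str = "orders") -> List[str]:
    """Partition policy: names of the weekly indexes covering the last `weeks` weeks."""
    names = []
    for back in range(weeks):
        year, week, _ = (today - timedelta(weeks=back)).isocalendar()
        names.append(f"{prefix}-{year}-w{week:02d}")  # hypothetical naming scheme
    return names
```

A query would then be routed only to the cluster returned by `cluster_for` and restricted to the index names returned by `weekly_indices`, rather than scanning every index.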
Optionally, matching the query mode in the preset engine as a target query mode according to the data volume of the data to be queried by the request information, exporting the data to be queried from the corresponding service node in the preset engine through the target query mode, and generating a report corresponding to the data to be queried includes:
when the data volume of the data to be queried is smaller than a preset threshold, exporting the data to be queried by using the paging query mode in the preset engine to generate a report corresponding to the data to be queried;
and when the data volume of the data to be queried is larger than the preset threshold, exporting the data to be queried by using the cursor query mode in the preset engine to generate a report corresponding to the data to be queried.
In this embodiment, the query modes in the preset engine include a paging query mode and a cursor query mode. The paging query mode is used for export when the data volume of the data to be queried is smaller than the preset threshold, and the cursor query mode is used when the data volume is larger than the preset threshold; in both cases a report corresponding to the data to be queried is generated from the exported data.
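The selection between the two modes reduces to a threshold comparison, sketched below. The threshold value is a placeholder, and because the text does not say what happens when the data volume equals the threshold, treating that case as a cursor query is an assumption.

```python
PRESET_THRESHOLD = 10_000  # hypothetical value; the patent leaves it configurable


def choose_query_mode(data_volume: int, threshold: int = PRESET_THRESHOLD) -> str:
    """Return "paging" below the threshold, otherwise "cursor".

    The equal case is not covered by the source; it is routed to "cursor" here.
    """
    if data_volume < 0:
        raise ValueError("data volume cannot be negative")
    return "paging" if data_volume < threshold else "cursor"
```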
Optionally, when the data volume of the data to be queried is smaller than the preset threshold, exporting the data to be queried by using the paging query mode in the preset engine and generating a report corresponding to the data to be queried includes:
acquiring, according to a received paging query command for the data to be queried, the starting row number of the paging query and the data volume of each page, to obtain the query result of the paging query;
and exporting the data to be queried page by page according to the query result of the paging query, and generating a report corresponding to the data to be queried.
In this embodiment, the starting row number of the paging query and the data volume of each page are acquired according to the received paging query command, and the query result of the paging query is obtained from them. The initial values of these two parameters are generally passed in from the front end.
It should be noted that the preset engine may record the starting row number of the paging query and the data volume of each page. Each time a paging query is executed, these two values are determined and recorded, and before the next paging query is performed they are read back from the recorded data.
It should be noted that the starting row number is generally 0 or 1. For example, in the MySQL database the row index counts from 0, so the starting row number is 0.
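A paging export driven by a starting row number and a page size can be sketched against an in-memory list, which stands in for the preset engine here. In Elasticsearch-style engines these two parameters correspond to the `from` and `size` search parameters; the function name and the 0-based start are assumptions.

```python
from typing import Any, Iterator, List, Sequence


def paged_export(
    rows: Sequence[Any], start_row: int = 0, page_size: int = 2
) -> Iterator[List[Any]]:
    """Yield pages of `page_size` rows, beginning at the 0-based `start_row`."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = start_row  # recorded between pages, as described in the text
    while offset < len(rows):
        page = list(rows[offset:offset + page_size])
        yield page
        offset += len(page)  # starting row number for the next paging query
```

Each yielded page would be appended to the report in order; the last page may be shorter than `page_size`.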
Optionally, when the data volume of the data to be queried is larger than the preset threshold, exporting the data to be queried by using the cursor query mode in the preset engine and generating a report corresponding to the data to be queried includes:
writing, according to a preset export threshold, the data to be queried into a preset file by using the cursor query mode in the preset engine, each write covering the data volume of the export threshold, to obtain a written file;
and uploading the written file to a file container platform, generating a report link corresponding to the data to be queried, and generating a report corresponding to the data to be queried based on the download result of the report link.
In this embodiment, according to a preset export threshold, the data to be queried is written into the preset file in the data amount of the export threshold each time by using a cursor query mode in the preset engine, so as to obtain a written file, and when the data is queried through the cursor, the FETCH reads the data from the preset engine each time, and one or N pieces of data can be read at a time. When 1 bar is read, the offset of the cursor is correspondingly increased by 1; when reading N strips, the cursor offset is correspondingly increased by N. For example, the FETCH statement reads data from the preset engine, and when one piece of data is read, the offset of the cursor is correspondingly increased by 1 until the derivation threshold is read, the flag of 1 exists in the cursor, and the reading of all the data to be queried is not completed, a suspended query environment is found, the query statement is continuously executed, the next batch of data to be queried is obtained, and the FETCH statement is retried. Since the cursor registers the cache address of nsql _ ctx, i.e. the entry of the SQL query statement in the virtual machine, the fast query can be restarted by the FETCH.
It should be noted that, after the query result of the data to be queried is read, reading ends if the offset is equal to the export threshold. The export threshold may be the total amount of data to be queried that needs to be read, and it may be set by the user.
In this embodiment, after the cursor reads the data to be queried according to the index, the cursor offset is compared with the export threshold. If the cursor offset is equal to the export threshold, the cursor has read a sufficient amount of data to be queried, and reading can end. If the cursor offset is smaller than the export threshold and the flag is 1, the cursor has not yet read a sufficient amount of data, and it can continue to read the data to be queried according to the registered index. If the cursor offset is smaller than the export threshold and the flag is 0, the cursor can no longer obtain new data to be queried, so reading ends once the cursor has read all of the data to be queried.
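The comparison between the cursor offset, the export threshold and the flag can be sketched as follows. This is a minimal illustration that assumes a Python sqlite3 cursor stands in for the cursor of the preset engine; `read_batch`, `fetch_size` and the table `t` are hypothetical names, and the flag is derived here from an empty fetch rather than supplied by the engine.

```python
import sqlite3


def read_batch(cursor, export_threshold, fetch_size=1):
    """Read up to export_threshold rows from an open cursor.

    Returns (rows, offset, flag): flag is 1 while more data may remain
    and 0 once the cursor has returned no further rows.
    """
    rows, offset, flag = [], 0, 1
    while offset < export_threshold and flag == 1:
        # Each fetch reads one or N rows; the offset grows by the rows read.
        batch = cursor.fetchmany(min(fetch_size, export_threshold - offset))
        if not batch:
            flag = 0  # no new data can be obtained, stop reading
        else:
            rows.extend(batch)
            offset += len(batch)
    return rows, offset, flag


# Hypothetical demo: ten rows, export threshold of four rows per batch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])
cur = conn.execute("SELECT id FROM t ORDER BY id")

first = read_batch(cur, export_threshold=4, fetch_size=3)
second = read_batch(cur, export_threshold=4, fetch_size=3)
third = read_batch(cur, export_threshold=4, fetch_size=3)
```

Because the same cursor object is reused, each call resumes where the previous one stopped, which loosely mirrors resuming the suspended query through FETCH.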
When the query result corresponding to the data to be queried has been obtained through the cursor query mode, the query result is written into a preset file to obtain a written file, the written file is uploaded to a file container platform to generate a report link corresponding to the data to be queried, and a report corresponding to the data to be queried is generated based on a download result of the report link. The file container refers to a container for storing data, such as a local synchronization disk or a cloud server, which synchronizes data with the service node of the preset engine over a network.
In another embodiment, when real-time data needs to be exported, a corresponding acquisition frequency may be set in advance. For example, with acquisition set to every 10 minutes, if one acquisition covers 11:10 to 11:20 and the next acquisition takes place at 11:30, the data for 11:20 to 11:30 needs to be acquired. During acquisition, the data generated from 11:20 to 11:30 is first sorted in ascending order, that is, by data generation time; the corresponding data is then acquired in that order, and the acquired data is synchronized to the service node of the preset engine.
It should be noted that, when data is acquired in ascending order, if a data acquisition fails, the data whose acquisition failed may be marked, and the next acquisition can restart directly from the time point of the last failure, which prevents repeated data acquisition. For example, suppose acquisition of the data at 10:12 fails. This failure time point, 10:12, is recorded, and the next task acquires data from 10:12 up to the next task time.
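The resume-from-failure behaviour can be sketched as follows. This is a simplified illustration under stated assumptions: records are modelled as hypothetical `(generation_time, payload)` tuples, and `next_window` and `collect` are names introduced here, not taken from the embodiment.

```python
from datetime import datetime


def next_window(last_window_end, failed_at, next_task_time):
    """Return (start, end) of the next acquisition window.

    If a failure time point was recorded, acquisition restarts from it;
    otherwise it starts where the previous window ended.
    """
    start = failed_at if failed_at is not None else last_window_end
    return start, next_task_time


def collect(records, start, end):
    """Return records generated in [start, end), sorted by generation time."""
    selected = [r for r in records if start <= r[0] < end]
    return sorted(selected, key=lambda r: r[0])


# Hypothetical demo: the 10:10-10:20 task failed at 10:12.
day = (2022, 9, 14)
records = [
    (datetime(*day, 10, 19), "d"),
    (datetime(*day, 10, 5), "a"),
    (datetime(*day, 10, 15), "c"),
    (datetime(*day, 10, 12), "b"),
]
start, end = next_window(
    last_window_end=datetime(*day, 10, 20),
    failed_at=datetime(*day, 10, 12),
    next_task_time=datetime(*day, 10, 30),
)
collected = collect(records, start, end)
```

Restarting at the recorded failure point means data before 10:12 that was already acquired is not fetched again, while nothing from 10:12 onward is skipped.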
In this method, a data source of the data to be queried is obtained according to the request information generated by the received report, and whether the database corresponding to the data source is a preset source database is detected. When the database corresponding to the data source is detected to be the preset source database, the data to be queried is synchronized from the preset source database to the service node corresponding to the preset engine for storage. A query mode in the preset engine is matched as the target query mode according to the data volume of the data to be queried in the request information, the data to be queried is exported from the service node corresponding to the preset engine through the target query mode, and a report corresponding to the data to be queried is generated. Synchronizing the data to be queried to the preset engine improves the querying of massive data and thereby the efficiency of generating the corresponding reports.
Referring to fig. 3, which is a schematic flow chart of a data report generation method according to an embodiment of the present invention, as shown in fig. 3, the data report generation method may include the following steps:
S301: acquiring a data source of data to be queried of the request information according to the request information generated by the received report, and detecting whether a database corresponding to the data source is a preset source database;
The content of step S301 is the same as that of step S201; reference may be made to the description of step S201, which is not repeated here.
S302: when detecting that the database corresponding to the data source is not the preset source database, synchronizing the data to be queried to the local database;
S303: synchronizing the database corresponding to the data to be queried from the local database to the preset source database for storage through a scheduling platform;
In this embodiment, when it is detected that the database corresponding to the data source is not the preset source database, the data to be queried is synchronized to the local database, and the database corresponding to the data to be queried is then synchronized from the local database to the preset source database for storage through the scheduling platform. When the data to be queried is synchronized to the local database, it is marked with a synchronization tag, and when the synchronized data is stored, whether the tag of the data to be queried is a failure tag needs to be considered. Specifically, the synchronized data is first stored in the local database, and it is then determined whether the tag of the stored synchronized data is a failure tag. If not, the storage is finished; if so, a notification can be sent to the user of the data to be queried after the storage is finished.
For example, when the data to be queried needs to be synchronized to the local database, the corresponding data is extracted from the server and synchronized to the local database. If an abnormal condition occurs (for example, network communication is abnormal) and the data is not completely synchronized to the local database, a synchronization failure tag can be attached to the data to be queried; once the data to be queried has been completely synchronized to the local database, the failure tag can be changed to a success tag.
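The tagging of synchronized data can be sketched roughly as follows. The local database is modelled as a plain dictionary, a network fault is simulated with Python's built-in `ConnectionError`, and `sync_to_local`, `good_source` and `flaky_source` are hypothetical names used only for this illustration.

```python
def sync_to_local(fetch_rows, local_db, key):
    """Copy rows into local_db[key] and return the resulting tag.

    The entry carries a 'failure' tag until every row has arrived,
    at which point the tag is changed to 'success'.
    """
    entry = {"rows": [], "tag": "failure"}
    local_db[key] = entry
    try:
        for row in fetch_rows():
            entry["rows"].append(row)
    except ConnectionError:
        return entry["tag"]  # incomplete synchronization keeps the failure tag
    entry["tag"] = "success"
    return entry["tag"]


def good_source():
    """Hypothetical source that delivers all rows."""
    yield from [("r1",), ("r2",), ("r3",)]


def flaky_source():
    """Hypothetical source that breaks after the first row."""
    yield ("r1",)
    raise ConnectionError("simulated network fault")


local_db = {}
ok_tag = sync_to_local(good_source, local_db, "ok")
bad_tag = sync_to_local(flaky_source, local_db, "bad")
```

A caller could then check the tag after storage and notify the user of the data to be queried when it is still `failure`.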
The data in the local database is synchronized to the preset source database. For synchronization, the mapping relationship between preset source databases and data sources is set in advance: preset source databases correspond to data sources one to one, each data source has exactly one corresponding preset source database, and each preset source database receives data synchronized from only one data source. In this embodiment, the number of preset source databases in the server is greater than the number of data sources, so that when the server acquires another data source, a mapping relationship between the newly acquired data source and an additional preset source database can be established directly. The corresponding target preset source database may be named after the ID of the hardware device of the data source.
When the data is synchronized, the data in each target preset source database is loaded to the preset engine service node. The multiple sub-databases in the target preset source database each correspond to a data source. When the data in the target HIVE database is synchronized to the preset engine service node, the data in each target preset source database follows a certain regularity, which shortens the time spent looking up data during synchronization, so the data can be synchronized to the preset engine service node very quickly.
It should be noted that, since the data to be queried may come from different sources and have different data formats or different layout styles, the data to be queried is preferably preprocessed after the target data is selected, so that it conforms to a preset rule. Preprocessing the data to be queried may consist of converting it into a specified data format. It may further include the following steps: reading the target data row by row and judging whether each row of data conforms to the preset rule; when rows that do not conform to the preset rule are found, outputting data editing prompt information; and receiving the edits made to the non-conforming rows and repeating the judging step until every row of data conforms to the preset rule. The preset rule can be formulated according to actual requirements, such as data format requirements or the rules each row of data must follow (how many columns each row contains, whether each column holds numbers or text, and the like).
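The row-by-row check can be illustrated with a small sketch. The preset rule used here (three comma-separated columns: number, text, number) is only an assumed example, and `row_is_valid`, `preprocess`, `edit_row` and `max_rounds` are hypothetical names; in practice the edit would come from a user responding to the prompt.

```python
def row_is_valid(line):
    """Assumed example rule: three comma-separated columns, number/text/number."""
    cols = line.split(",")
    return (
        len(cols) == 3
        and cols[0].isdigit()
        and cols[1] != ""
        and cols[2].isdigit()
    )


def preprocess(lines, edit_row, max_rounds=5):
    """Re-check all rows until each one conforms to the rule."""
    rows = list(lines)
    for _ in range(max_rounds):
        bad = [i for i, line in enumerate(rows) if not row_is_valid(line)]
        if not bad:
            return rows
        for i in bad:
            # Stand-in for the data editing prompt shown to the user.
            print(f"row {i} does not conform to the preset rule: {rows[i]!r}")
            rows[i] = edit_row(rows[i])
    raise ValueError("rows still violate the preset rule after editing")


# Hypothetical demo: the second row uses the wrong separator.
raw = ["1,alice,30", "2;bob;41", "3,carol,27"]
cleaned = preprocess(raw, edit_row=lambda line: line.replace(";", ","))
```

The `max_rounds` limit is an addition of this sketch so that an edit which never fixes a row cannot loop forever.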
It should be noted that the preset source database of the present embodiment is a database built on a preset source database model. It processes the data in the database using concepts and methods such as set algebra, and is organized as a set of formally described tables. A table is essentially a special collection of data items; the data in these tables can be accessed or recalled in many different ways without reorganizing the database tables, and each table contains one or more data types represented by rows or columns. The table structure of a database table in this embodiment includes: the table name, the library name, the fields included in the table, description information for each field, and so on. Sqoop is an open-source tool mainly used for data transfer between Hadoop and traditional databases (MySQL, PostgreSQL, etc.); it can import data from a relational database (such as MySQL, Oracle, Postgres, etc.) into the HDFS of Hadoop and can also export HDFS data into a relational database. All preset source databases are organized in a standard way using the same Sqoop script. A pre-configured table structure is set up in each target preset source database; the server reads the data in the target preset source databases, places the data of each sub-database into the table structure according to preset script instructions, and defines and labels the data in the sub-databases, which facilitates subsequent synchronization to the preset engine service node as well as querying and calling. The standard organization method includes: storing the data according to the same ordering rule, or adding different labels according to different data types.
S304: according to the data volume of the data to be queried in the request information, matching a query mode in the preset engine as the target query mode, exporting the data to be queried from the service node corresponding to the preset engine through the target query mode, and generating a report corresponding to the data to be queried.
The content of the step S304 is the same as that of the step S203, and reference may be made to the description of the step S203, which is not repeated herein.
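The matching of the target query mode to the data volume can be sketched in a few lines. The function name `choose_query_mode` and the mode labels are hypothetical; the text only specifies the cases "smaller than" and "greater than" the preset threshold, so treating a volume equal to the threshold as paging is an assumption of this sketch.

```python
def choose_query_mode(data_volume, preset_threshold):
    """Return 'cursor' for volumes above the threshold, otherwise 'paging'.

    Assumption: a volume exactly equal to the threshold uses paging.
    """
    return "cursor" if data_volume > preset_threshold else "paging"


# Hypothetical demo with a threshold of 10,000 rows.
small_mode = choose_query_mode(500, 10_000)
large_mode = choose_query_mode(2_000_000, 10_000)
```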
Referring to fig. 4, which is a schematic flow chart of a data report generation method according to an embodiment of the present invention, as shown in fig. 4, the data report generation method may include the following steps:
S401: acquiring a data source of data to be queried of the request information according to the request information generated by the received report, and detecting whether a database corresponding to the data source is a preset source database;
S402: when detecting that the database corresponding to the data source is the preset source database, synchronizing the data to be queried from the preset source database to a service node corresponding to a preset engine for storage;
The contents of steps S401 to S402 are the same as those of steps S201 to S202; reference may be made to the descriptions of steps S201 to S202, which are not repeated here.
S403: detecting, in the service node corresponding to the preset engine, whether the synchronized data to be queried is abnormal;
S404: when the synchronized data to be queried is abnormal, deleting the abnormal data in the data to be queried according to a preset instruction, and storing the normal data to be queried in the service node corresponding to the preset engine.
In this embodiment, whether the synchronized data to be queried is abnormal is detected in the service node corresponding to the preset engine. For detection, the synchronized data to be queried in the preset engine may be compared with the corresponding data to be queried in the preset source database; when the comparison shows a difference, abnormal data is considered to exist in the preset engine, and the corresponding abnormal data is deleted according to a preset deletion instruction.
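The comparison-based anomaly check can be sketched as follows. Both stores are modelled as dictionaries keyed by a hypothetical record key, and `remove_abnormal` is a name introduced for this illustration; how records are actually keyed and compared in the preset engine is not specified in the text.

```python
def remove_abnormal(engine_rows, source_rows):
    """Delete engine records that differ from, or are missing in, the source.

    Returns the keys of the deleted records; engine_rows is modified in place.
    """
    abnormal = [
        key
        for key, record in engine_rows.items()
        if key not in source_rows or source_rows[key] != record
    ]
    for key in abnormal:
        del engine_rows[key]  # stand-in for the preset deletion instruction
    return abnormal


# Hypothetical demo: record 2 was corrupted, record 4 does not exist in the source.
source = {1: ("a", 10), 2: ("b", 20), 3: ("c", 30)}
engine = {1: ("a", 10), 2: ("b", 99), 4: ("x", 0)}
deleted = remove_abnormal(engine, source)
```

After the call, only records that match the preset source database remain in the engine copy.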
S405: according to the data volume of the data to be queried in the request information, matching a query mode in the preset engine as the target query mode, exporting the data to be queried from the service node corresponding to the preset engine through the target query mode, and generating a report corresponding to the data to be queried.
The content of the step S405 is the same as that of the step S203, and reference may be made to the description of the step S203, which is not repeated herein.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a data report generating device according to an embodiment of the present invention. The data report generating device in this embodiment includes units for executing the steps in the embodiments corresponding to fig. 2 to 4; for details, refer to fig. 2 to 4 and the related descriptions of the corresponding embodiments. For convenience of explanation, only the portions related to the present embodiment are shown. Referring to fig. 5, the data report generating device 50 includes: a detection module 51, a synchronization module 52 and a report generation module 53.
The detecting module 51 is configured to obtain a data source of data to be queried according to the request information generated by the received report, and detect whether a database corresponding to the data source is a preset source database.
The synchronization module 52 is configured to synchronize, from the preset source database, the data to be queried to the service node corresponding to the preset engine for storage when it is detected that the database corresponding to the data source is the preset source database.
The report generation module 53 is configured to match a query mode in the preset engine as the target query mode according to the data volume of the data to be queried in the request information, export the data to be queried from the service node corresponding to the preset engine through the target query mode, and generate a report corresponding to the data to be queried.
Optionally, the synchronization module 52 includes:
a mapping table creating unit, configured to create a mapping table between the preset source database and the service node corresponding to the preset engine;
and a storage unit, configured to synchronize the data to be queried from the preset source database to the service node corresponding to the preset engine for storage according to the mapping table.
Optionally, the report generating module 53 includes:
a paging query unit, configured to export the data to be queried by using the paging query mode in the preset engine when the data volume of the data to be queried is less than a preset threshold, and generate a report corresponding to the data to be queried;
and a cursor query unit, configured to export the data to be queried by using the cursor query mode in the preset engine when the data volume of the data to be queried is greater than the preset threshold, and generate a report corresponding to the data to be queried.
Optionally, the paging query unit includes:
an obtaining subunit, configured to obtain, according to a received paging query command for the data to be queried, the starting row number of the paging query and the data volume of each page, to obtain a query result of the paging query;
and an exporting subunit, configured to sequentially export the data to be queried according to the query result of the paging query and generate a report corresponding to the data to be queried.
Optionally, the cursor query unit includes:
a writing subunit, configured to write the data to be queried into a preset file, one export-threshold-sized batch at a time, by using the cursor query mode in the preset engine according to a preset export threshold, to obtain a written file;
and an uploading subunit, configured to upload the written file to the file container platform, generate a report link corresponding to the data to be queried, and generate a report corresponding to the data to be queried based on a download result of the report link.
Optionally, the generating device further includes:
a local database synchronization module, configured to synchronize the data to be queried to the local database when detecting that the database corresponding to the data source is not the preset source database;
and a scheduling module, configured to synchronize the database corresponding to the data to be queried from the local database to the preset source database for storage through the scheduling platform.
Optionally, the generating device further includes:
a detection module, configured to detect, in the service node corresponding to the preset engine, whether the synchronized data to be queried is abnormal;
and a deleting module, configured to delete the abnormal data in the data to be queried according to a preset instruction when the synchronized data to be queried is abnormal, and store the normal data to be queried in the service node corresponding to the preset engine.
It should be noted that, because the information interaction between the above units, their execution processes and the like are based on the same concept as the method embodiments of the present invention, their specific functions and technical effects can be found in the method embodiment section and are not described again here.
Fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present invention. As shown in fig. 6, the computer device of this embodiment includes: at least one processor (only one is shown in fig. 6), a memory, and a computer program stored in the memory and executable on the at least one processor, where the processor, when executing the computer program, implements the steps in any of the data report generation method embodiments described above.
The computer device may include, but is not limited to, a processor, a memory. It will be appreciated by those skilled in the art that fig. 6 is merely an example of a computer device and is not intended to be limiting, and that a computer device may include more or fewer components than those shown, or some components may be combined, or different components may be included, such as a network interface, a display screen, and input devices, etc.
The processor may be a central processing unit (CPU), or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, and the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory includes readable storage media, internal memory, etc., wherein the internal memory may be the internal memory of the computer device, and the internal memory provides an environment for the operating system and the execution of the computer-readable instructions in the readable storage media. The readable storage medium may be a hard disk of the computer device, and in other embodiments may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the computer device. Further, the memory may also include both internal and external storage units of the computer device. The memory is used for storing an operating system, application programs, a BootLoader (BootLoader), data, and other programs, such as program codes of a computer program, and the like. The memory may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules, so as to perform all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the present invention. The specific working processes of the units and modules in the above-mentioned apparatus may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. The integrated unit, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method of the above embodiments may be implemented by a computer program, which may be stored in a computer readable storage medium and used by a processor to implement the steps of the above method embodiments. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. 
The computer-readable medium may include at least: any entity or device capable of carrying computer program code, a recording medium, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunications signals, and software distribution media, for example a USB flash drive, a removable hard disk, or a magnetic or optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals or telecommunications signals.
The present invention can also be implemented by a computer program product, which when executed on a computer device causes the computer device to implement all or part of the processes in the method of the above embodiments.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus/computer device and method may be implemented in other ways. For example, the apparatus/computer device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein.

Claims (10)

1. A method for generating a data report, the method comprising:
acquiring a data source of data to be queried of the request information according to the request information generated by the received report, and detecting whether a database corresponding to the data source is a preset source database;
when detecting that the database corresponding to the data source is the preset source database, synchronizing the data to be queried from the preset source database to a service node corresponding to a preset engine for storage;
and matching the query mode in the preset engine as a target query mode according to the data volume of the data to be queried in the request information, exporting the data to be queried from the service node corresponding to the preset engine through the target query mode, and generating a report corresponding to the data to be queried.
2. The data report generating method of claim 1, wherein the step of acquiring the data source of the data to be queried of the request information according to the request information generated by the received report, and detecting whether the database corresponding to the data source is a preset source database, further comprises:
when detecting that the database corresponding to the data source is not the preset source database, synchronizing the data to be queried to a local database;
and synchronizing the database corresponding to the data to be queried from the local database to the preset source database for storage through a scheduling platform.
3. The method for generating a data report according to claim 1, wherein synchronizing the data to be queried from the preset source database to the service node corresponding to the preset engine for storage when detecting that the database corresponding to the data source is the preset source database comprises:
creating a mapping table of the preset source database and a service node corresponding to the preset engine;
and synchronizing the data to be queried from the preset source database to a service node corresponding to the preset engine for storage according to the mapping table.
4. The method for generating a data report according to claim 1, wherein after the step of synchronizing the data to be queried from the preset source database to the service node corresponding to the preset engine for storage when the database corresponding to the data source is detected to be the preset source database, the method further comprises:
detecting, in a service node corresponding to the preset engine, whether the synchronized data to be queried is abnormal;
and when the synchronized data to be queried is abnormal, deleting abnormal data in the data to be queried according to a preset instruction, and storing normal data to be queried in a service node corresponding to the preset engine.
5. The data report generating method of claim 1, wherein matching the query mode in the preset engine as a target query mode according to the data volume of the data to be queried in the request information, exporting the data to be queried from the service node corresponding to the preset engine through the target query mode, and generating the report corresponding to the data to be queried comprises:
when the data volume of the data to be queried is smaller than a preset threshold value, exporting the data to be queried by utilizing a paging query mode in the preset engine, and generating a report corresponding to the data to be queried;
and when the data volume of the data to be queried is larger than a preset threshold value, exporting the data to be queried by utilizing a cursor query mode in the preset engine, and generating a report corresponding to the data to be queried.
6. The data report generating method of claim 5, wherein when the data amount of the data to be queried is smaller than a preset threshold, the data to be queried is exported by using a paging query mode in the preset engine, and a report corresponding to the data to be queried is generated, including:
according to the received paging query command of the data to be queried, acquiring the starting line number of paging query and the data volume of each page in the paging query to obtain a query result of the paging query;
and sequentially exporting the data to be queried according to the query result of the paging query, and generating a report corresponding to the data to be queried.
7. The data report generating method of claim 5, wherein when the data volume of the data to be queried is greater than a preset threshold, exporting the data to be queried by using a cursor query mode in the preset engine and generating a report corresponding to the data to be queried comprises:
writing the data to be queried into a preset file, one export-threshold-sized batch at a time, by using the cursor query mode in the preset engine according to a preset export threshold, to obtain a written file;
and uploading the written file to a file container platform, generating a report link corresponding to the data to be queried, and generating a report corresponding to the data to be queried based on a download result of the report link.
8. An apparatus for generating a data report, the apparatus comprising:
the detection module is used for acquiring a data source of data to be queried of the request information according to the request information generated by the received report, and detecting whether a database corresponding to the data source is a preset source database;
a synchronization module, configured to synchronize, when it is detected that the database corresponding to the data source is the preset source database, the data to be queried from the preset source database to a service node corresponding to the preset engine for storage;
and the report generation module is used for matching the query mode in the preset engine as a target query mode according to the data volume of the data to be queried in the request information, deriving the data to be queried from the service node corresponding to the preset engine through the target query mode, and generating a report corresponding to the data to be queried.
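The three modules of claim 8 can be pictured as a small pipeline. The sketch below is only an illustration under assumptions: plain dictionaries stand in for the source database and the service node, and the names `PRESET_SOURCE_DB`, `PRESET_THRESHOLD`, `ReportRequest`, `ServiceNode`, `detect`, `synchronize`, `match_query_mode` and `generate_report` are hypothetical. The claims do not say how a data volume exactly equal to the threshold is handled, nor what happens for other data sources in this claim; the sketch assumes paging for the former and simply raises an error for the latter.

```python
from dataclasses import dataclass, field
from typing import Dict, List

PRESET_SOURCE_DB = "source_db"  # hypothetical identifier of the preset source database
PRESET_THRESHOLD = 5            # hypothetical preset threshold on the data volume


@dataclass
class ReportRequest:
    data_source: str  # database the data to be queried comes from
    table: str        # data to be queried


@dataclass
class ServiceNode:
    """Stand-in for the service node of the preset engine."""
    tables: Dict[str, List[dict]] = field(default_factory=dict)


def detect(request: ReportRequest) -> bool:
    """Detection module: is the data source the preset source database?"""
    return request.data_source == PRESET_SOURCE_DB


def synchronize(request: ReportRequest, source_db: Dict[str, List[dict]], node: ServiceNode) -> None:
    """Synchronization module: copy the data to the engine's service node."""
    node.tables[request.table] = list(source_db[request.table])


def match_query_mode(data_volume: int, threshold: int = PRESET_THRESHOLD) -> str:
    """Pick the target query mode; equality is treated as paging (an assumption)."""
    return "cursor" if data_volume > threshold else "paging"


def generate_report(request: ReportRequest, source_db: Dict[str, List[dict]], node: ServiceNode) -> dict:
    """Report generation module: export from the service node with the matched mode."""
    if not detect(request):
        raise ValueError("data source is not the preset source database")
    synchronize(request, source_db, node)
    rows = node.tables[request.table]
    return {"mode": match_query_mode(len(rows)), "rows": rows}


source_db = {
    "small": [{"id": i} for i in range(3)],
    "large": [{"id": i} for i in range(20)],
}
node = ServiceNode()
small_report = generate_report(ReportRequest(PRESET_SOURCE_DB, "small"), source_db, node)
large_report = generate_report(ReportRequest(PRESET_SOURCE_DB, "large"), source_db, node)
print(small_report["mode"], large_report["mode"])
```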
9. A computer device comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor implements the data report generation method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the data report generation method of any one of claims 1 to 7.
CN202211114029.3A 2022-09-14 2022-09-14 Data report generation method and device, computer equipment and storage medium Pending CN115357628A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211114029.3A CN115357628A (en) 2022-09-14 2022-09-14 Data report generation method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211114029.3A CN115357628A (en) 2022-09-14 2022-09-14 Data report generation method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115357628A (en) 2022-11-18

Family

ID=84006172

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211114029.3A Pending CN115357628A (en) 2022-09-14 2022-09-14 Data report generation method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115357628A (en)

Similar Documents

Publication Publication Date Title
US11475034B2 (en) Schemaless to relational representation conversion
CN111046034B (en) Method and system for managing memory data and maintaining data in memory
US8924373B2 (en) Query plans with parameter markers in place of object identifiers
CN110795455A (en) Dependency relationship analysis method, electronic device, computer device and readable storage medium
CN111767303A (en) Data query method and device, server and readable storage medium
CN111324610A (en) Data synchronization method and device
CN111949541A (en) Multi-source database statement checking method and device
US20120096054A1 (en) Reading rows from memory prior to reading rows from secondary storage
CN110134681B (en) Data storage and query method and device, computer equipment and storage medium
CN113051268A (en) Data query method, data query device, electronic equipment and storage medium
EP3690669A1 (en) Method, apparatus, device and storage medium for managing index technical field
US11514697B2 (en) Probabilistic text index for semi-structured data in columnar analytics storage formats
CN112434015A (en) Data storage method and device, electronic equipment and medium
CN116483850A (en) Data processing method, device, equipment and medium
CN113722600A (en) Data query method, device, equipment and product applied to big data
WO2023197865A1 (en) Information storage method and apparatus
CN111125216A (en) Method and device for importing data into Phoenix
RU2393536C2 (en) Method of unified semantic processing of information, which provides for, within limits of single formal model, presentation, control of semantic accuracy, search and identification of objects description
CN111259003B (en) Database establishment method and device
CN115357628A (en) Data report generation method and device, computer equipment and storage medium
US20170031909A1 (en) Locality-sensitive hashing for algebraic expressions
CN113868138A (en) Method, system, equipment and storage medium for acquiring test data
CN111221846B (en) Automatic translation method and device for SQL sentences
CN111159218B (en) Data processing method, device and readable storage medium
CN115840786B (en) Data lake data synchronization method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination