CN111159219A - Data management method, device, server and storage medium - Google Patents


Info

Publication number
CN111159219A
Authority
CN
China
Prior art keywords
data
sqlite
data query
data file
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911410067.1A
Other languages
Chinese (zh)
Other versions
CN111159219B (en)
Inventor
林敏�
叶必胜
周小敏
王全胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Asiainfo Software Co ltd
Original Assignee
Hunan Asiainfo Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Asiainfo Software Co ltd filed Critical Hunan Asiainfo Software Co ltd
Priority to CN201911410067.1A priority Critical patent/CN111159219B/en
Publication of CN111159219A publication Critical patent/CN111159219A/en
Application granted granted Critical
Publication of CN111159219B publication Critical patent/CN111159219B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2425Iterative querying; Query formulation based on the results of a preceding query
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/252Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a data management method, a device, a server and a storage medium. A data query request carrying a data query condition is received; target SQLite data files related to the data query condition are obtained from the SQLite data files stored in a distributed storage system, the SQLite data files having been stored in the distributed storage system after a computing platform converted original data into SQLite data files; and information matching the data query condition is queried from each target SQLite data file as a data query result. With the invention, mass data storage is realized on the basis of the distributed storage system, and standard SQL queries with fast response are realized by querying the SQLite data files in the distributed storage system, without modifying the database itself, thereby lowering the barrier to use.

Description

Data management method, device, server and storage medium
Technical Field
The present invention relates to the field of big data management technologies, and in particular, to a data management method, an apparatus, a server, and a storage medium.
Background
At present, big data centers take in ever larger volumes of data, and their applications are ever more extensive. Some data application scenarios require both storage of mass data and fast response to data queries. Hadoop can store mass data, but it responds slowly to data queries, especially queries over small data volumes. A relational database can respond quickly to SQL-based data queries, but its data storage capacity is limited. NoSQL can store mass data, but it must be modified to some extent before it supports SQL queries that improve query response speed, so its barrier to use is high.
Disclosure of Invention
In view of this, the present application provides a data management method, apparatus, server and storage medium, so as to achieve both mass data storage and fast response to data queries while lowering the barrier to use.
The technical scheme is as follows:
A first aspect of the present invention discloses a data management method, including:
receiving a data query request, wherein the data query request carries a data query condition;
obtaining target SQLite data files related to the data query condition from SQLite data files stored in a distributed storage system, wherein the SQLite data files in the distributed storage system are stored in the distributed storage system after original data are converted into the SQLite data files by a computing platform;
and querying information matching the data query condition from each target SQLite data file as a data query result.
Optionally, the obtaining target SQLite data files related to the data query condition from the SQLite data files stored in the distributed storage system includes:
acquiring a table name and an account period in the data query condition;
performing hash calculation on the table name and the account period to generate second information;
querying, from the SQLite data files stored in the distributed storage system, the SQLite data file whose carried first information is the same as the second information, and determining the queried SQLite data file as a target SQLite data file;
wherein the computing platform is used for dividing original data into data files with different account periods, converting each data file into an SQLite data file, performing, for each SQLite data file, hash calculation on the table name of the SQLite data file and the account period corresponding to the SQLite data file to generate first information of the SQLite data file, and storing the SQLite data file in the distributed storage system according to target information in the first information of the SQLite data file.
Optionally, the obtaining target SQLite data files related to the data query condition from the SQLite data files stored in the distributed storage system includes:
acquiring a table name in the data query condition;
performing hash calculation on the table name to generate second information;
querying, from the SQLite data files stored in the distributed storage system, the SQLite data file whose carried first information is the same as the second information, and determining the queried SQLite data file as a target SQLite data file;
wherein the computing platform is used for dividing the original data in full into data files, converting each data file into an SQLite data file, performing, for each SQLite data file, hash calculation on the table name of the SQLite data file to generate first information of the SQLite data file, and storing the SQLite data file in the distributed storage system according to target information in the first information of the SQLite data file.
Optionally, the receiving a data query request includes: receiving, in a case where no historical data query result of the data query request is stored in a cache, the data query request sent through a data access interface according to a data access interface specification.
Optionally, the method further includes:
taking the historical data query result as the data query result in a case where the historical data query result of the data query request is stored in the cache.
Optionally, the querying, from each target SQLite data file, information matched with the data query condition as a data query result includes:
loading each target SQLite data file to a memory;
performing aggregation calculation on each target SQLite data file loaded into the memory to obtain information matched with the data query condition;
and taking the information as a data query result of the data query request.
Optionally, the distributed storage system is an HBase database based on a Hadoop platform.
A second aspect of the present invention discloses a data management apparatus, including:
a receiving unit, configured to receive a data query request, wherein the data query request carries a data query condition;
a first acquisition unit, configured to acquire target SQLite data files related to the data query condition from SQLite data files stored in a distributed storage system, wherein the SQLite data files in the distributed storage system are stored in the distributed storage system after original data are converted into the SQLite data files by a computing platform;
and a first query unit, configured to query information matching the data query condition from each target SQLite data file as a data query result.
A third aspect of the present invention discloses a server, comprising: at least one memory and at least one processor; the memory stores a program, and the processor calls the program stored in the memory, wherein the program is used for realizing the data management method disclosed in any one of the first aspect of the invention.
A fourth aspect of the present invention discloses a computer-readable storage medium having stored thereon computer-executable instructions for performing the data management method as disclosed in any one of the first aspects of the present invention.
The invention provides a data management method, a data management device, a server and a storage medium, in which a data query request is received, target SQLite data files related to the data query condition are acquired from the SQLite data files stored in a distributed storage system, and information matching the data query condition is queried from the target SQLite data files as a data query result. The technical solution provided by the invention realizes mass data storage on the basis of the distributed storage system, and realizes standard SQL queries with fast response by querying the SQLite data files in the distributed storage system, without modifying the database itself, thereby lowering the barrier to use.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a schematic structural diagram of a data management system according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a data management method according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a data management method according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of a method for storing an SQLite data file in a distributed storage system according to an embodiment of the present invention;
Fig. 5 is a schematic flowchart of a method for querying information matching the data query condition from each target SQLite data file as a data query result according to an embodiment of the present invention;
Fig. 6 is an exemplary diagram of another data management method provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a data management apparatus according to an embodiment of the present invention;
Fig. 8 is a hardware block diagram of a server according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
As can be seen from the background, big data centers take in ever larger volumes of data, and their applications are ever more extensive. Some data application scenarios require both storage of mass data and fast response to data queries. At present, one method for storing mass data is to calculate and summarize data through Hadoop to obtain original data, and then store the original data in a relational database or transfer it to NoSQL, so that data queries are served by the relational database or by NoSQL.
When the data volume of the original data is large, the limited storage capacity of a relational database prevents it from storing the mass data; moreover, when a user stores and queries the same original data, table locking occurs and causes production faults. NoSQL can store mass data, but it does not support SQL queries by default. When SQL queries are required, a corresponding big data component must be layered on top of NoSQL, and NoSQL places certain restrictions on query conditions and query scenarios: the query scenarios must be planned in advance and the query conditions formulated accordingly before SQL queries can be realized. This demands strong design ability from technical personnel, so the barrier to use is high.
Hadoop can also store and query the original data after obtaining it by computation, but it responds slowly to data queries. In particular, when the data volume of the original data is small, Hadoop's up-front preparation work is heavy, that is, its start-up time is long, so it responds slowly to queries over small data volumes.
Therefore, the invention provides a data management method, a device, a server and a storage medium, which realize mass data storage on the basis of a distributed storage system and realize SQL queries with fast response by querying the SQLite data files in the distributed storage system, without modifying the database itself, thereby lowering the barrier to use.
Referring to fig. 1, a schematic structural diagram of a data management system according to an embodiment of the present invention is shown. The data management system comprises a server, a data access interface, data access middleware and an HBase database. The SQLite data file is stored in the HBase database.
Referring to fig. 2 in conjunction with fig. 1, a schematic diagram of a data management method provided by an embodiment of the present invention is shown, as shown in fig. 2, specifically:
The server receives a data query request sent by a user through the front-end application and judges whether a historical data query result of the data query request is stored in a cache. If so, the server returns the historical data query result to the user as the data query result of the data query request; if not, the server calls the data access interface according to the data access interface specification to send the data query request to the data access middleware.
The data access middleware parses the data query request based on an SQL access engine to obtain a data query condition; acquires the target SQLite data files related to the data query condition from the HBase database according to the data query condition; loads each target SQLite data file into its memory; merges the target SQLite data files in memory; and performs aggregation calculation on the merged target SQLite data files to obtain information matching the data query condition, which is taken as the data query result of the data query request. The data query result is returned to the server through the data access interface, so that the server returns it to the front-end application for the user to view.
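As a non-limiting illustration, the load, merge and aggregate steps performed by the data access middleware might be sketched in Python as follows. The table name `sales`, its columns, and the helper names `make_sqlite_bytes` and `query_merged` are assumptions introduced for this sketch and do not appear in the embodiment; the SQLite data files are represented as raw bytes, as they might be after being read from the HBase database.

```python
import os
import sqlite3
import tempfile


def make_sqlite_bytes(rows):
    """Demo helper (hypothetical): build a small SQLite data file, return its bytes."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        conn.commit()
        conn.close()
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


def query_merged(file_blobs, sql):
    """Load each target SQLite file, merge its rows in memory, run one SQL query."""
    merged = sqlite3.connect(":memory:")
    merged.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    for blob in file_blobs:
        # Write the bytes to a temporary file so SQLite can open them.
        fd, path = tempfile.mkstemp(suffix=".db")
        os.close(fd)
        try:
            with open(path, "wb") as f:
                f.write(blob)
            src = sqlite3.connect(path)
            rows = src.execute("SELECT region, amount FROM sales").fetchall()
            src.close()
            merged.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        finally:
            os.remove(path)
    result = merged.execute(sql).fetchall()
    merged.close()
    return result


# Two target files, merged and then aggregated with standard SQL.
blobs = [
    make_sqlite_bytes([("north", 10.0), ("south", 5.0)]),
    make_sqlite_bytes([("north", 2.5)]),
]
totals = query_merged(
    blobs,
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region",
)
```

In this sketch the aggregation is an ordinary SQL `GROUP BY` over the merged in-memory table, which reflects the point of the embodiment that standard SQL can be used once the target files are loaded.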
It should be noted that the HBase database is a distributed storage system based on a Hadoop platform. The front-end application may be a reporting Web application.
In the embodiment of the application, while returning the data query result of the data query request to the front-end application, the server can also store the data query result in the cache; once stored in the cache, the data query result is referred to as the historical data query result of the data query request. Therefore, when a data query request is received, it can be judged whether the historical data query result of the data query request is stored in the cache. If it is, the historical data query result is returned to the user as the data query result of the data query request; if it is not, the data access interface is called according to the data access interface specification to send the data query request to the data access middleware, and the data query request is then processed by the SQL access engine in the data access middleware.
For example, the cache stores the historical data query result of the data query request 1, the historical data query result of the data query request 2, and the historical data query result of the data query request 3. If the received data query request is a data query request 1, determining that the historical data query result of the data query request is stored in the cache, and further taking the historical data query result of the data query request 1 as the data query result of the received data query request.
Conversely, if the received data query request is data query request 4, it is determined that the historical data query result of the data query request is not stored in the cache; the data access interface is then called according to the data access interface specification to send the data query request to the data access middleware, and the data query request is processed by the SQL access engine in the data access middleware.
It should be noted that, when the storage duration of the historical data query result stored in the cache reaches the preset duration, the historical data query result whose storage duration reaches the preset duration is deleted from the cache.
In the embodiment of the present application, the preset duration may be one day, two days, and so on. These are only preferred values of the preset duration provided in the embodiment of the present application; the specific value may be set by the inventor according to actual requirements and is not limited herein.
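As a non-limiting illustration, the cache check and the deletion of results whose storage duration has reached the preset duration might be sketched in Python as follows. The names `QueryResultCache`, `handle_request` and `query_middleware` are assumptions introduced for this sketch; the sketch also deletes an expired entry lazily, when it is next looked up, which is only one possible way of realizing the deletion described above.

```python
import time


class QueryResultCache:
    """Hypothetical cache of historical data query results with a preset duration."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # request text -> (stored_at, result)

    def put(self, request, result):
        self._entries[request] = (self._clock(), result)

    def get(self, request):
        entry = self._entries.get(request)
        if entry is None:
            return None
        stored_at, result = entry
        # Delete the entry once its storage duration reaches the preset duration.
        if self._clock() - stored_at >= self._ttl:
            del self._entries[request]
            return None
        return result


def handle_request(request, cache, query_middleware):
    """Return the historical result if cached; otherwise query and cache it."""
    cached = cache.get(request)
    if cached is not None:
        return cached
    result = query_middleware(request)  # stands in for the data access middleware
    cache.put(request, result)
    return result
```

With `ttl_seconds=86400` the preset duration is one day; the `clock` parameter exists only so that the expiry behaviour can be exercised without waiting.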
In the embodiment of the invention, when a data query request is received, it is judged whether a historical data query result of the data query request is stored in the cache. If so, the historical data query result is taken as the data query result of the data query request; if not, the target SQLite data files related to the data query condition are acquired from the SQLite data files stored in the distributed storage system, and information matching the data query condition is queried from each target SQLite data file as the data query result. When the historical data query result is stored in the cache, the data query request is not processed again; the historical data query result is used directly as the data query result, which improves the response efficiency of data queries. When no historical data query result is stored in the cache, mass data storage is realized on the basis of the distributed storage system, and the SQL access engine queries the SQLite data files in the distributed storage system, so that standard SQL queries with fast response are realized without modifying the database itself, which lowers the barrier to use.
A detailed description will now be given, with reference to fig. 3, of a data management method provided in an embodiment of the present invention, where the data management method is applied to data access middleware, and is specifically applied to an SQL access engine in the data access middleware. As shown in fig. 3, the data management method specifically includes the following steps:
S301: receiving a data query request, wherein the data query request carries a data query condition;
In the embodiment of the application, the data query request sent through the data access interface according to the data access interface specification is received in a case where no historical data query result of the data query request is stored in the cache. The cache may be a data cache region.
In this embodiment of the present application, the manner of determining whether the cache stores the historical data query result of the data query request may be: a user sends a data query request to a server based on a front-end application, and the server judges whether historical data query results of the data query request are stored in a cache after receiving the data query request; if the historical data query result of the data query request is stored in the cache, sending the historical data query result as a data query result; and if the historical data query result of the data query request is not stored in the cache, calling the data access interface according to the data access interface specification to send the data query request to the data access middleware.
S302: acquiring target SQLite data files related to the data query condition from the SQLite data files stored in the distributed storage system;
In the specific execution of step S302, the SQLite data files are stored in the distributed storage system; when a data query request is received, the data query request is parsed based on the SQL access engine to obtain the data query condition, and the target SQLite data files related to the data query condition are then acquired from the distributed storage system according to the data query condition.
In the embodiment of the application, the SQLite data file in the distributed storage system is stored in the distributed storage system after the computing platform converts the original data into the SQLite data file.
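As a non-limiting illustration, the conversion of original data into an SQLite data file might be sketched in Python as follows. The function name `rows_to_sqlite_bytes` and the example table and column names are assumptions introduced for this sketch; the embodiment does not specify how the computing platform performs the conversion. The function returns the file content as bytes, which could then be written to the distributed storage system as a single value.

```python
import os
import sqlite3
import tempfile


def rows_to_sqlite_bytes(table_name, columns, rows):
    """Write rows into a one-table SQLite file and return the file's bytes."""
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        col_defs = ", ".join(f'"{c}"' for c in columns)
        placeholders = ", ".join("?" for _ in columns)
        conn.execute(f'CREATE TABLE "{table_name}" ({col_defs})')
        conn.executemany(
            f'INSERT INTO "{table_name}" VALUES ({placeholders})', rows
        )
        conn.commit()
        conn.close()
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# Hypothetical original data for one table.
blob = rows_to_sqlite_bytes(
    "monthly_report", ["region", "amount"], [("north", 10.0), ("south", 5.0)]
)
```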
As a preferred mode of the embodiment of the present application, as shown in fig. 4, a mode of storing the SQLite data file in the distributed storage system may be: dividing original data into data files with different account periods through a computing platform; converting each data file into an SQLite data file; and for each SQLite data file, carrying out hash calculation by using the table name of the SQLite data file and the account period corresponding to the SQLite data file, taking the result obtained by carrying out hash calculation on the table name and the account period as first information of the SQLite data file, and further storing the SQLite data file in a distributed storage system according to target information in the first information of the SQLite data file.
In the embodiment of the application, after the computing platform performs summary computation on data to obtain the original data, the original data is divided by month into data files with different account periods. For example, the computing platform may divide the original data of one year by month into a data file for the January account period, a data file for the February account period, and so on up to a data file for the December account period. The specific manner in which the computing platform divides the original data into data files with different account periods may be set by the inventor according to actual requirements, and is not limited in the embodiments of the present invention.
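As a non-limiting illustration, the division of original data into monthly account periods might be sketched in Python as follows. The record field `day` and the `YYYYMM` representation of an account period are assumptions introduced for this sketch.

```python
from collections import defaultdict
from datetime import date


def split_by_account_period(records):
    """Group records into one bucket per monthly account period (YYYYMM)."""
    buckets = defaultdict(list)
    for record in records:
        period = record["day"].strftime("%Y%m")
        buckets[period].append(record)
    return dict(buckets)


# Hypothetical original data spanning two account periods.
records = [
    {"day": date(2019, 1, 5), "amount": 10.0},
    {"day": date(2019, 1, 20), "amount": 2.5},
    {"day": date(2019, 2, 3), "amount": 7.0},
]
periods = split_by_account_period(records)
```

Each bucket would then be converted into its own SQLite data file.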
In the embodiment of the application, each data file can be converted into an SQLite data file, and each converted SQLite data file has a unique table name and an account period. For each SQLite data file, hash calculation is performed on the table name and the account period of the SQLite data file to obtain the first information of the SQLite data file. The first information may be a 32-character or 64-character string. Pre-partitioning is then performed according to the target information in the first information, in an odd-numbered jump mode from 01f to fef, and the SQLite data file is stored in the distributed storage system according to the pre-partitioning result, so as to balance the load of the distributed storage system. The target information in the first information may be the first 3 characters of the first information.
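As a non-limiting illustration, generating the first information and selecting a partition from its target information might be sketched in Python as follows. Several points are assumptions made for this sketch only: the embodiment does not name the hash function (MD5 is used here because its hexadecimal digest is 32 characters long), does not state how the table name and the account period are combined before hashing (a `|` separator is used here), and the split points in `SPLIT_POINTS` are placeholders that do not reproduce the 01f to fef pre-partitioning scheme of the embodiment.

```python
import bisect
import hashlib

# Hypothetical, sorted partition split points over the 3-character prefix space.
SPLIT_POINTS = ["3ff", "7ff", "bff"]


def first_information(table_name, account_period=""):
    """Hash the table name and account period into a 32-character hex string."""
    key = f"{table_name}|{account_period}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def target_information(info):
    """The target information is taken as the first 3 characters."""
    return info[:3]


def partition_index(info):
    """Pick the partition whose key range contains the target information."""
    return bisect.bisect_right(SPLIT_POINTS, target_information(info))


info = first_information("monthly_report", "201901")
partition = partition_index(info)
```

Because the digest consists of fixed-length lowercase hexadecimal characters, comparing prefixes as strings orders them the same way as comparing their numeric values, which is what allows the binary search over the split points.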
Correspondingly, in the embodiment of the present application, a manner of acquiring the target SQLite data files related to the data query condition from the SQLite data files stored in the distributed storage system may be: after the data query request is parsed based on the SQL access engine to obtain the data query condition, acquiring the table name and the account period in the data query condition; performing hash calculation on the acquired table name and account period, and taking the result as second information; and querying, among the first information carried by the SQLite data files of the distributed storage system, for the SQLite data file whose first information is the same as the second information, and determining the queried SQLite data file as a target SQLite data file.
In the embodiment of the application, the result obtained by performing hash calculation on the table name and the account period in the data query condition is used as the second information; the partition of the distributed storage system in which the target SQLite data file is stored can be quickly determined according to the target information in the second information, and the target SQLite data file related to the data query condition is then determined within that partition.
For example, the computing platform may divide the original data into a data file for the January account period and a data file for the February account period, convert the January data file into SQLite data file 1, and convert the February data file into SQLite data file 2. The table name of SQLite data file 1 is table name 1, and the table name of SQLite data file 2 is table name 2. The first information of SQLite data file 1, generated by hash calculation on table name 1 and the January account period, is k1; the first information of SQLite data file 2, generated by hash calculation on table name 2 and the February account period, is k2. If the table name and the account period obtained from the data query condition are table name 1 and the January account period respectively, hash calculation on table name 1 and the January account period generates second information k1, and the SQLite data file stored in the distributed storage system whose carried first information is the same as the second information is therefore determined to be SQLite data file 1.
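As a non-limiting illustration, the matching of second information against first information in the example above might be sketched in Python as follows. The dictionary `stored_files` stands in for the distributed storage system, and the hash function (MD5), the `|` separator, the `YYYYMM` account period strings and the function names are assumptions introduced for this sketch.

```python
import hashlib


def hash_info(table_name, account_period):
    """Hash a table name and an account period (hypothetical combination rule)."""
    key = f"{table_name}|{account_period}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()


# Storage side: each SQLite data file is stored under its first information.
stored_files = {
    hash_info("table_name_1", "201901"): "SQLite data file 1",  # k1
    hash_info("table_name_2", "201902"): "SQLite data file 2",  # k2
}


def find_target_file(table_name, account_period):
    """Query side: compute the second information and look for a match."""
    second_information = hash_info(table_name, account_period)
    return stored_files.get(second_information)


target = find_target_file("table_name_1", "201901")
```

Because the same hash calculation is applied on the storage side and on the query side, identical table names and account periods yield identical keys, so the lookup is a direct key match rather than a scan.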
As another preferred mode of the embodiment of the present application, as shown in fig. 4, a mode of storing the SQLite data file in the distributed storage system may be: dividing the original data in full into data files through a computing platform; converting each data file into an SQLite data file; and, for each SQLite data file, performing hash calculation on the table name of the SQLite data file, taking the result as the first information of the SQLite data file, and storing the SQLite data file in the distributed storage system according to the target information in the first information of the SQLite data file.
In this embodiment of the application, each SQLite data file has a unique table name, and the first information of the SQLite data file can be obtained by performing hash calculation on the table name of the SQLite data file. The first information may be a 32-character or 64-character string. Partitioning is performed according to the target information in the first information, in an odd-numbered jump mode from 01f to fef, and the SQLite data file is stored in the distributed storage system according to the partitioning result, so as to balance the load of the distributed storage system. The target information in the first information may be the first 3 characters of the first information.
Correspondingly, in the embodiment of the present application, a manner of acquiring the target SQLite data files related to the data query condition from the SQLite data files stored in the distributed storage system may be: after the data query request is parsed based on the SQL access engine to obtain the data query condition, acquiring the table name in the data query condition; performing hash calculation on the acquired table name, and taking the result as second information; and querying, among the first information carried by the SQLite data files of the distributed storage system, for the SQLite data file whose first information is the same as the second information, and determining the queried SQLite data file as a target SQLite data file.
In the embodiment of the application, the result obtained by performing hash calculation on the table name in the data query condition is used as the second information, the partition of the target SQLite data file stored in the distributed storage system can be quickly determined according to the target information in the second information, and then the target SQLite data file related to the data condition is determined from the determined partition.
For example, a computing platform is used for dividing the total amount of original data into a data file 1 and a data file 1, converting the data file 1 into an SQLite data file 1, and converting the data file 2 into an SQLite data file 2; the table name of the SQLite data file 1 is table name 1, and the table name of the SQLite data file 2 is table name 2. The first information of the SQLite data file 1 generated by performing hash calculation using the table name 1 is k1, and the first information of the SQLite data file 2 generated by performing hash calculation using the table name 2 is k 2. If the table name in the data query condition is acquired as the table name 1; and if the second information generated by performing hash calculation on the table name 1 is k1, determining that the SQLite data file in which the first information carried in the SQLite data file stored in the distributed storage system is the same as the second information is the SQLite data file 1.
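The k1/k2 example can be reproduced with a minimal sketch. A plain dictionary stands in for the distributed storage system, MD5 stands in for the unspecified hash function, and all function names are hypothetical.

```python
import hashlib
from typing import Dict, Optional


def hash_table_name(table_name: str) -> str:
    """Hash a table name; MD5 is an assumption, the source names no algorithm."""
    return hashlib.md5(table_name.encode("utf-8")).hexdigest()


def store_file(store: Dict[str, str], table_name: str, file_ref: str) -> str:
    """Store a file reference under its first information and return that key."""
    first_information = hash_table_name(table_name)
    store[first_information] = file_ref
    return first_information


def find_target_file(store: Dict[str, str], queried_table_name: str) -> Optional[str]:
    """Hash the queried table name into second information and match it."""
    second_information = hash_table_name(queried_table_name)
    return store.get(second_information)


# A dictionary standing in for the distributed storage system.
store: Dict[str, str] = {}
k1 = store_file(store, "table name 1", "SQLite data file 1")
k2 = store_file(store, "table name 2", "SQLite data file 2")
```

Since the same hash function is applied when storing and when querying, the second information computed for table name 1 equals k1, so the lookup returns SQLite data file 1 without scanning the other files.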
Preferably, in the embodiment of the present application, the distributed storage system is an HBase database based on a Hadoop platform.
It should be noted that the HBase database is built on HDFS, the distributed file system of the Hadoop platform, so as to realize distributed storage of files.
S303: querying information matching the data query condition from each target SQLite data file as the data query result.
Fig. 5 is a schematic flowchart of a method for querying information matched with a data query condition from each target SQLite data file as a data query result according to an embodiment of the present invention.
As shown in fig. 5, the method includes:
S501: loading each target SQLite data file into the memory;
In the embodiment of the application, each target SQLite data file matching the data query condition is acquired from the distributed storage system based on the SQL access engine, and the acquired target SQLite data files are then loaded into the memory.
S502: performing aggregation calculation on each target SQLite data file loaded into the memory to obtain information matching the data query condition;
In the embodiment of the application, after each target SQLite data file is loaded into the memory, the target SQLite data files in the memory are merged based on the SQL access engine, and aggregation calculation is then performed on the merged target SQLite data files to obtain the information matching the data query condition.
It should be noted that merging the target SQLite data files in the memory may be understood as placing each target SQLite data file in the memory into the same SQLite data file.
S503: taking the information as the data query result of the data query request.
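Steps S501 to S503 can be illustrated with Python's standard sqlite3 module. This is a sketch under assumptions, not the SQL access engine itself: the `metrics` schema is hypothetical, every data file is assumed to share it, and local files stand in for files fetched from the distributed storage system.

```python
import os
import sqlite3
import tempfile
from typing import List, Tuple

# Hypothetical schema shared by every data file in this sketch.
SCHEMA = "CREATE TABLE metrics (region TEXT, amount REAL)"


def write_sqlite_file(path: str, rows: List[Tuple[str, float]]) -> None:
    """Create a small SQLite data file standing in for one target file."""
    con = sqlite3.connect(path)
    con.execute(SCHEMA)
    con.executemany("INSERT INTO metrics VALUES (?, ?)", rows)
    con.commit()
    con.close()


def merge_and_aggregate(paths: List[str], sql: str) -> List[Tuple]:
    """Load each file into one in-memory database, then run the aggregate query."""
    # isolation_level=None keeps the connection in autocommit mode, so
    # ATTACH and DETACH never run inside an open transaction.
    mem = sqlite3.connect(":memory:", isolation_level=None)
    mem.execute(SCHEMA)
    for path in paths:
        mem.execute("ATTACH DATABASE ? AS src", (path,))
        # Merging: copy the rows of this file into the single in-memory table.
        mem.execute("INSERT INTO metrics SELECT region, amount FROM src.metrics")
        mem.execute("DETACH DATABASE src")
    rows = mem.execute(sql).fetchall()
    mem.close()
    return rows
```

Calling `merge_and_aggregate` with the paths of the target files and a statement such as `SELECT region, SUM(amount) FROM metrics GROUP BY region` returns the aggregated rows, which correspond to the data query result of S503. Attaching one file at a time keeps the sketch within SQLite's default limit on attached databases.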
Further, in the embodiment of the present application, the SQL access engine of the data access middleware returns the data query result of the data query request to the server based on the data access interface, so that the server returns the data query result of the data query request to the front-end application for the user to view.
The invention provides a data management method applied to a data access middleware, in particular to the SQL access engine of the data access middleware. The method receives a data query request, acquires each target SQLite data file related to the data query condition from the SQLite data files stored in a distributed storage system, and then queries information matching the data query condition from each target SQLite data file as the data query result. The technical solution provided by the invention realizes mass data storage based on the distributed storage system. Because the SQL access engine queries the SQLite data files in the distributed storage system, standard SQL queries are supported and data queries respond quickly; and because the database does not need to be modified, the threshold for use is lowered.
In order to better understand the content of the data management method provided above, the embodiment of the present invention provides an example of a data management method based on an application in production and use of an index library. As shown in fig. 6, specifically:
The SPARK index calculation platform performs summary calculation on data in the data warehouse to obtain original data, and divides the original data into data files of different account periods; each data file is converted into an SQLite data file; and CUBE storage is performed, that is, for each SQLite data file, hash calculation is performed on the table name of the SQLite data file together with the account period corresponding to the SQLite data file, the result of the hash calculation is taken as the first information of the SQLite data file, and the SQLite data file is stored in the distributed storage system according to target information in its first information.
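The CUBE storage key, which combines the table name with the account period, can be sketched as below. The source does not say how the two values are combined before hashing, so the `|` separator, the MD5 hash, the period format, and the function names are all assumptions; a dictionary again stands in for the distributed storage system, and looking up several account periods at once is likewise an assumption.

```python
import hashlib
from typing import Dict, List


def cube_key(table_name: str, account_period: str) -> str:
    """Hash table name plus account period; the '|' separator is an assumption."""
    raw = f"{table_name}|{account_period}"
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def store_cube(store: Dict[str, str], table_name: str,
               account_period: str, file_ref: str) -> None:
    """Store one SQLite data file reference under its first information."""
    store[cube_key(table_name, account_period)] = file_ref


def find_cubes(store: Dict[str, str], table_name: str,
               account_periods: List[str]) -> List[str]:
    """Return the stored files whose first information matches a queried key."""
    keys = (cube_key(table_name, period) for period in account_periods)
    return [store[key] for key in keys if key in store]
```

Including the account period in the hash input means that files of the same table but different periods receive unrelated keys, so they can be placed in different partitions and fetched independently.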
The server receives an SQL request sent by a user through a front-end application, and checks whether a historical data query result for the SQL request is stored in the cache; if the historical data query result of the SQL request is stored in the cache, the historical data query result is returned to the front-end application as the data query result of the SQL request.
If the historical data query result of the SQL request is not stored in the cache, the data access interface is called according to the data access interface specification to send the data query request to the index aggregation middleware, namely the data access middleware. The SQL access engine of the index aggregation middleware parses the SQL request to obtain the data query condition. The table name and the account period in the data query condition are acquired, hash calculation is performed on the acquired table name and account period, and the result of the hash calculation is taken as second information. The SQLite data files whose first information is the same as the second information are queried from the first information carried by each SQLite data file in the distributed storage system, and the queried SQLite data files are determined as the target SQLite data files. Each target SQLite data file is loaded into the memory, the target SQLite data files in the memory are merged, and SQL calculation, namely aggregation calculation, is performed on the merged target SQLite data files to obtain information matching the data query condition. The information matching the data query condition is returned to the server as the data query result through the data access interface, so that the server returns the data query result to the front-end application for the user to view.
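The server-side decision between the cache and the middleware can be sketched as a small dispatch function. The names are hypothetical, the cache is assumed to be keyed by the SQL text, and the middleware is represented by a plain callable; whether the server writes fresh results back into the cache is not stated in the source, so this sketch does not do so.

```python
from typing import Callable, Dict, List, Tuple

Rows = List[Tuple]


def handle_sql_request(sql: str,
                       cache: Dict[str, Rows],
                       query_middleware: Callable[[str], Rows]) -> Rows:
    """Return the cached historical result if present, else ask the middleware."""
    cached = cache.get(sql)
    if cached is not None:
        # Cache hit: the historical result is the data query result,
        # and the middleware is never called.
        return cached
    # Cache miss: forward the request through the data access interface.
    return query_middleware(sql)
```

Keeping the middleware behind a callable mirrors the role of the data access interface: the server only needs to know how to send a request and receive a result, not how the SQLite data files are located or aggregated.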
Corresponding to the data management method provided by the embodiment of the present invention, as shown in fig. 7, the embodiment of the present invention further provides a schematic structural diagram of a data management apparatus. The data management apparatus includes:
a receiving unit 71, configured to receive a data query request, where the data query request carries a data query condition;
the first acquisition unit 72 is configured to acquire target SQLite data files related to the data query condition from SQLite data files stored in the distributed storage system, where the SQLite data files in the distributed storage system are stored in the distributed storage system after original data are converted into SQLite data files by the computing platform;
and the first query unit 73 is used for querying information matched with the data query condition from each target SQLite data file as a data query result.
The specific principle and the execution process of each unit in the data management device disclosed in the above embodiment of the present invention are the same as those of the data management method disclosed in the above embodiment of the present invention, and reference may be made to corresponding parts in the data management method disclosed in the above embodiment of the present invention, which are not described herein again.
The invention provides a data management device which receives a data query request, acquires target SQLite data files related to data query conditions from SQLite data files stored in a distributed storage system, and further queries information matched with the data query conditions from the target SQLite data files as data query results. The technical method provided by the invention can realize mass data storage based on the distributed storage system, and the SQL access engine queries the SQLite data file in the distributed storage system to realize SQL standard query and improve the quick response of data query, and the database is not required to be improved, so that the use threshold is reduced.
In this embodiment of the application, preferably, the first acquisition unit includes:
the second acquisition unit is used for acquiring the table name and the account period in the data query condition;
the first calculation unit is used for carrying out Hash calculation on the table name and the account period to generate second information;
the second query unit is used for querying the SQLite data files which carry the same first information as the second information from the SQLite data files stored in the distributed storage system, and determining the queried SQLite data files as target SQLite data files;
wherein the computing platform is used for dividing original data into data files of different account periods, converting each data file into an SQLite data file, performing hash calculation on each SQLite data file by using the table name of the SQLite data file and the account period corresponding to the SQLite data file to generate first information of the SQLite data file, and storing the SQLite data file in the distributed storage system according to target information in the first information of the SQLite data file.
In this embodiment of the application, preferably, the first acquisition unit includes:
a third obtaining unit, configured to obtain a table name in the data query condition;
the second calculation unit is used for carrying out Hash calculation on the table name to generate second information;
the third query unit is used for querying the SQLite data file which carries the same first information as the second information from the SQLite data files stored in the distributed storage system, and determining the queried SQLite data file as a target SQLite data file;
wherein the computing platform is used for dividing the full volume of original data into data files, converting each data file into an SQLite data file, performing hash calculation on each SQLite data file by using the table name of the SQLite data file to generate first information of the SQLite data file, and storing the SQLite data file in the distributed storage system according to target information in the first information of the SQLite data file.
In this embodiment, preferably, the receiving unit includes:
and the receiving subunit is used for receiving the data query request sent through the data access interface according to the data access interface specification under the condition that the historical data query result of the data query request is not stored in the cache.
Further, an embodiment of the present application provides a data management apparatus, further including:
and the first determining unit is used for taking the historical data query result as the data query result under the condition that the historical data query result of the data query request is stored in the cache.
In the embodiment of the application, when the historical data query result of the data query request is stored in the cache, the data query request does not need to be processed, and the historical data query result is directly used as the data query result of the data query request, so that the response efficiency of data query is improved.
In this embodiment, preferably, the first query unit includes:
the loading unit is used for loading each target SQLite data file into the memory;
the aggregation calculation unit is used for performing aggregation calculation on each target SQLite data file loaded into the memory to obtain information matched with the data query condition;
and the second determining unit is used for taking the information as a data query result of the data query request.
In the embodiment of the present application, preferably, the distributed storage system is an HBase database based on a Hadoop platform.
The following describes in detail a hardware structure of a server to which the data management method provided in the embodiment of the present application is applied, by taking the application of the data management method to the server as an example.
The data management method provided by the embodiment of the application can be applied to a server, and the server can be a service device which provides service for a user on a network side, can be a server cluster formed by a plurality of servers, and can also be a single server.
Optionally, fig. 8 shows a block diagram of a hardware structure of a server to which the data management method provided in the embodiment of the present application is applied, and referring to fig. 8, the hardware structure of the server may include: a processor 81, a memory 82, a communication interface 83 and a communication bus 84;
in the embodiment of the present invention, the number of the processor 81, the memory 82, the communication interface 83, and the communication bus 84 may be at least one, and the processor 81, the memory 82, and the communication interface 83 complete communication with each other through the communication bus 84;
the processor 81 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention;
the memory 82 may include a high-speed RAM memory, and may further include a non-volatile memory, such as at least one disk memory;
wherein the memory stores a program, the processor may invoke the program stored in the memory, and the program is operable to:
receiving a data query request, wherein the data query request carries data query conditions;
obtaining target SQLite data files related to data query conditions from SQLite data files stored in a distributed storage system, wherein the SQLite data files in the distributed storage system are stored in the distributed storage system after original data are converted into the SQLite data files by a computing platform;
and querying information matching the data query condition from each target SQLite data file as a data query result.
For the functions of the program, reference may be made to the above detailed description of a data management method provided in the embodiments of the present application, which is not described herein again.
Further, an embodiment of the present application also provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and the computer-executable instructions are used to execute the data management method.
For specific contents of the computer executable instructions, reference may be made to the above detailed description of a data management method provided in the embodiments of the present application, which is not repeated herein.
The data management method, the data management device, the data management server and the storage medium provided by the present invention are described in detail above, and a specific example is applied in the description to explain the principle and the implementation of the present invention, and the description of the above embodiment is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
It is further noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for managing data, comprising:
receiving a data query request, wherein the data query request carries data query conditions;
obtaining target SQLite data files related to the data query condition from SQLite data files stored in a distributed storage system, wherein the SQLite data files in the distributed storage system are stored in the distributed storage system after original data are converted into the SQLite data files by a computing platform;
and querying information matching the data query condition from each target SQLite data file as a data query result.
2. The method of claim 1, wherein the obtaining target SQLite data files related to the data query condition from SQLite data files stored in the distributed storage system comprises:
acquiring a table name and an account period in the data query condition;
performing hash calculation on the table name and the account period to generate second information;
querying, from the SQLite data files stored in the distributed storage system, an SQLite data file that carries first information identical to the second information, and determining the queried SQLite data file as a target SQLite data file;
wherein the computing platform is used for dividing original data into data files of different account periods, converting each data file into an SQLite data file, performing hash calculation on each SQLite data file by using the table name of the SQLite data file and the account period corresponding to the SQLite data file to generate first information of the SQLite data file, and storing the SQLite data file in the distributed storage system according to target information in the first information of the SQLite data file.
3. The method of claim 1, wherein the obtaining target SQLite data files related to the data query condition from SQLite data files stored in the distributed storage system comprises:
acquiring a table name in the data query condition;
performing hash calculation on the table name to generate second information;
querying, from the SQLite data files stored in the distributed storage system, an SQLite data file that carries first information identical to the second information, and determining the queried SQLite data file as a target SQLite data file;
wherein the computing platform is used for dividing the full volume of original data into data files, converting each data file into an SQLite data file, performing hash calculation on each SQLite data file by using the table name of the SQLite data file to generate first information of the SQLite data file, and storing the SQLite data file in the distributed storage system according to target information in the first information of the SQLite data file.
4. The method of claim 1, wherein receiving a data query request comprises: under the condition that the historical data query result of the data query request is not stored in the cache, receiving the data query request sent through the data access interface according to the data access interface specification.
5. The method of claim 4, further comprising:
and taking the historical data query result as a data query result under the condition that the historical data query result of the data query request is stored in the cache.
6. The method according to claim 1, wherein the querying information matching the data query condition from each target SQLite data file as a data query result comprises:
loading each target SQLite data file to a memory;
performing aggregation calculation on each target SQLite data file loaded into the memory to obtain information matched with the data query condition;
and taking the information as a data query result of the data query request.
7. The method according to any one of claims 1 to 6, wherein the distributed storage system is an HBase database based on a Hadoop platform.
8. A data management apparatus, comprising:
a receiving unit, used for receiving a data query request, wherein the data query request carries a data query condition;
a first acquisition unit, used for acquiring target SQLite data files related to the data query condition from SQLite data files stored in a distributed storage system, wherein the SQLite data files in the distributed storage system are stored in the distributed storage system after original data are converted into the SQLite data files by a computing platform;
and the first query unit is used for querying information matched with the data query conditions from each target SQLite data file as a data query result.
9. A server, comprising at least one memory and at least one processor; the memory stores a program that the processor calls, the program stored by the memory being used for implementing the data management method according to any one of claims 1 to 7.
10. A computer-readable storage medium having computer-executable instructions stored thereon for performing the data management method of any of claims 1-7.
CN201911410067.1A 2019-12-31 2019-12-31 Data management method, device, server and storage medium Active CN111159219B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911410067.1A CN111159219B (en) 2019-12-31 2019-12-31 Data management method, device, server and storage medium

Publications (2)

Publication Number Publication Date
CN111159219A true CN111159219A (en) 2020-05-15
CN111159219B CN111159219B (en) 2023-05-23

Family

ID=70559920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911410067.1A Active CN111159219B (en) 2019-12-31 2019-12-31 Data management method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN111159219B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080250057A1 (en) * 2005-09-27 2008-10-09 Rothstein Russell I Data Table Management System and Methods Useful Therefor
US20160267132A1 (en) * 2013-12-17 2016-09-15 Hewlett-Packard Enterprise Development LP Abstraction layer between a database query engine and a distributed file system
US20160306832A1 (en) * 2015-04-20 2016-10-20 Oracle International Corporation System and method for providing access to a sharded database using a cache and a shard technology
CN106649828A (en) * 2016-12-29 2017-05-10 中国银联股份有限公司 Data query method and system
CN106897362A (en) * 2017-01-11 2017-06-27 中国建设银行股份有限公司 For data storage, the method and system of inquiry
CN106940778A (en) * 2017-03-10 2017-07-11 华东师范大学 A kind of encryption data method cracked based on the parallel dictionaries of GPU in support storehouse
CN109299133A (en) * 2017-07-24 2019-02-01 迅讯科技(北京)有限公司 Data query method, computer system and non-transitory computer-readable medium
CN109471890A (en) * 2018-10-16 2019-03-15 深圳壹账通智能科技有限公司 Generation method, terminal device and the medium of report file
CN109726191A (en) * 2018-12-12 2019-05-07 中国联合网络通信集团有限公司 A kind of processing method and system across company-data, storage medium
CN109840254A (en) * 2018-12-14 2019-06-04 湖南亚信软件有限公司 A kind of data virtualization and querying method, device
US20190228100A1 (en) * 2018-01-24 2019-07-25 Walmart Apollo, Llc Systems and methods for high efficiency data querying
US20190253476A1 (en) * 2018-02-09 2019-08-15 InterPro Solutions, LLC Offline mobile data storage system and method
CN110309334A (en) * 2018-04-20 2019-10-08 腾讯科技(深圳)有限公司 Querying method, system, computer equipment and the readable storage medium storing program for executing of chart database

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694708A (en) * 2020-05-28 2020-09-22 新浪网技术(中国)有限公司 Data query method and device, electronic equipment and storage medium
CN114143279A (en) * 2020-08-13 2022-03-04 北京有限元科技有限公司 Sampling method and device for interactive recording and storage medium
CN114143279B (en) * 2020-08-13 2023-10-24 北京有限元科技有限公司 Interactive recording sampling method and device and storage medium
CN112000692A (en) * 2020-09-02 2020-11-27 平安养老保险股份有限公司 Page query feedback method and device, computer equipment and readable storage medium
CN112000692B (en) * 2020-09-02 2023-06-23 平安养老保险股份有限公司 Page query feedback method and device, computer equipment and readable storage medium
CN112860695A (en) * 2021-02-08 2021-05-28 北京百度网讯科技有限公司 Monitoring data query method, device, equipment, storage medium and program product
CN112860695B (en) * 2021-02-08 2023-08-04 北京百度网讯科技有限公司 Monitoring data query method, device, equipment, storage medium and program product

Also Published As

Publication number Publication date
CN111159219B (en) 2023-05-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant