CN109299157B

CN109299157B - Data export method and device for distributed big single table

Info

Publication number: CN109299157B
Application number: CN201810981125.5A
Authority: CN
Inventors: 赖双波; 范渊
Original assignee: DBAPPSecurity Co Ltd
Current assignee: DBAPPSecurity Co Ltd
Priority date: 2018-08-27
Filing date: 2018-08-27
Publication date: 2021-11-23
Anticipated expiration: 2038-08-27
Also published as: CN109299157A

Abstract

The invention discloses a data export method for a distributed large single table, comprising the following steps: receiving a request for exporting data sent by a client; searching a distributed database through Sphinx, and acquiring a server identifier and a data partition ID which satisfy the request; acquiring a target data strip corresponding to the request according to the server identifier and the data partition ID; and writing the target data strip into a target document and exporting the target document to the client. In the method, the server and the data partition which satisfy the request are queried first, and the target data strip is then obtained directly according to the server identifier and the data partition ID, so that the data query efficiency and the data export efficiency of the distributed large single table are improved; meanwhile, the consumption of computer resources can be reduced and the user experience improved. Correspondingly, the data export apparatus, the device and the readable storage medium for the distributed large single table have the same technical effects.

Description

Data export method and device for distributed big single table

Technical Field

The invention relates to the technical field of databases, in particular to a data export method, a device, equipment and a readable storage medium for a distributed large single table.

Background

With the advent of the big data era, the high availability of databases has become increasingly important, and relational database technology has become a core technology in information systems. Because data held in a database is not easy to read directly, and because querying the database with a database tool may raise security and permission issues, the data in the database may instead be exported for querying.

At present, a single table of a database is generally queried using multiple threads, and a further table lookup is then performed, again using multiple threads, to acquire and export the data in the database. However, this approach is only suitable when the single table holds relatively little data. When the single table holds a large amount of data, i.e., when a large single table needs to be exported, exporting with the database tool is slow and consumes more computer resources, which is unfavorable for cost control and degrades the user experience. Here, a large single table is a single table holding a million or more pieces of data, and a distributed large single table is a large single table in a distributed database.

Therefore, how to improve the data export efficiency of a large single table is a problem to be solved by those skilled in the art.

Disclosure of Invention

The invention aims to provide a data export method, a device, equipment and a readable storage medium of a distributed large single table, so as to improve the data export efficiency of the large single table.

In order to achieve the above purpose, the embodiment of the present invention provides the following technical solutions:

a data export method of a distributed big single table comprises the following steps:

receiving a request for exporting data sent by a client;

searching a distributed database through the sphinx, and acquiring a server identifier and a data partition ID which meet the request;

acquiring a target data strip corresponding to the request according to the server identifier and the data partition ID;

and writing the target data strip into a target document and exporting the target document to the client.

The receiving a request for exporting data sent by a client comprises:

and receiving the request through searchd, and writing the query condition corresponding to the request into the sphinx configuration file.

The searching the distributed database through the sphinx and acquiring the server identification and the data partition ID which meet the request comprises the following steps:

searching the distributed database according to the sphinx configuration file, and acquiring all server identifications and all data partition IDs which meet the request according to a socket mode.

Wherein, the obtaining the target data strip corresponding to the request according to the server identifier and the data partition ID includes:

and acquiring target data strips corresponding to the request in batches according to the server identification, the data partition IDs and a preset target configuration file.

Before writing the target data strip into a target document and exporting the target document to the client, the method further includes:

sequencing the target data strips according to a time sequence, and judging whether the number of the target data strips exceeds a preset threshold value or not;

if yes, screening a preset number of data strips according to a time rule, and determining the screened data strips as the target data strips.

Wherein the writing the target data strip into a target document includes:

and when the format of the target document is excel, calling a poi function to write the target data strips into the target document one by one.

After writing the target data strip into a target document and exporting the target document to the client, the method further includes:

and visually displaying the target document at the client.

A data export device for a distributed big single table, comprising:

the receiving module is used for receiving a request for exporting data sent by a client;

the first acquisition module is used for searching the distributed database through the sphinx and acquiring the server identification and the data partition ID which meet the request;

a second obtaining module, configured to obtain, according to the server identifier and the data partition ID, a target data strip corresponding to the request;

and the export module is used for writing the target data strip into a target document and exporting the target document to the client.

A data export device for a distributed big single table, comprising:

a memory for storing a computer program;

a processor, configured to implement the steps of the data export method of the distributed big single table described in any one of the above items when executing the computer program.

A readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the data export method of the distributed big single table described in any one of the above.

According to the scheme, the data export method of the distributed large single table provided by the embodiment of the invention comprises the following steps: receiving a request for exporting data sent by a client; searching a distributed database through the sphinx, and acquiring a server identifier and a data partition ID which meet the request; acquiring a target data strip corresponding to the request according to the server identifier and the data partition ID; and writing the target data strip into a target document and exporting the target document to the client.

It can be seen that, when a request for exporting data sent by a client is received, the distributed database is first searched through Sphinx, and the server identifier and data partition ID which satisfy the request are obtained, thereby determining the server that stores the target data strip and the corresponding data partition; the target data strip is then acquired according to the server identifier and the data partition ID and written into a target document, and the target document is exported to the client for the user to view. Because the method first queries for the server and data partition which satisfy the request, and the returned data consists only of a server identifier and a data partition ID and is therefore small in volume, query efficiency is improved; because the target data strip is obtained directly according to the server identifier and the data partition ID, which determine its storage location, data acquisition efficiency is improved. The data export efficiency of the large single table can thus be improved and the load on the database reduced; meanwhile, searching the distributed database through Sphinx can reduce the consumption of computer resources and improve the user experience.

Accordingly, the data export device, the equipment and the readable storage medium of the distributed large single table provided by the embodiment of the invention also have the technical effects.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flowchart of a data export method for a distributed big single table according to an embodiment of the present invention;

FIG. 2 is a flowchart of another data export method for a distributed big single table according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a data export apparatus of a distributed big single table according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a data export device of a distributed large single table according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention discloses a data export method, a device and equipment of a distributed big single table and a readable storage medium, which are used for improving the data export efficiency of the big single table.

Referring to fig. 1, a data export method for a distributed large single table according to an embodiment of the present invention includes:

S101, receiving a request for exporting data sent by a client;

S102, searching a distributed database through Sphinx, and acquiring a server identifier and a data partition ID which satisfy the request;

Specifically, Sphinx is a full-text search engine that works with SQL databases; it can perform full-text search in combination with MySQL and PostgreSQL, and provides search functionality more specialized than that of the distributed database itself, which makes it easier for an application to implement specialized search. Therefore, searching with Sphinx to obtain the server identifier and the data partition ID can improve data search efficiency. Here, the data partition ID is denoted accessid.

The amount of data in a distributed database is on the order of millions to tens of millions, and data is continually added to the distributed database over time. Generally, a single table in the distributed database stores the data of the last three months by default, and data partitions can be established with one day as the unit; a Sphinx index is correspondingly established for each data partition, and the index is comparatively small, about 1/10 the size of the original data partition. In addition, each server in the distributed database has a server identifier, denoted POID.

Therefore, when a request for exporting data sent by a client is received, the distributed database can be searched through Sphinx to obtain the server identifier and the data partition ID which satisfy the request, that is, to determine the server and the corresponding data partition in which the target data strip to be exported is stored.

S103, acquiring a target data strip corresponding to the request according to the server identifier and the data partition ID;

after the server identification and the data partition ID meeting the data export request are determined, the target data strip is directly obtained according to the server identification and the data partition ID, and therefore the data obtaining efficiency can be improved. It should be noted that, data in the distributed database is generally stored in a data strip format, and therefore, when data in the distributed database is acquired, the data is also acquired in the data strip format.

And S104, writing the target data strip into the target document and exporting the target document to the client.

After the target data strip is obtained, it is written into a target document, and the target document is exported to the client for the user to view. The format of the target document may be excel, csv or txt, or another format that is convenient for the user to view, edit and filter. A document template may be preset; after the target data strip is obtained, it is filled into the document template to generate the target document.

It can be seen that this embodiment provides a data export method for a distributed big single table in which, when a request for exporting data sent by a client is received, the distributed database is searched through Sphinx, and the server identifier and data partition ID which satisfy the request are obtained, thereby determining the server that stores the target data strip and the corresponding data partition; the target data strip is then acquired according to the server identifier and the data partition ID and written into a target document, and the target document is exported to the client for the user to view.

Because the method first queries for the server and data partition which satisfy the request, and the returned data consists only of a server identifier and a data partition ID and is therefore small in volume, query efficiency is improved; because the target data strip is obtained directly according to the server identifier and the data partition ID, which determine its storage location, data acquisition efficiency is improved. The data export efficiency of the large single table can thus be improved and the load on the database reduced; meanwhile, searching the distributed database through Sphinx can reduce the consumption of computer resources and improve the user experience.
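As an illustration only, the two-phase idea described above (first obtain the small set of server identifiers and data partition IDs, then read only those partitions) might be sketched as follows. The in-memory dictionaries INDEX and STORE, the function names and the sample values are hypothetical stand-ins for the Sphinx index and the distributed database; they are not taken from the embodiment.

```python
from typing import Dict, List, Tuple

# Hypothetical type: a (POID, accessid) pair naming one partition on one server.
Location = Tuple[str, str]

# Stand-in for the search index: keyword -> partitions that hold matching data.
INDEX: Dict[str, List[Location]] = {
    "login": [("poid-1", "20180801"), ("poid-2", "20180802")],
    "logout": [("poid-1", "20180801")],
}

# Stand-in for the distributed database: full data, stored per server and partition.
STORE: Dict[Location, List[dict]] = {
    ("poid-1", "20180801"): [
        {"event": "login", "user": "a"},
        {"event": "logout", "user": "a"},
    ],
    ("poid-2", "20180802"): [{"event": "login", "user": "b"}],
}


def find_locations(keyword: str) -> List[Location]:
    """Phase 1: return only (POID, accessid) pairs, not the data itself."""
    return INDEX.get(keyword, [])


def fetch_strips(locations: List[Location], keyword: str) -> List[dict]:
    """Phase 2: read only the partitions named by phase 1."""
    strips: List[dict] = []
    for location in locations:
        strips.extend(
            row for row in STORE.get(location, []) if row["event"] == keyword
        )
    return strips


locations = find_locations("login")
strips = fetch_strips(locations, "login")
```

In this sketch the first phase moves only short identifier pairs, which corresponds to the small data volume the embodiment relies on for query efficiency.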

The embodiment of the invention discloses another data export method for a distributed large single table; compared with the previous embodiment, this embodiment further describes and optimizes the technical scheme.

Referring to fig. 2, another data export method for a distributed large single table according to an embodiment of the present invention includes:

S201, receiving a request for exporting data through searchd, and writing a query condition corresponding to the request into a sphinx configuration file;

S202, searching a distributed database according to the sphinx configuration file, and acquiring all server identifications and all data partition IDs meeting the request according to a socket mode;

In this embodiment, a request for exporting data sent by a client is received through searchd, and the query condition carried by the request is written into the sphinx configuration file, so that Sphinx searches the distributed database according to the sphinx configuration file. The servers in the distributed database can be divided into application servers and distributed servers: the application server is used for interacting with the user, and the user can manage the distributed servers through the application server; the distributed servers are used for storing data. There is generally one application server, and a redundant application server may also be provided as a standby; there are generally multiple distributed servers, and the application server and the distributed servers establish communication connections through sockets. The socket interface is an API for TCP/IP networks that defines various functions and routines.

searchd is the search daemon of Sphinx; it is configured in advance on each server in the distributed database and runs there. The sphinx configuration file records the search configuration information corresponding to a request for exporting data, for example: the maximum number of data strips to be returned, generally denoted by max; the maximum timeout for querying data; and so on.

Therefore, when the searchd of the application server receives a request for exporting data sent by the client, it writes the query condition corresponding to the request into the sphinx configuration file, starts a monitoring process, and forwards the sphinx configuration file to each distributed server through the socket. When the searchd on each distributed server receives the sphinx configuration file, it searches according to that file and, after the search is finished, returns the server identifier and data partition IDs found. The search results at this point comprise all server identifiers and all data partition IDs which satisfy the request; to improve efficiency, they can all be obtained at one time.

S203, acquiring a target data strip corresponding to the request according to the server identifier and the data partition ID;

and S204, writing the target data strip into the target document and exporting the target document to the client.

In this embodiment, when the application server obtains the server identifier and the data partition ID, the server identifier and the data partition ID may be filled into a temporary table, so that the target data strip corresponding to the request is obtained according to the temporary table. The number of server identifiers and data partition IDs that the temporary table holds can be preset, and when the number of server identifiers and data partition IDs exceeds the capacity set for the temporary table, the server identifiers and data partition IDs can be acquired in batches. It should be noted that this embodiment may be implemented based on a web application.
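A minimal sketch of the temporary-table handling described above is given below, under stated assumptions: the results returned by the distributed servers are modeled as plain lists of (POID, accessid) pairs, duplicate pairs are dropped (an assumption, since the embodiment does not say how duplicates are treated), and the names merge_results and temp_table_batches are hypothetical.

```python
from typing import Iterator, List, Tuple

# Hypothetical type: a (POID, accessid) pair returned by one distributed server.
Location = Tuple[str, str]


def merge_results(per_server: List[List[Location]]) -> List[Location]:
    """Combine the pairs returned by every server, keeping first-seen order.

    Dropping duplicates is an assumption made for this sketch.
    """
    seen = set()
    merged: List[Location] = []
    for result in per_server:
        for location in result:
            if location not in seen:
                seen.add(location)
                merged.append(location)
    return merged


def temp_table_batches(
    locations: List[Location], capacity: int
) -> Iterator[List[Location]]:
    """Yield the pairs in groups no larger than the temporary table's preset capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    for start in range(0, len(locations), capacity):
        yield locations[start:start + capacity]


merged = merge_results([
    [("poid-1", "20180801"), ("poid-1", "20180802")],
    [("poid-2", "20180801")],
    [("poid-1", "20180801")],  # repeated pair, dropped by the merge
])
batches = list(temp_table_batches(merged, capacity=2))
```

Each yielded batch would then be loaded into the temporary table in turn, so the table never holds more pairs than its preset size.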

As can be seen, this embodiment provides another data export method for a distributed large single table, in which the request is received through searchd, the query condition corresponding to the request is written into a sphinx configuration file, the distributed database is searched according to the sphinx configuration file, and all server identifiers and all data partition IDs which satisfy the request are obtained via sockets, thereby determining the servers that store the target data strips and the corresponding data partitions; the target data strips are then acquired according to the server identifiers and the data partition IDs and written into a target document, and the target document is exported to the client for the user to view.

In this method, the servers and data partitions which satisfy the request are queried according to the sphinx configuration file, and the server identifiers and data partition IDs are obtained via sockets; because the obtained data consists only of server identifiers and data partition IDs and is therefore small in volume, query efficiency is improved. Because the target data strip is obtained directly according to the server identifier and the data partition ID, which determine its storage location, data acquisition efficiency is improved. The data export efficiency of the large single table can thus be improved and the load on the database reduced; meanwhile, searching the distributed database through Sphinx can reduce the consumption of computer resources and improve the user experience.

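Based on any of the foregoing embodiments, it should be noted that the acquiring, according to the server identifier and the data partition ID, the target data strip corresponding to the request includes: acquiring target data strips corresponding to the request in batches according to the server identifier, the data partition IDs and a preset target configuration file.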

Specifically, after the server identifier and the data partition ID are obtained, the obtained server identifier and the obtained data partition ID may be screened according to a preset target configuration file. Because the maximum threshold limit of the number of data pieces is preset in the target configuration file, when the number of the data pieces corresponding to the acquired server identification and the data partition ID is greater than the maximum threshold of the number of the data pieces, acquiring the target data pieces in batches; and when the number of the data pieces corresponding to the acquired server identification and the data partition ID is not more than the maximum threshold value of the number of the data pieces, acquiring all target data pieces at one time.

For example: if the maximum threshold of the number of data pieces is 1000, then 1000 data pieces are returned each time, and when fewer than 1000 data pieces remain, the actual number is returned. To this end, a clause such as "limit 0, 1000" may be appended to the query statement for the first batch and "limit 1000, 1000" for the second batch, where, in MySQL syntax, the first value is the row offset and the second value is the maximum number of rows to return.
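The batched acquisition described above can be sketched as follows. This is an illustration only: it uses Python's built-in sqlite3 module with an in-memory database in place of the distributed database, the table and column names are hypothetical, and the query uses the equivalent "LIMIT count OFFSET start" form, advancing the offset by the number of rows already returned.

```python
import sqlite3
from typing import Iterator, List, Tuple


def fetch_in_batches(
    conn: sqlite3.Connection, batch_size: int
) -> Iterator[List[Tuple[int, str]]]:
    """Yield rows in batches of at most batch_size until the table is exhausted."""
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM strips ORDER BY id LIMIT ? OFFSET ?",
            (batch_size, offset),
        ).fetchall()
        if not rows:
            break
        yield rows
        # Advance by the rows actually returned, so no row is skipped or repeated.
        offset += len(rows)


# Hypothetical sample data: 2500 rows, fetched with a maximum threshold of 1000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE strips (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO strips (payload) VALUES (?)",
    [(f"row-{i}",) for i in range(2500)],
)
batch_sizes = [len(batch) for batch in fetch_in_batches(conn, 1000)]
```

With 2500 rows and a threshold of 1000, the final batch carries only the remaining rows, matching the rule that the actual number is returned when fewer than the threshold remain.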

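Based on any of the above embodiments, it should be noted that, before writing the target data strip into a target document and exporting the target document to the client, the method further includes: sequencing the target data strips according to a time sequence, and judging whether the number of the target data strips exceeds a preset threshold; if yes, screening a preset number of data strips according to a time rule, and determining the screened data strips as the target data strips.

Specifically, when the number of acquired target data strips exceeds a preset threshold, a preset number of data strips can be screened out according to a time rule, and the screened data strips are determined as the target data strips. It should be noted that the query condition requested by the client generally concerns recently generated data, so older data strips can be discarded.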
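One possible reading of the time-based screening is sketched below. It assumes that the "time rule" means keeping the most recently generated data strips, which the embodiment suggests but does not spell out; the function name keep_most_recent, the "time" field and the sample values are hypothetical.

```python
from datetime import datetime
from typing import List


def keep_most_recent(strips: List[dict], threshold: int, keep: int) -> List[dict]:
    """Sort strips newest first; if there are more than threshold, keep only `keep` of them."""
    ordered = sorted(strips, key=lambda strip: strip["time"], reverse=True)
    if len(ordered) > threshold:
        return ordered[:keep]
    return ordered


# Hypothetical sample: five strips generated on different days of August 2018.
strips = [
    {"id": i, "time": datetime(2018, 8, day)}
    for i, day in enumerate([1, 5, 3, 9, 7])
]
recent = keep_most_recent(strips, threshold=3, keep=2)
```

Here five strips exceed the threshold of three, so only the two newest are kept as the target data strips; a set at or below the threshold is returned in full, sorted by time.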

Based on any of the foregoing embodiments, it should be noted that the writing the target data strip into the target document includes: and when the format of the target document is excel, calling a poi function to write the target data strips into the target document one by one.

Specifically, if m denotes the number of data strips cached in memory at a time, the poi function may be called, for example, as: Workbook wb = new SXSSFWorkbook(m). In this case, only m data strips are cached in memory, so memory usage can be controlled. Before the target data strip is written into the target document, the storage IP address of the target data strip needs to be converted from an unsigned integer to a character string, because the IP address is stored in the distributed database as an unsigned integer and must be of character string type after export.
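The IP address conversion mentioned above might look like the following sketch. It assumes the stored value is an IPv4 address encoded as a 32-bit unsigned integer with the first octet in the most significant byte, which the embodiment does not state explicitly; the function name ip_to_text is hypothetical.

```python
import ipaddress


def ip_to_text(value: int) -> str:
    """Convert a 32-bit unsigned integer to dotted-decimal IPv4 text."""
    # IPv4Address raises an error for values outside 0..2**32-1.
    return str(ipaddress.IPv4Address(value))


converted = ip_to_text(3232235777)  # 192 * 2**24 + 168 * 2**16 + 1 * 2**8 + 1
```

Under these assumptions, the integer 3232235777 corresponds to the address 192.168.1.1.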

Based on any of the foregoing embodiments, it should be noted that, after writing the target data strip into the target document and exporting the target document to the client, the method further includes: and visually displaying the target document at the client.

Specifically, after the target document is exported to the client, the target document may be displayed in a visualization window. At the same time, the target document remaining on the server side is deleted to free the resources it occupies.

The following introduces a data export apparatus for a distributed large single table according to an embodiment of the present invention, and the data export apparatus for a distributed large single table described below and the data export method for a distributed large single table described above may refer to each other.

Referring to fig. 3, an embodiment of the present invention provides a data export apparatus for a distributed large single table, including:

a receiving module 301, configured to receive a request for exporting data sent by a client;

a first obtaining module 302, configured to search the distributed database through a sphinx, and obtain a server identifier and a data partition ID that satisfy the request;

a second obtaining module 303, configured to obtain, according to the server identifier and the data partition ID, a target data strip corresponding to the request;

and the export module 304 is used for writing the target data strip into a target document and exporting the target document to the client.

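Wherein the receiving module is specifically configured to:

receive the request through searchd, and write the query condition corresponding to the request into the sphinx configuration file.

The first obtaining module is specifically configured to:

search the distributed database according to the sphinx configuration file, and acquire all server identifiers and all data partition IDs which satisfy the request according to a socket mode.

The second obtaining module is specifically configured to:

acquire target data strips corresponding to the request in batches according to the server identifier, the data partition IDs and a preset target configuration file.

The apparatus further includes: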

the judging module is used for sequencing the target data strips according to a time sequence and judging whether the number of the target data strips exceeds a preset threshold value or not;

and the screening module is used for screening a preset number of data strips according to a time rule when the number of the target data strips exceeds a preset threshold value, and determining the screened data strips as the target data strips.

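Wherein the export module is specifically configured to:

when the format of the target document is excel, call a poi function to write the target data strips into the target document one by one.

The apparatus further includes: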

and the display module is used for visually displaying the target document at the client.

It can be seen that this embodiment provides a data export apparatus for a distributed large single table, including: a receiving module, a first obtaining module, a second obtaining module and an export module. First, the receiving module receives a request for exporting data sent by a client; then, the first obtaining module searches the distributed database through Sphinx and acquires the server identifier and data partition ID which satisfy the request; the second obtaining module acquires the target data strip corresponding to the request according to the server identifier and the data partition ID; finally, the export module writes the target data strip into the target document and exports the target document to the client. The modules thus divide the work and cooperate, each performing its own function, so that the data query efficiency and the data export efficiency of the distributed large single table are improved; meanwhile, the consumption of computer resources can be reduced and the user experience improved.

The following introduces a data export device of a distributed large single table according to an embodiment of the present invention, and the data export device of the distributed large single table described below and the data export method and apparatus of the distributed large single table described above may refer to each other.

Referring to fig. 4, an embodiment of the present invention provides a data export device for a distributed large single table, including:

a memory 401 for storing a computer program;

a processor 402, configured to implement the steps of the data export method of the distributed big single table according to any of the above embodiments when executing the computer program.

In the following, a readable storage medium provided by an embodiment of the present invention is introduced, and a readable storage medium described below and the above-described data export method, apparatus, and device of a distributed big single table may be referred to each other.

A readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the data export method of a distributed big single table as described in any of the above embodiments.

In any of the above embodiments, it should be noted that the threshold values referred to in this specification may be the same or different. That is, the threshold value can be flexibly adjusted based on the actual application, so this is not specifically limited in this specification.

Based on any of the above embodiments, in order to more clearly illustrate the data export method of the distributed large single table provided by the present invention, the following implementation steps are listed:

step 1, deploying complete web application, a sphinx index service and a database environment;

step 2, writing the query condition entered on the web page into the sphinx.conf configuration file;

step 3, the searchd service of the server side issues the query condition information to the searchd service of other distributed servers and starts monitoring;

step 4, the searchd of the server side sorts the data received through monitoring and writes the sorted data into a temporary database table; wherein the temporary table is used for storing the server identification (POID) and the data partition ID (accessid);

step 5, acquiring data according to the accessid and the POID in the temporary table: acquiring the data to be exported, taking m (threshold) pieces of data each time, and storing the acquired data only in memory;

step 6, processing the data obtained in the step 5, for example, the IP address of the original data stored in the database is unsigned integer, and the original data needs to be converted into a character string type;

step 7, writing the data processed in step 6 into excel one by one by calling the poi function for excel; the statement "Workbook wb = new SXSSFWorkbook(m)" caches m pieces of data in memory;

step 8, after the processing in step 7 is completed, judging whether the database data is completely acquired, namely judging whether the data meeting the current query conditions are completely acquired, and automatically skipping to step 5 if the data are not completely acquired; if the data are completely acquired, generating an excel file, and entering the next step 9;

step 9, the web application client downloads the generated excel file, and the excel file on the server is deleted after the download.

This completes the implementation of the web-application-based method for exporting the data of a distributed large single table to excel. It should be noted that, in this embodiment, there is no specific limitation on the type and version of the database, and the platform of the web application is also not specifically limited.
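As a rough sketch of steps 5 to 8 only, the loop below takes m pieces of data at a time, converts the stored IP address, and writes the rows out before fetching the next batch. Several assumptions are made for illustration: the output is CSV written with Python's standard csv module rather than excel via poi, an in-memory list stands in for the database, and the names batches, export_csv and the sample rows are hypothetical.

```python
import csv
import io
import ipaddress
from typing import Iterator, List, TextIO, Tuple

# Hypothetical row layout: (IP address as unsigned integer, message text).
Row = Tuple[int, str]


def batches(rows: List[Row], m: int) -> Iterator[List[Row]]:
    """Step 5: hand out at most m rows at a time (the list stands in for the database)."""
    for start in range(0, len(rows), m):
        yield rows[start:start + m]


def export_csv(source: List[Row], m: int, out: TextIO) -> int:
    """Write all rows to out in batches of m and return the number of rows written."""
    writer = csv.writer(out)
    writer.writerow(["ip", "message"])
    written = 0
    for batch in batches(source, m):
        for ip_int, message in batch:
            # Step 6: convert the unsigned integer to a dotted-decimal string.
            ip_text = str(ipaddress.IPv4Address(ip_int))
            # Step 7: write the processed row to the target document.
            writer.writerow([ip_text, message])
            written += 1
    # Step 8: the loop ends once no unfetched data remains.
    return written


buffer = io.StringIO()
count = export_csv(
    [(3232235777, "a"), (167772161, "b"), (2130706433, "c")], m=2, out=buffer
)
lines = buffer.getvalue().splitlines()
```

In a real deployment the batch source would be a database query bounded by m, so that only one batch is held in memory at a time; the list used here is purely for demonstration.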

The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A data export method for a distributed big single table is characterized by comprising the following steps:

receiving a request for exporting data sent by a client;

searching a distributed database through sphinx, and acquiring a server identifier and a data partition ID which meet the request; wherein a single table in the distributed database stores data of the last three months by default, data partitions are established in units of days, and a sphinx index is correspondingly established for each data partition;

writing the target data strip into a target document and exporting the target document to the client;

acquiring target data strips corresponding to the request in batches according to the server identification, the data partition IDs and a preset target configuration file; the maximum threshold limit of the number of data pieces is preset in the target configuration file, and the obtained server identification and the data partition ID can be screened according to the target configuration file;

the receiving a request for exporting data sent by a client comprises: receiving the request through searchd, and writing the query condition corresponding to the request into a sphinx configuration file; the searching the distributed database through sphinx and acquiring the server identifier and the data partition ID which meet the request comprises: searching the distributed database according to the sphinx configuration file, and acquiring, through a socket, all server identifiers and all data partition IDs which meet the request;

the server in the distributed database is divided into an application server and a distributed server, and the application server and the distributed server establish communication connection through a socket; when the searchd of the application server receives a request for exporting data sent by a client, writing query conditions corresponding to the request into a sphinx configuration file, starting a monitoring process, and further forwarding the sphinx configuration file to each distributed server through a socket; and when the searchd on each distributed server receives the sphinx configuration file, searching according to the sphinx configuration file, and after the search is finished, returning the server identification and the data partition ID obtained by the search.

2. The method for exporting data from a distributed big single table according to claim 1, wherein before writing said target data strip into a target document and exporting said target document to said client, said method further comprises:

3. The data export method of the distributed big single table as claimed in claim 1, wherein said writing said target data strip into a target document comprises:

4. The data export method of the distributed big single table according to any one of claims 1 to 3, wherein after the target data strip is written into a target document and exported to the client, the method further comprises:

and visually displaying the target document at the client.

5. A data export apparatus for a distributed big single table, comprising:

the first acquisition module is used for searching the distributed database through sphinx and acquiring the server identifier and the data partition ID which meet the request; wherein a single table in the distributed database stores data of the last three months by default, data partitions are established in units of days, and a sphinx index is correspondingly established for each data partition;

the export module is used for writing the target data strip into a target document and exporting the target data strip to the client;

the second obtaining module is specifically configured to:

wherein the receiving module is specifically configured to:

receiving the request through searchd, and writing the query condition corresponding to the request into a sphinx configuration file;

the first obtaining module is specifically configured to:

searching the distributed database according to the sphinx configuration file, and acquiring, through a socket, all server identifiers and all data partition IDs which meet the request;

6. A data export device for a distributed big single table, comprising:

a memory for storing a computer program;

a processor for implementing the steps of the data export method of a distributed big single table according to any of claims 1 to 4 when executing said computer program.

7. A readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method for data export of a distributed big single table according to any of claims 1 to 4.