CN112685427B - Data access method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112685427B
Authority
CN
China
Prior art keywords
data
import
imported
configuration
source
Legal status
Active
Application number
CN202110096250.XA
Other languages
Chinese (zh)
Other versions
CN112685427A (en)
Inventor
Name withheld upon request
Current Assignee
Lakala Payment Co ltd
Original Assignee
Lakala Payment Co ltd
Application filed by Lakala Payment Co ltd
Priority to CN202110096250.XA
Publication of CN112685427A
Application granted
Publication of CN112685427B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the disclosure disclose a data access method, a device, an electronic device, a storage medium and a program product. The method comprises the following steps: receiving a data acquisition request, the data acquisition request comprising configuration data of the data to be imported; determining data import parameters according to the configuration data; determining, according to the data import parameters, the number of parallel threads used for the data import and the sub-data amount of the data to be imported by each thread, where the sum of the sub-data amounts of all threads equals the total amount of data to be imported; and executing the threads in parallel, the threads acquiring the data to be imported from a production platform. The technical scheme speeds up data extraction jobs with a large access volume. Faster extraction lowers the probability of a "snapshot too old" database error, which greatly improves the access success rate for large data tables and reduces the impact of access failures on downstream services.

Description

Data access method, device, electronic equipment and storage medium
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a data access method, a data access device, electronic equipment and a storage medium.
Background
A big data cluster system generally stores the data of the production systems centrally, which makes the data convenient to manage and use downstream. One problem, however, is that some data tables in the production system, such as the merchant table and the terminal table, hold very large amounts of data; a single table may exceed ten million rows. A second problem concerns how those tables can be extracted: some tables in the production system can be acquired through incremental access, which reduces the amount of data extracted each time, but others, constrained by their original design or by the nature of their service data, can only be accessed in full, and a full extraction exceeding tens of millions of rows fails easily. The inventors of the present disclosure found that one cause of such failures is that the database snapshot becomes too old. Because these tables are often important service tables, an access failure can seriously affect downstream services (reports, profit sharing, accounting, etc.). How to avoid failures caused by an excessively large single data access is therefore one of the technical problems to be solved.
Disclosure of Invention
The embodiment of the disclosure provides a data access method, a data access device, electronic equipment, a storage medium and a program product.
In a first aspect, an embodiment of the present disclosure provides a data access method, including:
receiving a data acquisition request; the data acquisition request comprises configuration data of data to be imported;
determining data import parameters according to the configuration data;
determining, according to the data import parameters, the number of parallel threads used for the data import and the sub-data amount of the data to be imported by each thread, where the sum of the sub-data amounts of all threads is the total amount of data to be imported;
and executing the threads in parallel, and acquiring data to be imported from a production platform by the threads.
Further, determining the data import parameter according to the configuration data includes:
determining whether the data import mode is a full data import mode or an incremental data import mode according to the configuration data;
and, when the data import mode is the incremental data import mode, determining the import starting position in the data source of the data to be imported on the production platform.
Further, determining the data import parameters according to the configuration data includes:
determining the parameter in the configuration data that specifies the number of parallel threads.
Further, determining the data import parameters according to the configuration data further includes:
determining the data splitting field, specified in the configuration data, that is used to split the data to be imported.
Further, determining, according to the data import parameters, the number of parallel threads used for the data import and the sub-data amount of the data to be imported by each thread includes:
when the parameter specifying the number of parallel threads is empty, setting the number of parallel threads to a default value.
Further, determining, according to the data import parameters, the number of parallel threads used for the data import and the sub-data amount of the data to be imported by each thread includes:
determining the total amount of data to be imported according to the maximum and minimum values of the data splitting field in the data to be imported;
and determining the sub-data amount of the data to be imported by each thread according to the total data amount and the number of parallel threads.
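The two steps above can be written as a worked formula. This is a sketch under an assumption the claims do not state: the data splitting field takes contiguous integer values with no gaps, so its maximum and minimum determine the row count exactly (with gaps, $N$ is only an upper bound). The symbols $v_{\min}$, $v_{\max}$, $N$, $n$, $s$, $k$ and $R_k$ are notation introduced here for illustration.

```latex
\begin{aligned}
N   &= v_{\max} - v_{\min} + 1 && \text{(total data amount)}\\
s   &= \left\lceil N / n \right\rceil && \text{(sub-data amount per thread, } n \text{ parallel threads)}\\
R_k &= \bigl[\, v_{\min} + k s,\ \min\!\left(v_{\min} + (k+1)s - 1,\ v_{\max}\right) \bigr], \qquad k = 0, 1, \dots, n-1
\end{aligned}
```

The ranges $R_k$ are disjoint and together cover $[v_{\min}, v_{\max}]$, so the sub-data amounts sum to $N$. When $n$ does not divide $N$, the last non-empty range may be shorter than $s$, and any $R_k$ whose lower bound exceeds $v_{\max}$ is empty. With $v_{\min}=1$, $v_{\max}=100$ and $n=10$ this gives $s=10$ and $R_0=[1,10], \dots, R_9=[91,100]$.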
Further, the method further comprises:
receiving a data import mode determination request sent by a client, the request comprising information about the data source;
determining, according to the data structure design of the data source or the type of service data it stores, whether the data source supports the incremental data import mode;
and sending reply information to the client, the reply information indicating whether the data source supports the incremental data import mode.
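The claims leave open how "data structure design" and "type of stored service data" translate into a yes/no answer. The sketch below is one hypothetical rule, not the patent's method: a table supports incremental import if it has a watermark column (the column names `update_time` and `last_modified` are assumptions), or if it is append-only and has an increasing `id` key. `SourceInfo`, `supports_incremental` and `build_reply` are illustrative names.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical column names treated as an incremental watermark (e.g. a last-update timestamp).
WATERMARK_COLUMNS: FrozenSet[str] = frozenset({"update_time", "last_modified"})


@dataclass(frozen=True)
class SourceInfo:
    """Hypothetical description of a data source (one table on the production platform)."""
    table: str
    columns: FrozenSet[str]
    append_only: bool  # True if existing rows are never updated in place


def supports_incremental(src: SourceInfo) -> bool:
    """Decide incremental support from the table's structure and the nature of its data.

    Assumed rule: incremental import is possible if the table has a watermark column,
    or if it is append-only and has a monotonically increasing "id" key.
    """
    if src.columns & WATERMARK_COLUMNS:
        return True
    return src.append_only and "id" in src.columns


def build_reply(src: SourceInfo) -> dict:
    """Reply sent back to the client: which source, and whether incremental import is supported."""
    return {"table": src.table, "supports_incremental": supports_incremental(src)}


if __name__ == "__main__":
    merchant = SourceInfo("merchant", frozenset({"id", "name", "status"}), append_only=False)
    orders = SourceInfo("order_log", frozenset({"id", "amount"}), append_only=True)
    print(build_reply(merchant))  # {'table': 'merchant', 'supports_incremental': False}
    print(build_reply(orders))    # {'table': 'order_log', 'supports_incremental': True}
```

A merchant table whose rows are updated in place and that lacks a timestamp column falls back to full import under this rule, which matches the background's description of tables that "can only be accessed in full".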
In a second aspect, an embodiment of the present disclosure provides a data access method, including:
in response to a detected data acquisition configuration operation, displaying a data source configuration interface to a user;
acquiring the data source of the data to be imported that the user provides in the data source configuration interface;
when the data source does not support the incremental data import mode, displaying a full data import configuration interface to the user and acquiring configuration data from the full data import configuration interface;
and sending a data acquisition request to a data import server, wherein the data acquisition request comprises the configuration data.
Further, the method further comprises:
sending a data import mode determination request to the data import server, the request comprising information about the data source;
and receiving reply information from the data import server, the reply information indicating whether the data source supports incremental data import.
Further, the method further comprises:
when the data source supports incremental data import, displaying an incremental data import configuration interface to the user and acquiring configuration data from the incremental data import configuration interface.
Further, when the data source supports incremental data import, displaying the incremental data import configuration interface to the user comprises:
determining the time and/or position at which data were last imported from the data source;
and displaying that last time and/or last position in the incremental data import configuration interface.
In a third aspect, an embodiment of the present disclosure provides a data access method, including:
the client, in response to a detected data acquisition configuration operation, displays a data source configuration interface to a user;
the client acquires the data source of the data to be imported that the user provides in the data source configuration interface;
when the data source does not support the incremental data import mode, the client displays a full data import configuration interface to the user and acquires configuration data from the full data import configuration interface;
the client sends a data acquisition request to a data import server, wherein the data acquisition request comprises the configuration data;
the data import server receives the data acquisition request, which comprises the configuration data of the data to be imported;
the data import server determines data import parameters according to the configuration data;
the data import server determines, according to the data import parameters, the number of parallel threads used for the data import and the sub-data amount of the data to be imported by each thread, where the sum of the sub-data amounts of all threads is the total amount of data to be imported;
and the data import server executes the threads in parallel, and the threads acquire data to be imported from the production platform.
Further, the method further comprises:
the client sends a data import mode determination request comprising information about the data source to the data import server;
the data import server determines, according to the data structure design of the data source or the type of service data it stores, whether the data source supports the incremental data import mode;
and the data import server sends reply information to the client, the reply information indicating whether the data source supports the incremental data import mode.
Further, the data import server determines the data import parameters according to the configuration data, including:
the data import server determines whether the data import mode is a full data import mode or an incremental data import mode according to the configuration data;
when the data import mode is the incremental data import mode, the data import server determines the import starting position in the data source of the data to be imported on the production platform.
Further, the data import server determines the data import parameters according to the configuration data, including:
the data import server determines the parameter in the configuration data that specifies the number of parallel threads.
Further, the data import server determining the data import parameters according to the configuration data further includes:
the data import server determines the data splitting field, specified in the configuration data, that is used to split the data to be imported.
Further, the data import server determining, according to the data import parameters, the number of parallel threads used for the data import and the sub-data amount of the data to be imported by each thread includes:
when the parameter specifying the number of parallel threads is empty, the data import server sets the number of parallel threads to a default value.
Further, the data import server determining, according to the data import parameters, the number of parallel threads used for the data import and the sub-data amount of the data to be imported by each thread includes:
the data import server determines the total amount of data to be imported according to the maximum and minimum values of the data splitting field in the data to be imported;
and the data import server determines the sub-data amount of the data to be imported by each thread according to the total data amount and the number of parallel threads.
Further, the method further comprises:
when the data source supports incremental data import, the client displays an incremental data import configuration interface to the user and acquires configuration data from the incremental data import configuration interface.
Further, when the data source supports incremental data import, the client displaying the incremental data import configuration interface to the user includes:
the client determines the time and/or position at which data were last imported from the data source;
and the client displays that last time and/or last position in the incremental data import configuration interface.
In a fourth aspect, in an embodiment of the present disclosure, there is provided a data access apparatus, including:
a receiving module configured to receive a data acquisition request; the data acquisition request comprises configuration data of data to be imported;
a first determining module configured to determine a data import parameter according to the configuration data;
a second determining module configured to determine, according to the data import parameters, the number of parallel threads used for the data import and the sub-data amount of the data to be imported by each thread, where the sum of the sub-data amounts of all threads is the total amount of data to be imported;
and a parallel execution module configured to execute that number of threads in parallel, the threads acquiring the data to be imported from the production platform.
In a fifth aspect, in an embodiment of the present disclosure, there is provided a data access apparatus, including:
a response module configured to display a data source configuration interface to a user in response to a detected data acquisition configuration operation;
an acquisition module configured to acquire the data source of the data to be imported that the user provides in the data source configuration interface;
a display module configured to display a full data import configuration interface to the user when the data source does not support the incremental data import mode, and to acquire configuration data from the full data import configuration interface;
and a sending module configured to send a data acquisition request comprising the configuration data to the data import server.
The functions may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.
In one possible design, the structure of the above apparatus includes a memory for storing one or more computer instructions for supporting the above apparatus to perform the corresponding method, and a processor configured to execute the computer instructions stored in the memory. The apparatus may further comprise a communication interface for the apparatus to communicate with other devices or a communication network.
In a sixth aspect, an embodiment of the present disclosure provides a data access system, including a client and a data import server, wherein:
the client, in response to a detected data acquisition configuration operation, displays a data source configuration interface to a user;
the client acquires the data source of the data to be imported that the user provides in the data source configuration interface;
when the data source does not support the incremental data import mode, the client displays a full data import configuration interface to the user and acquires configuration data from the full data import configuration interface;
the client sends a data acquisition request to the data import server, wherein the data acquisition request comprises the configuration data;
the data import server receives the data acquisition request, which comprises the configuration data of the data to be imported;
the data import server determines data import parameters according to the configuration data;
the data import server determines, according to the data import parameters, the number of parallel threads used for the data import and the sub-data amount of the data to be imported by each thread, where the sum of the sub-data amounts of all threads is the total amount of data to be imported;
and the data import server executes the threads in parallel, and the threads acquire data to be imported from the production platform.
In a seventh aspect, embodiments of the present disclosure provide an electronic device comprising a memory and a processor, the memory storing one or more computer instructions that support any of the above apparatuses in performing the corresponding method, and the processor being configured to execute the computer instructions stored in the memory. Any of the above apparatuses may further include a communication interface for communicating with other devices or a communication network.
In an eighth aspect, embodiments of the present disclosure provide a computer-readable storage medium storing computer instructions for use by any one of the above-described apparatuses, including computer instructions for performing any one of the above-described methods.
In a ninth aspect, embodiments of the present disclosure provide a computer program product comprising computer instructions for implementing the steps of the method of any one of the above aspects when executed by a processor.
The technical scheme provided by the embodiment of the disclosure can comprise the following beneficial effects:
In the technical scheme provided by the embodiments of the disclosure, when different service data are extracted from a production system into a big data cluster system for centralized storage, operation and maintenance personnel configure the extraction through a client. According to the configuration data, the server splits an extraction job with a large access volume into several, for example n, concurrently executed extraction processes, each of which extracts one part of the data. The amounts extracted by the n concurrent processes sum to the total amount of data to be extracted, while the extraction time is ideally reduced to about 1/n of that of the original single process. This speeds up extraction jobs with a large access volume, and the faster extraction lowers the probability of a "snapshot too old" database error, which greatly improves the access success rate for large data tables and reduces the impact of access failures on downstream services.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of embodiments of the disclosure.
Drawings
Other features, objects and advantages of the embodiments of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments, taken in conjunction with the accompanying drawings. In the drawings:
fig. 1 shows a flow chart of a data access method according to an embodiment of the present disclosure;
fig. 2 shows a flow chart of a data access method according to another embodiment of the present disclosure;
fig. 3 shows a flow chart of a data access method according to another embodiment of the present disclosure;
fig. 4 illustrates an application scenario diagram of a data access method according to an embodiment of the present disclosure;
fig. 5 shows an overall flowchart of a data access method according to an embodiment of the present disclosure;
fig. 6 shows a block diagram of a data access device according to an embodiment of the present disclosure;
fig. 7 shows a block diagram of a data access system according to another embodiment of the present disclosure;
fig. 8 shows a block diagram of a data access system according to another embodiment of the present disclosure;
fig. 9 is a schematic diagram of a computer system suitable for use in implementing a data access method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary implementations of the embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. In addition, for the sake of clarity, portions irrelevant to description of the exemplary embodiments are omitted in the drawings.
In the presently disclosed embodiments, it is to be understood that the terms such as "comprises" or "comprising" and the like are intended to indicate the presence of features, numbers, steps, acts, components, portions, or combinations thereof disclosed in the present specification, and are not intended to exclude the possibility of one or more other features, numbers, steps, acts, components, portions, or combinations thereof being present or added.
In addition, it should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. Embodiments of the present disclosure will be described in detail below with reference to the attached drawings in conjunction with the embodiments.
Fig. 1 shows a flowchart of a data access method according to an embodiment of the present disclosure, as shown in fig. 1, including the following steps S101-S104:
in step S101, a data acquisition request is received; the data acquisition request comprises configuration data of data to be imported;
in step S102, determining a data import parameter according to the configuration data;
in step S103, determining, according to the data import parameters, the number of parallel threads used for the data import and the sub-data amount of the data to be imported by each thread, where the sum of the sub-data amounts of all threads is the total amount of data to be imported;
in step S104, the threads, in the determined number, are executed in parallel and acquire the data to be imported from the production platform.
A big data cluster system needs to store the data of the production systems centrally so that the data are easy to manage and available for downstream use. In a production system, many data tables, such as the merchant table and the terminal table, hold very large amounts of data, and a single table may exceed ten million rows. In addition, although some tables in the production system can be acquired through incremental access, which reduces the amount extracted each time, others, constrained by their original design or by the nature of their service data, can only be accessed in full, and an access exceeding tens of millions of rows fails easily. For example, one of the most common causes of such failures is that the database snapshot becomes too old. Because these tables may be important service tables, an access failure can seriously affect downstream services (reports, profit sharing, accounting, etc.). How to avoid failures caused by an excessively large single data access is therefore one of the technical problems to be solved.
In view of this, this embodiment proposes a data access method. When different service data are extracted from a production system into a big data cluster system for centralized storage, operation and maintenance personnel configure the extraction through a client, and the server, according to the configuration data, splits an extraction job with a large access volume into several, for example n, concurrently executed extraction processes. Each process extracts one part of the data, the amounts extracted by the n processes sum to the total amount of data to be extracted, and the extraction time is ideally reduced to about 1/n of that of the original single process. This speeds up large extraction jobs, lowers the probability of a "snapshot too old" error, greatly improves the access success rate for large data tables, and reduces the impact of access failures on downstream services.
In an embodiment of the present disclosure, the data access method may run on a data import server in the big data cluster system that extracts data from the production system.
In an embodiment of the disclosure, when data generated by the production platform need to be imported, the operation and maintenance personnel may configure the import through the client and provide the configuration data to the data import server, requesting it to import the corresponding data from the production platform according to that configuration. The configuration data may include, but are not limited to, the data source of the current import (e.g., a database identifier, data table identifier and field identifiers in the production platform) and the data import parameters. The data import parameters may determine the data import mode, the number of parallel threads required for the parallel import, and so on. The configuration data may also omit the data import parameters and specify only the data source, in which case the data import server determines the data import parameters according to the actual situation of the data source.
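The configuration data described above can be modelled as a small data structure. The sketch below assumes a JSON-like request with a `config` object; all field names (`database`, `table`, `fields`, `mode`, `parallel_threads`, `split_field`, `start_position`) are hypothetical. Every data import parameter is optional, reflecting that the configuration may specify only the data source and leave the rest to the data import server.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DataSource:
    database: str  # database identifier on the production platform
    table: str     # data table identifier
    fields: List[str] = field(default_factory=list)  # field identifiers; empty means all fields


@dataclass
class ImportConfig:
    source: DataSource
    # Optional data import parameters; None means "let the data import server decide".
    mode: Optional[str] = None              # "full" or "incremental"
    parallel_threads: Optional[int] = None
    split_field: Optional[str] = None
    start_position: Optional[int] = None


def parse_request(request: Dict[str, Any]) -> ImportConfig:
    """Extract the configuration data from a (hypothetical) JSON-like data acquisition request."""
    cfg = request["config"]
    src = cfg["source"]
    return ImportConfig(
        source=DataSource(
            database=src["database"],
            table=src["table"],
            fields=list(src.get("fields", [])),
        ),
        mode=cfg.get("mode"),
        parallel_threads=cfg.get("parallel_threads"),
        split_field=cfg.get("split_field"),
        start_position=cfg.get("start_position"),
    )
```

For example, `parse_request({"config": {"source": {"database": "prod_db", "table": "merchant"}}})` yields a config whose `mode` and `parallel_threads` are `None`, leaving those decisions to the server.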
After receiving a data acquisition request from the client, the data import server extracts the configuration data from it, determines the data import parameters according to the configuration data, determines the number of parallel threads for the current import based on those parameters, and then determines the sub-data amount each parallel thread is to import according to the number of parallel threads and the total data amount of the data source.
In an embodiment of the present disclosure, once the number of parallel threads and the sub-data amount for each thread have been determined, that many threads may be started. The threads run in parallel, each acquiring its share of the data to be imported from the data source on the production platform; the data import server finally merges the data imported by all threads and stores them in the big data cluster database.
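The start-fetch-merge flow above can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`. This is an illustration, not the patent's implementation: the production platform is replaced by a hypothetical in-memory table `SOURCE`, `fetch_from_source` stands in for a real range query, and the per-thread key ranges are passed in, computed from the data splitting field as described for step S102. Threads suit this workload because real database reads are I/O-bound; in a real job each thread would typically open its own database connection.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Range = Tuple[int, int]  # inclusive (low, high) bounds on the data splitting field


def parallel_import(
    ranges: List[Range],
    fetch_range: Callable[[int, int], List[Row]],
    threads: int,
) -> List[Row]:
    """Run one fetch per range on a thread pool, then merge the results."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in the order of `ranges`, whatever order the threads finish in.
        chunks = list(pool.map(lambda r: fetch_range(r[0], r[1]), ranges))
    # Merge step: concatenate the per-thread chunks into one result set.
    return [row for chunk in chunks for row in chunk]


# Hypothetical stand-in for the production platform: an in-memory "table" with IDs 1..100.
SOURCE: List[Row] = [{"id": i, "name": f"merchant-{i}"} for i in range(1, 101)]


def fetch_from_source(low: int, high: int) -> List[Row]:
    # A real job would run a range query on the production database instead.
    return [row for row in SOURCE if low <= row["id"] <= high]


if __name__ == "__main__":
    rows = parallel_import([(1, 50), (51, 100)], fetch_from_source, threads=2)
    print(len(rows), rows[0]["id"], rows[-1]["id"])  # 100 1 100
```

Writing the merged rows into the big data cluster is left out; the sketch only shows that the per-thread sub-data amounts add up to the full table.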
In an embodiment of the present disclosure, step S102, that is, the step of determining the data import parameter according to the configuration data, further includes the steps of:
determining whether the data import mode is a full data import mode or an incremental data import mode according to the configuration data;
and, when the data import mode is the incremental data import mode, determining the import starting position in the data source of the data to be imported on the production platform.
In this optional implementation, data in the production platform may be imported either in full or incrementally. Full import means importing all data from a data source in one pass; incremental import means importing data from the same data source over multiple runs, where each later run imports only the data added after the previous import.
Because some data sources, such as data tables, can only be accessed in full owing to their original design or the nature of their service data, the operation and maintenance personnel may specify the data import mode according to the actual situation of the data source. Alternatively, they may leave the mode unspecified, in which case the data import server determines, from the data to be imported, the data source and the data import modes it supports.
When the operation and maintenance personnel specify incremental import, the server may check whether the corresponding data source supports incremental import and, if it does, determine the starting position for the current import. The starting position may be specified by the operation and maintenance personnel in the configuration data or determined by the data import server from the end position of the previous import from the same data source.
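Choosing the starting position and turning the import mode into a query filter might look like the sketch below. The function names, the `?` placeholder (the qmark style used by, e.g., Python's `sqlite3`), and the convention that `None` means "no previous import" are all assumptions. The watermark field name is interpolated into the SQL text, so it must come from trusted configuration, never from free user input.

```python
from typing import Optional, Tuple


def resolve_start_position(configured: Optional[int], last_end: Optional[int]) -> Optional[int]:
    """Pick the starting position for an incremental import.

    A position given in the configuration data wins; otherwise continue from where the
    previous import of this data source ended. None means no previous import: start from the beginning.
    """
    if configured is not None:
        return configured
    return last_end


def build_filter(mode: str, watermark_field: str, start: Optional[int]) -> Tuple[str, tuple]:
    """Return a SQL WHERE fragment and its parameters for the chosen import mode."""
    if mode not in ("full", "incremental"):
        raise ValueError(f"unknown import mode: {mode!r}")
    if mode == "full" or start is None:
        return "1=1", ()  # full import (or first incremental run): no lower bound
    # Incremental import: only rows added after the previous import's end position.
    # watermark_field is trusted configuration; the value is passed as a parameter.
    return f"{watermark_field} > ?", (start,)
```

For instance, with no configured start and a previous import that ended at ID 300, `build_filter("incremental", "id", resolve_start_position(None, 300))` returns `("id > ?", (300,))`.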
In an embodiment of the present disclosure, step S102, that is, the step of determining the data import parameter according to the configuration data, further includes the steps of:
determining a parameter in the configuration data that specifies the number of parallel threads.
In this optional implementation, the number of parallel threads for the current import may be specified by the operation and maintenance personnel in the configuration data. If the configuration data contains a thread count, that value is used; if it does not, the number of parallel threads is set to a default value.
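A minimal sketch of the thread-count rule, assuming a hypothetical configuration key parallel_threads and a hypothetical server-side default of 4; the disclosure fixes neither the key name nor the default value.

```python
DEFAULT_PARALLEL_THREADS = 4  # hypothetical default preset in the data import server


def parallel_thread_count(config: dict,
                          default: int = DEFAULT_PARALLEL_THREADS) -> int:
    """Return the thread count from the configuration data, or the default.

    "parallel_threads" is a hypothetical key name.
    """
    value = config.get("parallel_threads")
    if value is None or value == "":
        return default  # parameter empty: fall back to the default value
    count = int(value)
    if count < 1:
        raise ValueError("parallel_threads must be a positive integer")
    return count
```

Rejecting zero or negative counts up front is a design choice of this sketch; it keeps a misconfigured request from reaching the splitting step with nothing to split the work across.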
In an embodiment of the present disclosure, step S102, that is, the step of determining the data import parameter according to the configuration data, further includes the steps of:
and determining a data segmentation field used for segmenting the data to be imported in the configuration data.
In this optional implementation, because data in a single data source is imported by multiple parallel threads, each thread extracts one portion of the data from the data source and imports it into the data warehouse of the big data cluster. To distinguish the portion each thread extracts, the operator may configure in the configuration data a data segmentation field used to split the data to be imported. For example, suppose the operator designates the "ID" field of a data table as the data segmentation field and chooses full import, and the table holds 100 records whose ID values run from 1 to 100. With 10 parallel threads, each thread imports 100/10 = 10 records: the 1st thread extracts the records with ID = 1, 2, …, 10, the 2nd thread extracts the records with ID = 11, 12, …, 20, and so on, up to the 10th thread, which extracts the records with ID = 91, 92, …, 100.
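The worked example can be reproduced with a short sketch that cuts an inclusive ID range into one contiguous sub-range per thread. The function name and the rule of giving any remainder to the first threads are assumptions for illustration; the disclosure only describes the evenly divisible case.

```python
from typing import List, Tuple


def split_id_range(min_id: int, max_id: int,
                   threads: int) -> List[Tuple[int, int]]:
    """Split the inclusive ID range [min_id, max_id] into contiguous sub-ranges.

    When the count does not divide evenly, the first ranges take one extra ID.
    """
    base, extra = divmod(max_id - min_id + 1, threads)
    ranges = []
    start = min_id
    for i in range(threads):
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # more threads than IDs: the remaining threads get nothing
        ranges.append((start, start + size - 1))
        start += size
    return ranges


print(split_id_range(1, 100, 10))
# → [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50), (51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]
```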
In an embodiment of the present disclosure, step S103, that is, determining, according to the data import parameter, the number of parallel threads for importing data and the sub-data amount of data that each thread needs to import, further includes the following steps:
and when the parameter for specifying the number of parallel threads is empty, determining the number of parallel threads as a default value.
In this optional implementation, the operator may specify in the configuration data how many parallel threads the current import starts. If the configuration data contains a thread count, that value is used; otherwise the number of parallel threads is set to a default value, which can be preset in the data import server.
In an embodiment of the present disclosure, step S103, that is, determining, according to the data import parameter, the number of parallel threads for importing data and the sub-data amount of data that each thread needs to import, further includes the following steps:
determining the total data amount of the data to be imported according to the maximum value and the minimum value of the data segmentation field in the data to be imported;
and determining the sub-data amount of the data to be imported by each thread according to the total data amount and the number of parallel threads.
In this optional implementation, after the operator specifies the data segmentation field, the data import server may first determine the maximum and minimum values of that field in the data to be imported. For a full import, the minimum and maximum are the field values of the first and the last record in the data source; for an incremental import, they are the field values of the first record after the end of the previous import and of the last record in the data source. Because the data segmentation field is typically an increasing ID, the total data amount of the data to be imported can be determined from the maximum and minimum values (strictly, maximum - minimum + 1 equals the record count only when the IDs have no gaps).
Once the number of parallel threads and the total data amount are known, the sub-data amount each thread needs to import is obtained by dividing the total data amount evenly among the parallel threads.
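The calculation of the total data amount and the per-thread sub-data amounts might look like the sketch below. It assumes, as the paragraph does, an increasing integer segmentation field, and additionally that its values have no gaps; with gaps, max - min + 1 over-estimates the record count and the amounts become approximate. The function name is hypothetical.

```python
from typing import List


def sub_data_amounts(min_value: int, max_value: int, threads: int) -> List[int]:
    """Per-thread sub-data amounts for an increasing data segmentation field.

    Assumes the field values are contiguous integers, so that
    max_value - min_value + 1 is the total data amount.
    """
    total = max_value - min_value + 1
    base, extra = divmod(total, threads)
    # The remainder goes to the first `extra` threads, one record each,
    # so the sub-data amounts always add up to the total data amount.
    return [base + 1 if i < extra else base for i in range(threads)]


full = sub_data_amounts(1, 100, 10)           # full import of IDs 1..100
incremental = sub_data_amounts(101, 203, 10)  # IDs added after a previous import ended at 100
```

Here full is ten shares of 10, and incremental splits 103 records as three shares of 11 followed by seven of 10.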
In an embodiment of the disclosure, the method further comprises the steps of:
receiving a data import mode determination request sent by a client, the request including information of the data source;
determining whether the data source supports the incremental data import mode according to the data structure design of the data source or the type of business data it stores;
and sending reply information to the client, the reply information including an indication of whether the data source supports the incremental data import mode.
In this optional implementation, when the operator, while configuring the import, is not sure whether the data source supports incremental import, a data import mode determination request may be sent to the data import server. Based on the data source information in the request, the server determines the data structure design of the data source or the type of business data it stores, and from that whether incremental import is supported. In response to the request, the data import server sends reply information to the client indicating whether the data source supports the incremental data import mode, and the operation and maintenance personnel can then configure the import as incremental or full accordingly.
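The disclosure does not spell out the rule the data import server applies when judging a data source, so the following is a purely hypothetical sketch: it treats a table as incrementally importable if it has an update_time column, or if it has an increasing id column and only ever receives inserts. The SourceInfo type, the column names and the append_only flag are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceInfo:
    table: str
    columns: List[str]
    append_only: bool  # hypothetical: rows are only ever inserted, never updated


def supports_incremental(source: SourceInfo) -> bool:
    """Hypothetical rule for deciding whether incremental import is possible."""
    cols = {c.lower() for c in source.columns}
    if "update_time" in cols:
        return True  # changed rows can be found through their update timestamp
    # Without a timestamp, an increasing ID only finds new rows, not updated ones.
    return "id" in cols and source.append_only


def reply_for(source: SourceInfo) -> dict:
    """Reply information sent back to the client."""
    return {"data_source": source.table,
            "supports_incremental": supports_incremental(source)}
```

One caveat of any rule of this kind: a timestamp or ID filter cannot see physically deleted rows, which is one reason some tables may still have to be imported in full.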
Fig. 2 shows a flowchart of a data access method according to another embodiment of the present disclosure, which includes the following steps S201 to S204, as shown in fig. 2:
In step S201, a data source configuration interface is presented to the user in response to a detected data acquisition configuration operation;
in step S202, the data source of the data to be imported, as provided by the user in the data source configuration interface, is obtained;
in step S203, when the data source does not support the incremental data import mode, a full data import configuration interface is displayed to the user, and configuration data is obtained from it;
in step S204, a data acquisition request is sent to the data import server, the data acquisition request including the configuration data.
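Steps S202 to S204 on the client can be sketched as below, with plain callables standing in for the configuration interfaces (presenting the interface in step S201 is left out). The callable parameters, the request dictionary layout and the use of an incremental form when the source supports incremental import, as in the optional implementation described later, are assumptions for illustration.

```python
from typing import Callable, Dict


def build_data_acquisition_request(
    data_source: str,
    supports_incremental: Callable[[str], bool],
    full_config_form: Callable[[str], Dict],
    incremental_config_form: Callable[[str], Dict],
) -> Dict:
    """Steps S202 to S204 on the client; the callables stand in for UI screens."""
    # S202: `data_source` is what the user entered in the data source configuration interface.
    if supports_incremental(data_source):
        config = incremental_config_form(data_source)
    else:
        # S203: the source only supports full import, so show the full import form.
        config = full_config_form(data_source)
    # S204: the data acquisition request carries the configuration data.
    return {"type": "data_acquisition",
            "configuration": {"data_source": data_source, **config}}
```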
The big data cluster system needs to store the data of the production system centrally, both for ease of management and for downstream use. In the production system, many data tables, such as the merchant table and the terminal table, hold very large volumes of data; a single table may exceed ten million records. A further problem is that, while some tables in the production system can be extracted incrementally, which reduces the amount extracted each time, other tables can only be extracted in full because of their original design or the nature of their business data. Full extraction of tens of millions of records fails easily; the most common cause is a "snapshot too old" error in the database. Because these tables may be important business tables, an extraction failure can seriously affect downstream business such as reporting, profit sharing and accounting. How to avoid failures caused by an excessively large single extraction is therefore one of the technical problems to be solved.
In view of this, this embodiment proposes a data access method for extracting business data from the production system into the big data cluster system for centralized storage. An operation and maintenance person configures the extraction through a client, and the server, based on the configuration data, splits one extraction process with a large data volume into several, for example n, concurrently executed extraction processes, each extracting one part of the data. The data amounts extracted by the n processes sum to the total amount to be extracted, while the extraction time is ideally reduced to about 1/n of that of the original single process. This speeds up extraction of large data volumes, and the faster extraction lowers the probability of a "snapshot too old" failure, thereby greatly improving the success rate of accessing large data tables and reducing the impact of access failures on downstream business.
In an embodiment of the present disclosure, the data access method may run on the client that operation and maintenance personnel use to configure data extraction from the production system into the big data cluster system.
The operation and maintenance personnel perform configuration through a configuration interface provided on the client. When the client detects a data acquisition configuration operation, it presents a data source configuration interface on which the user provides the data source of the data to be imported. The client may prestore the import modes supported by each data source; when the data source provided by the user does not support incremental import, the client presents a full data import configuration interface on which the user configures the configuration data for a full import. After obtaining the configuration data, the client sends it to the data import server, requesting the server to import data from the production platform accordingly; the imported data can be stored in a data warehouse in the big data cluster system.
Data in the production platform can be imported either in full or incrementally. A full import reads all the data of a data source in one pass, whereas an incremental import reads data from the same data source over several passes, each later pass importing only the data that follows the previous import.
Incremental import continues from where the previous import ended and never re-imports data that has already been imported, so each import moves less data, takes less time and is less prone to errors. Incremental import is therefore preferred whenever the data source supports it, and full import is used when it does not.
Configuration data may include, but is not limited to, the data source of the current import (for example, the database identifier, data table identifier and field identifiers in the production platform) and the data import parameters. The data import parameters can determine the import mode, the number of parallel threads used for the import, and so on. The configuration data may also omit the data import parameters and specify only the data source, in which case the data import server determines the parameters according to the actual condition of the data source.
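One possible shape for the configuration data, sketched with Python dataclasses. The field names are hypothetical; the point is that the data import parameters are optional, so a request may carry only the data source and leave the parameters to the data import server.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class DataSource:
    database: str  # database identifier in the production platform
    table: str     # data table identifier
    fields: List[str] = field(default_factory=list)  # field identifiers (hypothetical: empty = all)


@dataclass
class ImportParams:
    mode: Optional[str] = None              # "full" or "incremental"; None = server decides
    parallel_threads: Optional[int] = None  # None = server default
    split_field: Optional[str] = None       # data segmentation field


@dataclass
class ConfigurationData:
    source: DataSource
    params: Optional[ImportParams] = None   # may be omitted entirely


full_cfg = ConfigurationData(
    DataSource("prod_db", "merchant", ["id", "name"]),
    ImportParams(mode="full", parallel_threads=10, split_field="id"),
)
source_only = ConfigurationData(DataSource("prod_db", "terminal"))
payload = json.dumps(asdict(full_cfg))  # body of the data acquisition request
```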
Accordingly, the client may prestore the import modes supported by each data source, or, after the operation and maintenance personnel enter the data source information, query the data import server for the import modes that data source supports.
After receiving a data acquisition request from the client, the data import server extracts the configuration data from the request, determines the data import parameters from it, determines the number of parallel threads for the current import from those parameters, and then determines the sub-data amount each parallel thread must import from the number of parallel threads and the total data amount of the data source.
In an embodiment of the present disclosure, after determining the number of parallel threads and the sub-data amount each thread must import, the data import server starts that many threads. The threads run in parallel, each obtaining its share of the data to be imported from the data source on the production platform, and the data imported by all threads is finally merged by the data import server and stored in the big data cluster database.
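The parallel extraction and merge performed by the data import server can be sketched with a thread pool from Python's standard library. This assumes a hypothetical fetch_range(lo, hi) callable that reads one ID range from the production data source; a real implementation would typically give each worker its own database connection. The in-memory TABLE below is only a stand-in for testing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]


def parallel_import(ranges: Sequence[Tuple[int, int]],
                    fetch_range: Callable[[int, int], List[Row]]) -> List[Row]:
    """Run one worker per range and merge the results in range order."""
    if not ranges:
        return []
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        futures = [pool.submit(fetch_range, lo, hi) for lo, hi in ranges]
        merged: List[Row] = []
        for future in futures:
            # result() blocks until that worker is done and re-raises its error,
            # so one failed range fails the whole import.
            merged.extend(future.result())
    return merged


# Stand-in for the production table: 100 rows with IDs 1..100.
TABLE = [{"id": i} for i in range(1, 101)]


def fetch_range(lo: int, hi: int) -> List[Row]:
    return [row for row in TABLE if lo <= row["id"] <= hi]


rows = parallel_import([(1, 25), (26, 50), (51, 75), (76, 100)], fetch_range)
```

Collecting results in submission order rather than completion order keeps the merged output deterministic, which makes it easier to verify that nothing was lost or duplicated.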
In an implementation manner of an embodiment of the present disclosure, the method further includes the following steps:
sending a data import mode determination request to the data import server, the request including information of the data source;
and receiving reply information from the data import server, the reply information including an indication of whether the data source supports full data import.
In this optional implementation, when the operator, while configuring the import, is not sure whether the data source supports incremental import, the client may send a data import mode determination request to the data import server. Based on the data source information in the request, the server determines the data structure design of the data source or the type of business data it stores, and from that whether incremental import is supported. In response to the request, the data import server sends reply information to the client indicating whether the data source supports the incremental data import mode, and the operation and maintenance personnel can then configure the import as incremental or full accordingly.
In an implementation manner of an embodiment of the present disclosure, the method further includes the following steps:
when the data source supports incremental data import, an incremental data import configuration interface is displayed for the user, and configuration data is obtained from the incremental data import configuration interface.
In this optional implementation, data in the production platform can be imported either in full or incrementally. A full import reads all the data of a data source in one pass, whereas an incremental import reads data from the same data source over several passes, each later pass importing only the data that follows the previous import.
Incremental import continues from where the previous import ended and never re-imports data that has already been imported, so each import moves less data, takes less time and is less prone to errors. Incremental import is therefore preferred whenever the data source supports it, and full import is used when it does not.
Because some data sources, such as certain data tables, can only be imported in full owing to their original design or the nature of the business data they hold, the operation and maintenance personnel may specify the import mode according to the actual condition of each data source. They may also leave the import mode unspecified, in which case the data import server determines, from the data to be imported that they indicate, the data source and the import modes that source supports.
When the operation and maintenance personnel specify an incremental import, it can first be determined whether the corresponding data source supports incremental import and, if it does, the start position of the current import is determined. The start position may be specified by the operator in the configuration data, or derived by the data import server from the end position of the previous import from that data source.
In an implementation manner of an embodiment of the present disclosure, when the data source supports incremental data import, the step of displaying an incremental data import configuration interface to the user further includes the following steps:
determining the last time and/or last position of the data imported from the data source in the previous import;
and displaying that last time and/or last position in the incremental data import configuration interface.
In this optional implementation, when the data source supports incremental import, the current import covers the data that follows the previous import. The client may therefore store locally the information of the last record of the previous import, or obtain from the data import server the information of the last record imported from the data source, and determine from it the last time and/or last position of the previous import; the incremental import then proceeds from that point. Having determined this information, the client can display on the page the last time and/or last position of the previous import from the data source, so that the operation and maintenance personnel can configure the start position of the current import based on it. For example, if the last time of the previous import is XX, the operator can configure the import to cover the data added after XX; if the last position of the previous import is YY, the operator can configure the import to start from position YY+1.
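Turning the last time XX or last position YY into a filter for the next incremental import could look like the sketch below. The column names, the %s placeholder style (which depends on the database driver) and the rule that the position takes precedence when both are known are assumptions.

```python
from datetime import datetime
from typing import Optional, Tuple


def incremental_filter(last_time: Optional[datetime],
                       last_position: Optional[int],
                       time_column: str = "update_time",
                       id_column: str = "id") -> Tuple[str, tuple]:
    """WHERE clause and parameters for resuming after the previous import.

    Column names and the %s placeholder style are assumptions; the
    placeholder style depends on the database driver.
    """
    if last_position is not None:
        # Last position YY: start from YY + 1.
        return f"{id_column} >= %s", (last_position + 1,)
    if last_time is not None:
        # Last time XX: take the data added after XX.
        return f"{time_column} > %s", (last_time,)
    return "1 = 1", ()  # no previous import recorded: start from the beginning
```

Passing the boundary as a bound parameter rather than formatting it into the SQL text avoids quoting problems with timestamps.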
The technical terms and features of the embodiment shown in Fig. 2 and its related embodiments are the same as or similar to those of the embodiment shown in Fig. 1 and its related embodiments; for their explanation and description, refer to the embodiment shown in Fig. 1, which is not repeated here.
Fig. 3 shows a flowchart of a data access method according to another embodiment of the present disclosure, which includes the following steps S301 to S308, as shown in fig. 3:
in step S301, the client presents a data source configuration interface to the user in response to a detected data acquisition configuration operation;
in step S302, the client obtains the data source of the data to be imported, as provided by the user in the data source configuration interface;
in step S303, when the data source does not support the incremental data import mode, the client displays a full data import configuration interface to the user and obtains configuration data from it;
in step S304, the client sends a data acquisition request to a data import server, where the data acquisition request includes the configuration data;
In step S305, the data import server receives a data acquisition request; the data acquisition request comprises configuration data of data to be imported;
in step S306, the data import server determines data import parameters according to the configuration data;
in step S307, the data import server determines, according to the data import parameters, the number of parallel threads used for the data import and the sub-data amount of data each thread needs to import, the sub-data amounts of all threads summing to the total data amount of the data to be imported;
in step S308, the data import server runs that number of threads in parallel, and the threads obtain the data to be imported from the production platform.
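Steps S305 to S308 can be strung together in one server-side sketch. It covers only a full import over an integer segmentation field; the configuration keys, the query_min_max and fetch_range callables and the default of 4 threads are hypothetical, and the incremental start position is left out for brevity.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def handle_data_acquisition_request(
    request: Dict,
    query_min_max: Callable[[str], Tuple[int, int]],
    fetch_range: Callable[[str, int, int], List[Dict]],
    default_threads: int = 4,
) -> List[Dict]:
    # S305: take the configuration data out of the data acquisition request.
    config = request["configuration"]
    # S306: data import parameters (hypothetical keys); empty thread count -> default.
    threads = config.get("parallel_threads") or default_threads
    split_field = config["split_field"]
    # S307: total data amount from min/max of the segmentation field, then one
    # contiguous range (sub-data amount) per thread, summing to the total.
    lo, hi = query_min_max(split_field)
    base, extra = divmod(hi - lo + 1, threads)
    ranges, start = [], lo
    for i in range(threads):
        size = base + (1 if i < extra else 0)
        if size > 0:
            ranges.append((start, start + size - 1))
            start += size
    if not ranges:
        return []
    # S308: run the threads in parallel, each pulling its range from the
    # production platform; pool.map keeps results in range order.
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        parts = list(pool.map(lambda r: fetch_range(split_field, *r), ranges))
    return [row for part in parts for row in part]
```

For example, with a stand-in table of IDs 1 to 100, a request whose configuration is {"split_field": "id", "parallel_threads": 3} is split into ranges of 34, 33 and 33 records, and the merged result comes back in ID order.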
The big data cluster system needs to store the data of the production system centrally, both for ease of management and for downstream use. In the production system, many data tables, such as the merchant table and the terminal table, hold very large volumes of data; a single table may exceed ten million records. A further problem is that, while some tables in the production system can be extracted incrementally, which reduces the amount extracted each time, other tables can only be extracted in full because of their original design or the nature of their business data. Full extraction of tens of millions of records fails easily; the most common cause is a "snapshot too old" error in the database. Because these tables may be important business tables, an extraction failure can seriously affect downstream business such as reporting, profit sharing and accounting. How to avoid failures caused by an excessively large single extraction is therefore one of the technical problems to be solved.
In view of this, this embodiment proposes a data access method for extracting business data from the production system into the big data cluster system for centralized storage. An operation and maintenance person configures the extraction through a client, and the server, based on the configuration data, splits one extraction process with a large data volume into several, for example n, concurrently executed extraction processes, each extracting one part of the data. The data amounts extracted by the n processes sum to the total amount to be extracted, while the extraction time is ideally reduced to about 1/n of that of the original single process. This speeds up extraction of large data volumes, and the faster extraction lowers the probability of a "snapshot too old" failure, thereby greatly improving the success rate of accessing large data tables and reducing the impact of access failures on downstream business.
In an embodiment of the present disclosure, the data access method may run in a big data cluster system that imports data from the production platform in parallel.
The operation and maintenance personnel perform configuration through a configuration interface provided on the client. When the client detects a data acquisition configuration operation, it presents a data source configuration interface on which the user provides the data source of the data to be imported. The client may prestore the import modes supported by each data source; when the data source provided by the user does not support incremental import, the client presents a full data import configuration interface on which the user configures the configuration data for a full import. After obtaining the configuration data, the client sends it to the data import server, requesting the server to import data from the production platform accordingly; the imported data can be stored in a data warehouse in the big data cluster system.
Data in the production platform can be imported either in full or incrementally. A full import reads all the data of a data source in one pass, whereas an incremental import reads data from the same data source over several passes, each later pass importing only the data that follows the previous import.
Incremental import continues from where the previous import ended and never re-imports data that has already been imported, so each import moves less data, takes less time and is less prone to errors. Incremental import is therefore preferred whenever the data source supports it, and full import is used when it does not.
Configuration data may include, but is not limited to, the data source of the current import (for example, the database identifier, data table identifier and field identifiers in the production platform) and the data import parameters. The data import parameters can determine the import mode, the number of parallel threads used for the import, and so on. The configuration data may also omit the data import parameters and specify only the data source, in which case the data import server determines the parameters according to the actual condition of the data source.
Accordingly, the client may prestore the import modes supported by each data source, or, after the operation and maintenance personnel enter the data source information, query the data import server for the import modes that data source supports.
After receiving a data acquisition request from the client, the data import server extracts the configuration data from the request, determines the data import parameters from it, determines the number of parallel threads for the current import from those parameters, and then determines the sub-data amount each parallel thread must import from the number of parallel threads and the total data amount of the data source.
In an embodiment of the present disclosure, after determining the number of parallel threads and the sub-data amount each thread must import, the data import server starts that many threads. The threads run in parallel, each obtaining its share of the data to be imported from the data source on the production platform, and the data imported by all threads is finally merged by the data import server and stored in the big data cluster database.
In an implementation manner of an embodiment of the present disclosure, the method further includes the following steps:
the client sends a data import mode determination request to the data import server, the request including information of the data source;
the data import server determines whether the data source supports the full data import mode according to the data structure design of the data source or the type of business data it stores;
and the data import server sends reply information to the client, the reply information including an indication of whether the data source supports the full data import mode.
In this optional implementation, when the operator, while configuring the import, is not sure whether the data source supports incremental import, the operator may send a data import mode determination request to the data import server through the client. Based on the data source information in the request, the server determines the data structure design of the data source or the type of business data it stores, and from that whether incremental import is supported. In response to the request, the data import server sends reply information to the client indicating whether the data source supports the incremental data import mode, and the operation and maintenance personnel can then configure incremental or full import on the client accordingly.
In an embodiment of the present disclosure, step S306, that is, the step of determining, by the data import server, the data import parameter according to the configuration data, further includes the steps of:
the data import server determines whether the data import mode is a full data import mode or an incremental data import mode according to the configuration data;
when the data import mode is an incremental data import mode, the data import server determines an import starting position of the data source of the data to be imported in the generation platform.
In this optional implementation, data in the production platform may be imported in a full or an incremental manner. In full import, all data from a data source is imported in a single pass; in incremental import, data from the same data source is imported over multiple passes, and each later pass imports only the data that follows what the previous pass imported.
Some data sources, such as data tables, can only be imported in full because of their original design or the actual nature of the service data they hold. The operation and maintenance personnel may therefore specify the data import mode according to the actual condition of the data source. Alternatively, they may leave the mode unspecified, in which case the data import server identifies the data source from the data to be imported and determines which import modes that source supports.
When the operation and maintenance personnel specify incremental import, the data import server can determine whether the corresponding data source supports the incremental import mode and, if it does, determine the starting position of the current import. The starting position may be specified by the operation and maintenance personnel in the configuration data, or determined by the data import server from the end position of the previous import from that data source.
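The rule for the starting position can be sketched as a small function. It assumes positions are integers (for example an increasing ID) and that "continue after the previous import" means resuming at the previous end position plus one; the fallback for a source that has never been imported is an assumption, since the text does not cover that case.

```python
from typing import Optional


def resolve_start_position(configured_start: Optional[int],
                           last_end: Optional[int],
                           first_position: int = 1) -> int:
    """Pick the starting position of an incremental import."""
    if configured_start is not None:
        # 1. A value given in the configuration data takes precedence.
        return configured_start
    if last_end is not None:
        # 2. Otherwise resume right after the end of the previous import.
        return last_end + 1
    # 3. Assumption: no previous import recorded, so start at the beginning.
    return first_position
```

With a previous import that ended at position 500 and no configured value, the current import starts at 501.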
In an embodiment of the present disclosure, step S206, in which the data import server determines the data import parameters according to the configuration data, further includes the following step:
the data import server determines the parameter in the configuration data that specifies the number of parallel threads.
In this optional implementation, the number of parallel threads for the current data import may be specified by the operation and maintenance personnel in the configuration data. If the configuration data specifies a number of parallel threads, that value is used; otherwise, the data import server uses a default value as the number of parallel threads.
In an embodiment of the present disclosure, step S206, in which the data import server determines the data import parameters according to the configuration data, further includes the following step:
the data import server determines the data segmentation field, specified in the configuration data, that is used to segment the data to be imported.
In this optional implementation, data in the same data source is imported by multiple parallel threads, each of which extracts a portion of the data from the data source and imports it into the data warehouse of the big data cluster. To distinguish the portion each thread should extract, the operation and maintenance personnel may configure, in the configuration data, a data segmentation field for splitting the data to be imported. For example, suppose the operator designates the "ID" field of a data table as the data segmentation field and selects the full import mode, and the table contains 100 records whose ID values run from 1 to 100. With 10 parallel threads, each thread imports 100/10 = 10 records: the 1st thread extracts the records with ID = 1, 2, ..., 10, the 2nd thread extracts those with ID = 11, 12, ..., 20, and so on, up to the 10th thread, which extracts those with ID = 91, 92, ..., 100.
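The ID example above can be turned into one query predicate per thread. This sketch assumes a SQL-style source whose segmentation field holds contiguous integers, and takes each thread's sub-data amount as given; the function name and the `BETWEEN` predicate are illustrative choices, not part of the disclosure.

```python
def thread_predicates(field: str, min_id: int, sub_amounts: list[int]) -> list[str]:
    """Build one WHERE predicate per thread from consecutive, non-overlapping ID ranges.

    Assumes `field` comes from trusted configuration (it is interpolated into SQL text)
    and that the segmentation field has no gaps between min_id and the last ID.
    """
    predicates = []
    start = min_id
    for amount in sub_amounts:
        end = start + amount - 1          # inclusive upper bound for this thread
        predicates.append(f"{field} BETWEEN {start} AND {end}")
        start = end + 1                   # next thread begins right after this range
    return predicates
```

For the example in the text, `thread_predicates("ID", 1, [10] * 10)` yields `ID BETWEEN 1 AND 10` for the 1st thread through `ID BETWEEN 91 AND 100` for the 10th.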
In an embodiment of the present disclosure, step S207, in which the data import server determines, according to the data import parameters, the number of parallel threads for data import and the sub-data amount that each thread needs to import, further includes the following step:
when the parameter specifying the number of parallel threads is empty, the data import server sets the number of parallel threads to a default value.
In this optional implementation, the operation and maintenance personnel may specify in the configuration data how many parallel threads the current data import starts. If they configure this number, the configured value is used; if they do not, the number of parallel threads is set to a default value, which may be preset in the data import server.
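A sketch of the default-value rule, assuming the configuration data arrives as a dictionary. The key `parallel_threads` and the default of 4 are hypothetical; the text only says a default is preset in the data import server. Rejecting non-positive values is an added assumption.

```python
DEFAULT_PARALLEL_THREADS = 4  # hypothetical value preset on the data import server


def resolve_thread_count(config: dict, default: int = DEFAULT_PARALLEL_THREADS) -> int:
    """Use the configured number of parallel threads, or the default when it is empty."""
    value = config.get("parallel_threads")   # hypothetical configuration key
    if value is None or value == "":
        return default                       # parameter empty: fall back to the default
    count = int(value)
    if count < 1:
        raise ValueError("parallel_threads must be a positive integer")
    return count
```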
In an embodiment of the present disclosure, step S207, in which the data import server determines, according to the data import parameters, the number of parallel threads for data import and the sub-data amount that each thread needs to import, further includes the following steps:
the data import server determines the total data amount of the data to be imported according to the maximum and minimum values of the data segmentation field in the data to be imported;
the data import server determines the sub-data amount to be imported by each thread according to the total data amount and the number of parallel threads.
In this optional implementation, after the operation and maintenance personnel specify the data segmentation field, the data import server first determines the maximum and minimum values of that field in the data to be imported. In the full import mode, the minimum and maximum are the field values of the first and last records in the corresponding data source. In the incremental import mode, the minimum is the value of the first record after the last record imported previously, and the maximum is the value of the last record in the data source. Because the data segmentation field is typically an increasing ID, the total data amount of the data to be imported can be determined from the maximum and minimum values.
Once the number of parallel threads and the total data amount are known, the data import server can determine the sub-data amount for each parallel thread by dividing the total data amount evenly among the threads.
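The arithmetic can be sketched as below, assuming the segmentation field is an increasing integer without gaps, so that the total is `max - min + 1` (with gaps this is an upper bound on the record count). When the total does not divide evenly, this sketch gives the remainder to the first threads, one extra record each, so the sub-data amounts still add up to the total; the disclosure only says the total is averaged, so the remainder handling is an assumption.

```python
def total_amount(min_value: int, max_value: int) -> int:
    """Total data amount implied by an increasing, gap-free segmentation field."""
    return max_value - min_value + 1


def sub_amounts(total: int, threads: int) -> list[int]:
    """Split the total as evenly as possible; the first `extra` threads take one more."""
    base, extra = divmod(total, threads)
    return [base + 1 if i < extra else base for i in range(threads)]
```

For IDs 1 to 100 and 10 threads this gives ten amounts of 10; for 103 records it gives three threads of 11 and seven of 10.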
In an embodiment of the disclosure, the method further comprises the steps of:
when the data source supports incremental data import, the client displays an incremental data import configuration interface to the user and acquires the configuration data from that interface.
Because incremental import continues from where the previous import ended, data imported earlier does not need to be imported again. Each import therefore moves less data, takes less time, and is less prone to import errors. Accordingly, incremental data import is preferred when the data source supports it, and full data import is used when it does not.
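The preference rule can be expressed as a small decision function. The `ImportMode` enum is a hypothetical name; treating an explicit request for full import as binding, and falling back to full import when incremental is requested but unsupported, are assumptions consistent with, but not stated by, the text.

```python
from enum import Enum
from typing import Optional


class ImportMode(Enum):
    FULL = "full"
    INCREMENTAL = "incremental"


def choose_import_mode(supports_incremental: bool,
                       requested: Optional[ImportMode] = None) -> ImportMode:
    """Prefer incremental import when the source supports it; otherwise import in full."""
    if requested is ImportMode.FULL:
        return ImportMode.FULL            # operator explicitly asked for a full import
    if supports_incremental:
        return ImportMode.INCREMENTAL     # requested, or chosen by preference
    # Assumption: an unsupported incremental request falls back to full import.
    return ImportMode.FULL
```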
When the operation and maintenance personnel specify incremental import, it can be determined whether the corresponding data source supports the incremental import mode and, if it does, the starting position of the current import is determined. The starting position may be specified by the operation and maintenance personnel in the configuration data, or determined by the data import server from the end position of the previous import from that data source.
In an embodiment of the present disclosure, the step in which the client presents the incremental data import configuration interface to the user when the data source supports incremental data import further includes the following steps:
the client determines the last time and/or last position of the data most recently imported from the data source;
the client displays the last time and/or last position in the incremental data import configuration interface.
In this optional implementation, when the data source supports incremental import, the current import can continue with the data that follows the previous import. The client may store locally the information of the last record imported in the previous import, or obtain that information from the data import server, and then determine from it the last time and/or last position of the previous import from the data source. The client displays this last time and/or last position on the page so that the operation and maintenance personnel can configure the starting position of the current import from it. For example, if the last time of the previous import is XX, the operator can configure the import of data added after XX; if the last position of the previous import is YY, the operator can configure the import to start at position YY+1.
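A sketch of how the client might pre-fill the incremental configuration interface from the last imported record. The `LastImport` record and the field names `import_after_time` and `start_position` are hypothetical; the logic mirrors the XX / YY+1 example, and either value may be absent ("and/or").

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class LastImport:
    """Hypothetical record of the previous import from one data source."""
    last_time: Optional[datetime] = None   # time of the last record imported
    last_position: Optional[int] = None    # position (e.g. ID) of the last record imported


def suggested_incremental_config(last: LastImport) -> dict:
    """Values to display as defaults in the incremental import configuration interface."""
    config = {}
    if last.last_time is not None:
        # Import only data added after XX
        config["import_after_time"] = last.last_time.isoformat()
    if last.last_position is not None:
        # Resume at YY + 1
        config["start_position"] = last.last_position + 1
    return config
```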
Fig. 4 illustrates an application scenario of a data access method according to an embodiment of the present disclosure, and Fig. 5 shows an overall flowchart of the method. As shown in Figs. 4 and 5, the big data cluster system may include multiple clients used by multiple operation and maintenance personnel, and the data import server may be a virtual machine built on multiple physical machines. When data needs to be imported, the operation and maintenance personnel configure information about the data to be imported, such as the identifier of the data source, through the client and submit the configuration data to the data import server. The data import server starts multiple parallel threads according to the configuration data, and each thread extracts part of the data from one or more data sources of the production platform. After all the parallel threads finish extracting, the extracted data are merged in the data warehouse of the big data cluster, yielding the complete data of the data sources.
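The parallel extract-then-merge flow can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`. Here an in-memory list of rows stands in for the production platform and the returned list stands in for the warehouse merge; both stand-ins and the function names are hypothetical. Each thread handles one inclusive range of the segmentation field.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_range(source: list[dict], field: str, start: int, end: int) -> list[dict]:
    """Stand-in for one thread's query against the production platform."""
    return [row for row in source if start <= row[field] <= end]


def parallel_import(source: list[dict], field: str,
                    ranges: list[tuple[int, int]]) -> list[dict]:
    """Run one extraction task per range in parallel, then merge the parts in order."""
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:   # one worker per range
        futures = [pool.submit(fetch_range, source, field, start, end)
                   for start, end in ranges]
        # result() re-raises a worker's exception, so a failed part is not silently dropped
        parts = [future.result() for future in futures]
    # Merge step: in the described system this would be a write into the data warehouse
    return [row for part in parts for row in part]
```

Using the ten ID ranges from the earlier example, `parallel_import(rows, "ID", ranges)` returns all 100 rows in ID order, because the futures are collected in the same order as the ranges.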
The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure.
Fig. 6 shows a block diagram of a data access apparatus according to an embodiment of the present disclosure, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 6, the data access device includes:
a receiving module 601 configured to receive a data acquisition request; the data acquisition request comprises configuration data of data to be imported;
a first determining module 602 configured to determine data import parameters according to the configuration data;
a second determining module 603 configured to determine, according to the data import parameters, the number of parallel threads for data import and the sub-data amount that each thread needs to import, where the sum of the sub-data amounts of all threads equals the total data amount of the data to be imported;
and a parallel execution module 604 configured to execute the threads in parallel, the threads acquiring the data to be imported from the production platform.
The data access device may run on a data import server that extracts data from the production system in a big data cluster system.
Fig. 7 shows a block diagram of a data access apparatus according to another embodiment of the present disclosure, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 7, the data access device includes:
A response module 701 configured to present a data source configuration interface to a user in response to the detected data acquisition configuration operation;
an obtaining module 702, configured to obtain a data source of data to be imported provided by a user in the data source configuration interface;
a display module 703 configured to display a full data import configuration interface to the user when the data source does not support the incremental data import mode, and to acquire the configuration data from that interface;
a sending module 704 configured to send a data acquisition request to a data import server, the data acquisition request comprising the configuration data.
In an embodiment of the present disclosure, the data access device may run on the client that operation and maintenance personnel use to configure data extraction from the production system in a big data cluster system.
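The data acquisition request that the client sends to the data import server could carry the configuration data as a small JSON document. The field names below and the JSON encoding are assumptions for illustration; the disclosure only states that the request contains the configuration data, including the data segmentation field.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ImportConfig:
    """Hypothetical shape of the configuration data entered on the client."""
    data_source: str                          # identifier of the production data source
    segmentation_field: str                   # e.g. "ID"
    import_mode: str = "full"                 # "full" or "incremental"
    parallel_threads: Optional[int] = None    # None: the server uses its default
    start_position: Optional[int] = None      # only meaningful for incremental import


def build_acquisition_request(config: ImportConfig) -> str:
    """Serialize the data acquisition request sent to the data import server."""
    if config.import_mode not in ("full", "incremental"):
        raise ValueError(f"unknown import mode: {config.import_mode}")
    return json.dumps({"configuration": asdict(config)})
```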
Fig. 8 illustrates a block diagram of a data access system that may be implemented as part or all of an electronic device by software, hardware, or a combination of both, according to an embodiment of the present disclosure. As shown in fig. 8, the data access system includes: a client 801 and a data import server 802;
the client 801 responds to the detected data acquisition configuration operation and displays a data source configuration interface to a user;
The client 801 obtains a data source of data to be imported provided by a user in the data source configuration interface;
when the data source does not support the incremental data import mode, the client 801 displays a full data import configuration interface to the user and acquires the configuration data from that interface;
the client 801 sends a data acquisition request to a data import server 802, where the data acquisition request includes the configuration data;
the data import server 802 receives a data acquisition request; the data acquisition request comprises configuration data of data to be imported;
the data import server 802 determines data import parameters according to the configuration data;
the data import server 802 determines, according to the data import parameters, the number of parallel threads for data import and the sub-data amount that each thread needs to import; the sum of the sub-data amounts of all threads equals the total data amount of the data to be imported;
the data import server 802 executes the threads in parallel, and the threads acquire the data to be imported from a production platform.
In an embodiment of the present disclosure, the data access system is used in a big data cluster system to import data in parallel from a production platform.
The technical features, explanations, and descriptions of the apparatus and system embodiments above are the same as, or correspond to, those of the method embodiments. For details, refer to the method embodiments; they are not repeated here.
The embodiment of the disclosure also discloses an electronic device, which comprises a memory and a processor; wherein,
the memory is used to store one or more computer instructions that are executed by the processor to perform any of the method steps described above.
Fig. 9 is a schematic diagram of a computer system suitable for use in implementing a data access method according to an embodiment of the present disclosure.
As shown in Fig. 9, the computer system 900 includes a processing unit 901 which can execute various processes in the above-described embodiments in accordance with a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. The RAM 903 also stores various programs and data necessary for the operation of the computer system 900. The processing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output portion 907 including a display such as a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), and a speaker; a storage portion 908 including a hard disk or the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 910 as needed, so that a computer program read from it can be installed into the storage section 908 as needed. The processing unit 901 may be implemented as a CPU, GPU, TPU, FPGA, NPU, or another type of processing unit.
In particular, according to embodiments of the present disclosure, the methods described above may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the methods described above. In such an embodiment, the computer program may be downloaded and installed from the network through the communication section 909, and/or installed from the removable medium 911.
The disclosed embodiments also disclose a computer program product comprising a computer program/instructions which, when executed by a processor, implement any of the method steps described above.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware. The units or modules described may also be provided in a processor, the names of which in some cases do not constitute a limitation of the unit or module itself.
As another aspect, the embodiments of the present disclosure also provide a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus described in the above embodiments, or a standalone computer-readable storage medium that is not assembled into any device. The computer-readable storage medium stores one or more programs that one or more processors use to perform the methods described in the embodiments of the present disclosure.
The foregoing description covers only the preferred embodiments of the present disclosure and the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, but also encompasses other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept; for example, technical solutions formed by interchanging the above features with technical features of similar function disclosed in (but not limited to) the embodiments of the present disclosure.

Claims (12)

1. A data access method, comprising:
receiving a data acquisition request, wherein the data acquisition request is configured on a client by operation and maintenance personnel when they need to import data produced by a production platform and is then provided by the client to a data import server; the data acquisition request comprises configuration data of the data to be imported, and the configuration data comprises a data segmentation field for segmenting the data to be imported;
determining data import parameters and data import modes according to the configuration data;
determining, according to the data import parameters, the number of parallel threads for data import and the sub-data amount of data to be imported by each thread; when the data import mode is the incremental data import mode, determining whether the data source of the production platform supports the incremental import mode; and when it does, determining the import starting position of the data source of the data to be imported in the production platform; wherein the sum of the sub-data amounts of the data to be imported by the threads is the total data amount of the data to be imported, and the starting position is specified in the configuration data by operation and maintenance personnel or determined according to the end position of the previous import from the data source;
executing the threads in parallel, the threads acquiring the data to be imported from the production platform;
wherein determining, according to the data import parameters, the number of parallel threads for data import and the sub-data amount of data to be imported by each thread comprises:
determining the total data amount of the data to be imported according to the maximum value and the minimum value of the data segmentation field in the data to be imported;
determining the sub-data amount to be imported by each thread according to the total data amount and the number of parallel threads;
wherein the method further comprises:
receiving a data import mode determination request sent by the client, the data import mode determination request comprising information of the data source;
determining whether the data source supports an incremental data import mode according to the data structure design of the data source or the type of the stored service data;
and sending reply information to the client, wherein the reply information comprises an indication of whether the data source supports an incremental data import mode or not.
2. The method of claim 1, wherein determining the data import parameter from the configuration data comprises:
determining a parameter in the configuration data that specifies the number of parallel threads.
3. The method of claim 2, wherein determining, according to the data import parameters, the number of parallel threads for data import and the sub-data amount of data each thread needs to import comprises:
when the parameter specifying the number of parallel threads is empty, determining the number of parallel threads as a default value.
4. A data access method, comprising:
the client, in response to a detected data acquisition configuration operation, displays a data source configuration interface to a user;
the client acquires a data source of data to be imported, which is provided by a user in the data source configuration interface;
when the data source does not support the incremental data import mode, the client displays a full data import configuration interface to the user and acquires configuration data from the full data import configuration interface;
the client sends a data acquisition request to a data import server, wherein the data acquisition request comprises the configuration data; the configuration data comprises a data segmentation field for segmenting the data to be imported;
the data import server receives the data acquisition request, the data acquisition request comprising configuration data of the data to be imported;
the data import server determines data import parameters and a data import mode according to the configuration data;
the data import server determines, according to the data import parameters, the number of parallel threads for data import and the sub-data amount of data to be imported by each thread; when the data import mode is the incremental data import mode, determines whether a data source of a production platform supports the incremental import mode; and when it does, determines the import starting position of the data source of the data to be imported in the production platform; wherein the sum of the sub-data amounts of the data to be imported by the threads is the total data amount of the data to be imported, and the starting position is specified in the configuration data by operation and maintenance personnel or determined according to the end position of the previous import from the data source;
the data import server executes the threads in parallel, and the threads acquire data to be imported from a production platform;
wherein the data import server determining, according to the data import parameters, the number of parallel threads for data import and the sub-data amount of data to be imported by each thread comprises the following steps:
the data import server determines the total data amount of the data to be imported according to the maximum and minimum values of the data segmentation field in the data to be imported;
the data import server determines the sub-data quantity of the data to be imported by each thread according to the data total quantity and the parallel thread quantity;
the method further comprises the steps of:
the client sends a data import mode determination request to the data import server, the data import mode determination request comprising information of the data source;
the data import server determines whether the data source supports a full data import mode according to the data structure design of the data source or the type of the stored service data;
and the data import server sends reply information to the production platform, wherein the reply information comprises an indication of whether the data source supports a full data import mode.
5. The method of claim 4, wherein the data import server determining the data import parameters from the configuration data comprises:
the data import server determines parameters in the configuration data that specify the number of parallel threads.
6. The method of claim 5, wherein the data import server determining the number of parallel threads for data import and the sub-data amount of data each thread needs to import according to the data import parameters, comprising:
when the parameter specifying the number of parallel threads is empty, the data import server determines the number of parallel threads as a default value.
7. The method of any of claims 4-6, wherein the method further comprises:
when the data source supports incremental data import, the client side displays an incremental data import configuration interface to the user and acquires configuration data from the incremental data import configuration interface.
8. The method of claim 7, wherein the client presenting the incremental data import configuration interface to the user when the data source supports incremental data import comprises:
the client determines the last time and/or last position of the data most recently imported from the data source;
and the client displays the last time and/or the last position in the incremental data importing configuration interface.
9. A data access device, comprising:
a receiving module configured to receive a data acquisition request, wherein the data acquisition request is configured on a client by operation and maintenance personnel when they need to import data produced by a production platform and is then provided by the client to a data import server; the data acquisition request comprises configuration data of the data to be imported, and the configuration data comprises a data segmentation field for segmenting the data to be imported;
the first determining module is configured to determine a data import parameter and a data import mode according to the configuration data;
a second determining module configured to determine, according to the data import parameters, the number of parallel threads for data import and the sub-data amount of data to be imported by each thread; to determine, when the data import mode is the incremental data import mode, whether the data source of the production platform supports the incremental import mode; and, when it does, to determine the import starting position of the data source of the data to be imported in the production platform; wherein the sum of the sub-data amounts of the data to be imported by the threads is the total data amount of the data to be imported, and the starting position is specified in the configuration data by operation and maintenance personnel or determined according to the end position of the previous import from the data source;
a parallel execution module configured to execute the threads in parallel, the threads acquiring the data to be imported from the production platform;
wherein the second determining module determines, according to the data import parameters, the number of parallel threads for data import and the sub-data amount of data to be imported by each thread by:
determining the total data amount of the data to be imported according to the maximum value and the minimum value of the data segmentation field in the data to be imported;
determining the sub-data quantity of each thread needing to import data according to the data total quantity and the parallel thread quantity;
wherein the device further comprises:
a request receiving module configured to receive an import mode determination request sent by the client, the import mode determination request comprising information about the data source;
a mode determining module configured to determine whether the data source supports the incremental data import mode according to the data structure design of the data source or the type of business data it stores;
and a reply module configured to send reply information to the client, the reply information indicating whether the data source supports the incremental data import mode.
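The partitioning step recited above (a total data amount derived from the maximum and minimum of the data segmentation field, then a sub-data amount per thread) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: it assumes the segmentation field is an integer key, treats max - min + 1 as the total data amount (exact only when keys are dense), and uses a hypothetical `fetch(lo, hi)` callback in place of a real query against the production platform.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def split_ranges(min_key: int, max_key: int, threads: int) -> List[Tuple[int, int]]:
    """Split the inclusive key range [min_key, max_key] into at most `threads`
    contiguous, non-overlapping sub-ranges whose sizes sum to the total."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    if max_key < min_key:
        return []
    # Assumption: total data amount estimated from the key span (dense keys).
    total = max_key - min_key + 1
    # Ceiling division gives the sub-data amount handled by each thread.
    size = -(-total // threads)
    ranges = []
    start = min_key
    while start <= max_key:
        end = min(start + size - 1, max_key)  # last range absorbs the remainder
        ranges.append((start, end))
        start = end + 1
    return ranges


def parallel_import(
    min_key: int, max_key: int, threads: int, fetch: Callable[[int, int], T]
) -> List[T]:
    """Run the hypothetical `fetch(lo, hi)` for every sub-range in parallel threads,
    returning results in key order."""
    ranges = split_ranges(min_key, max_key, threads)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves the input order of the sub-ranges.
        return list(pool.map(lambda r: fetch(r[0], r[1]), ranges))


if __name__ == "__main__":
    # Hypothetical fetch: returns the keys in its sub-range instead of querying a database.
    chunks = parallel_import(1, 10, 3, lambda lo, hi: list(range(lo, hi + 1)))
    print(chunks)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```

With a real source, each `fetch` call would typically run a range query such as `WHERE key BETWEEN lo AND hi` for its own sub-range. Ceiling division keeps the number of sub-ranges at or below the thread count while the sub-range sizes still add up to the total, matching the requirement that the sub-data amounts sum to the total data amount.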
10. A data access system, comprising a client and a data import server, wherein:
in response to detecting a data acquisition configuration operation, the client displays a data source configuration interface to a user;
the client obtains the data source of the data to be imported, as entered by the user in the data source configuration interface;
when the data source does not support the incremental data import mode, the client displays a full data import configuration interface to the user and obtains configuration data from that interface;
the client sends a data acquisition request comprising the configuration data to the data import server; the configuration data comprises a data segmentation field used to partition the data to be imported;
the data import server receives the data acquisition request, which comprises the configuration data of the data to be imported;
the data import server determines data import parameters and a data import mode according to the configuration data;
the data import server determines, according to the data import parameters, the number of parallel threads for data import and the sub-data amount of the data to be imported by each thread; when the data import mode is the incremental data import mode, determines whether the data source of the production platform supports the incremental import mode; and, when it does, determines the import starting position in the data source of the production platform from which the data is to be imported; wherein the sum of the sub-data amounts imported by the threads equals the total data amount of the data to be imported, and the starting position is either designated in the configuration data by operation and maintenance personnel or determined from the ending position of the previous import from that data source;
the data import server executes the threads in parallel, and the threads acquire the data to be imported from the production platform;
wherein the data import server determines the number of parallel threads and the sub-data amount of each thread according to the data import parameters by:
determining the total data amount of the data to be imported according to the maximum value and the minimum value of the data segmentation field in the data to be imported; and
determining the sub-data amount to be imported by each thread according to the total data amount and the number of parallel threads;
the client further sends an import mode determination request to the data import server, the import mode determination request comprising information about the data source;
the data import server determines whether the data source supports a full data import mode according to the data structure design of the data source or the type of business data it stores;
and the data import server sends reply information to the production platform, the reply information indicating whether the data source supports a full data import mode.
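The choice between incremental and full import, and the two ways of fixing the incremental starting position recited above (designated in the configuration data, or derived from where the previous import ended), can be illustrated with the following Python sketch. The `ImportConfig` fields, the `supports_incremental` and `last_end` lookups, the mode strings, and the rule "start = previous end + 1" are hypothetical assumptions for illustration; so is the fallback to a full import when no starting position is available.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ImportConfig:
    """Hypothetical shape of the configuration data sent by the client."""
    source: str                           # data source on the production platform
    mode: str                             # "incremental" or "full" (assumed values)
    start_position: Optional[int] = None  # start explicitly designated by O&M staff


def resolve_import(
    config: ImportConfig,
    supports_incremental: Dict[str, bool],  # per-source result of the mode check
    last_end: Dict[str, int],               # end position of each source's previous import
) -> Tuple[str, Optional[int]]:
    """Return (mode, start_position) for one import run."""
    # Incremental import is only used if requested and supported by the source.
    if config.mode != "incremental" or not supports_incremental.get(config.source, False):
        return ("full", None)
    # An explicitly designated starting position takes precedence.
    if config.start_position is not None:
        return ("incremental", config.start_position)
    # Otherwise resume just after the position where the previous import ended.
    previous = last_end.get(config.source)
    if previous is not None:
        return ("incremental", previous + 1)
    # Assumption: with no checkpoint to resume from, fall back to a full import.
    return ("full", None)


if __name__ == "__main__":
    cfg = ImportConfig(source="orders", mode="incremental")
    print(resolve_import(cfg, {"orders": True}, {"orders": 500}))  # ('incremental', 501)
```

Keeping the supported-mode lookup separate from the start-position logic mirrors the claimed split between the import mode determination request and the data acquisition request: the first answers whether incremental import is possible for a source, the second decides where a given run starts.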
11. An electronic device, comprising a memory and a processor, wherein:
the memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the steps of the method of any one of claims 1-8.
12. A computer readable storage medium having stored thereon computer instructions, wherein the computer instructions, when executed by a processor, implement the steps of the method of any of claims 1-8.
CN202110096250.XA 2021-01-25 2021-01-25 Data access method, device, electronic equipment and storage medium Active CN112685427B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110096250.XA CN112685427B (en) 2021-01-25 2021-01-25 Data access method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110096250.XA CN112685427B (en) 2021-01-25 2021-01-25 Data access method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112685427A CN112685427A (en) 2021-04-20
CN112685427B true CN112685427B (en) 2024-03-26

Family

ID=75459207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110096250.XA Active CN112685427B (en) 2021-01-25 2021-01-25 Data access method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112685427B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105786603A (en) * 2016-02-29 2016-07-20 青岛海尔智能家电科技有限公司 High-concurrency service processing system and method based on distributed mode
CN105930945A (en) * 2015-12-30 2016-09-07 中国银联股份有限公司 Business processing method and apparatus
CN108228730A (en) * 2017-12-11 2018-06-29 深圳市买买提信息科技有限公司 Data lead-in method, device, computer equipment and readable storage medium storing program for executing
CN108776710A (en) * 2018-06-28 2018-11-09 农信银资金清算中心有限责任公司 A kind of concurrent stowage and device of database data
CN109101330A (en) * 2018-08-06 2018-12-28 百度在线网络技术(北京)有限公司 Data capture method, device and system
CN109726174A (en) * 2018-12-28 2019-05-07 江苏满运软件科技有限公司 Data archiving method, system, equipment and storage medium
CN110019339A (en) * 2017-11-20 2019-07-16 北京京东尚科信息技术有限公司 A kind of data query method and system
CN110334018A (en) * 2019-06-18 2019-10-15 梁俊杰 A kind of big data introduction method and relevant device
CN110795495A (en) * 2018-07-17 2020-02-14 北京京东尚科信息技术有限公司 Data processing method and device, electronic equipment and computer readable medium
CN111159191A (en) * 2019-12-30 2020-05-15 深圳博沃智慧科技有限公司 Data processing method, device and interface
CN111444149A (en) * 2020-04-20 2020-07-24 北京同心尚科技发展有限公司 Data import method, device, equipment and storage medium
CN111694840A (en) * 2020-04-29 2020-09-22 平安科技(深圳)有限公司 Data synchronization method, device, server and storage medium
US10885023B1 (en) * 2014-09-08 2021-01-05 Amazon Technologies, Inc. Asynchronous processing for synchronous requests in a database

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US20090182718A1 (en) * 2007-05-08 2009-07-16 Digital River, Inc. Remote Segmentation System and Method Applied To A Segmentation Data Mart
US9292702B2 (en) * 2009-08-20 2016-03-22 International Business Machines Corporation Dynamic switching of security configurations
US9684684B2 (en) * 2014-07-08 2017-06-20 Sybase, Inc. Index updates using parallel and hybrid execution
US10003634B2 (en) * 2016-05-14 2018-06-19 Richard Banister Multi-threaded download with asynchronous writing

Non-Patent Citations (4)

Title
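Research on the incremental data extraction mechanism in ETL; Dai Hao et al.; Computer Engineering and Design; Vol. 30, No. 23; pp. 5552-5555 *
Extracting delta for incremental data warehouse maintenance; P. Ram; IEEE; pp. 1-10 *
Research and design of data access for a large-scale power grid data warehouse; Li Ziqian et al.; Computer Applications and Software; No. 8; pp. 186-191 *
Introduction to Big Data Technology; Chen Ming; China Railway Publishing House; 2019; pp. 103-105 *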

Also Published As

Publication number Publication date
CN112685427A (en) 2021-04-20

Similar Documents

Publication Publication Date Title
CN107453960B (en) Method, device and system for processing test data in service test
CN112380473B (en) Data acquisition and synchronization method, device, equipment and storage medium
CN112749219A (en) Data extraction method, data extraction device, electronic equipment, storage medium and program product
CN108596587B (en) Cash-up auditing method, apparatus, electronic device, program product and storage medium
CN112882863A (en) Method, device and system for recovering data and electronic equipment
CN111400189A (en) Code coverage rate monitoring method and device, electronic equipment and storage medium
CN111125057A (en) Service request processing method and device and computer system
CN112100070A (en) Version defect detection method and device, server and storage medium
CN105955838A (en) System halt reason check method and device
CN113886455A (en) Global unique serial number generation method and device, electronic equipment and storage medium
CN107153679B (en) Extraction statistical method and system for semi-structured big data
CN112685427B (en) Data access method, device, electronic equipment and storage medium
CN108363671B (en) Interface switching method, terminal equipment and storage medium
CN112883050B (en) Data changing method and device of database
CN115016754A (en) Method and device for synchronously displaying pages among devices, electronic device and medium
CN111459737B (en) Problem positioning method, device, computer equipment and storage medium
CN110955597B (en) Object testing method and device, electronic equipment and computer readable storage medium
CN112612674A (en) Method, device, equipment and computer readable storage medium for monitoring buried point data
CN113205413B (en) Mobile phone bank data processing method and device
CN116501585A (en) Log processing method, electronic equipment and log processing system
CN116011437A (en) Message theme processing method and device and computer readable medium
CN117667454A (en) Metadata acquisition method, device, equipment and medium
CN116302652A (en) System alarm information processing method and device and electronic equipment
CN112667726A (en) Data extraction method, data extraction device, electronic equipment, storage medium and program product
CN114048058A (en) Live event searching method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant