CN112685427B - Data access method, device, electronic equipment and storage medium

Info

Publication number: CN112685427B
Application number: CN202110096250.XA
Authority: CN
Inventors: Name withheld on request
Original assignee: Lakala Payment Co ltd
Current assignee: Lakala Payment Co ltd
Priority date: 2021-01-25
Filing date: 2021-01-25
Publication date: 2024-03-26
Anticipated expiration: 2041-01-25
Also published as: CN112685427A

Abstract

The embodiments of the present disclosure disclose a data access method, a device, an electronic device, a storage medium and a program product. The method comprises the following steps: receiving a data acquisition request, the data acquisition request comprising configuration data of data to be imported; determining data import parameters according to the configuration data; determining, according to the data import parameters, the number of parallel threads used for data import and the sub-data amount to be imported by each thread, the sum of the sub-data amounts of all threads being the total data amount of the data to be imported; and executing the threads in parallel, each thread acquiring its data to be imported from a production platform. This technical scheme speeds up data extraction for accesses involving a large data volume, which reduces the probability that the database snapshot becomes too old during extraction, thereby greatly improving the access success rate for large data tables and reducing the impact of access failures on downstream business.

Description

Data access method, device, electronic equipment and storage medium

Technical Field

The embodiment of the disclosure relates to the technical field of computers, in particular to a data access method, a data access device, electronic equipment and a storage medium.

Background

A big data cluster system generally stores the data of the production system centrally, which makes the data convenient to manage and to use downstream. One problem, however, is that some data tables in the production system, such as a merchant table or a terminal table, hold an excessively large amount of data; a single table may exceed ten million records. Another problem concerns how the data tables of the production system are accessed: some of them can be acquired through incremental access, which reduces the amount of data extracted each time, whereas others are limited by their original design or by the actual situation of the service data and can only be accessed in full. Such tables cannot be accessed incrementally, and extraction easily fails when the amount of data exceeds tens of millions of records. The inventors of the present disclosure found that one reason for access failure is that the database snapshot is too old. When the accessed data tables are relatively important service tables, an access failure easily has a serious impact on downstream services (reports, profit sharing, accounting, etc.). Therefore, how to avoid failures caused by an excessively large amount of data in a single access is one of the technical problems to be solved at present.

Disclosure of Invention

The embodiment of the disclosure provides a data access method, a data access device, electronic equipment, a storage medium and a program product.

In a first aspect, an embodiment of the present disclosure provides a data access method, including:

receiving a data acquisition request; the data acquisition request comprises configuration data of data to be imported;

determining data import parameters according to the configuration data;

determining the number of parallel threads of the threads for data import and the sub-data quantity of the data to be imported for each thread according to the data import parameters; the sum of the sub-data amounts of the data to be imported by the thread is the total data amount of the data to be imported;

and executing the threads in parallel, and acquiring data to be imported from a production platform by the threads.

Further, determining the data import parameter according to the configuration data includes:

determining whether the data import mode is a full data import mode or an incremental data import mode according to the configuration data;

when the data import mode is the incremental data import mode, determining an import starting position, in the production platform, of the data source of the data to be imported;

and determining a parameter in the configuration data that specifies the number of parallel threads.

Further, determining the data import parameter according to the configuration data further includes:

and determining a data segmentation field used for segmenting the data to be imported in the configuration data.

Further, determining the number of parallel threads for data import and the sub-data amount of data to be imported for each thread according to the data import parameters, including:

when the parameter for specifying the number of parallel threads is empty, determining the number of parallel threads as a default value;

determining the total data amount of the data to be imported according to the maximum value and the minimum value of the data segmentation field in the data to be imported;

and determining the sub-data amount of the data to be imported by each thread according to the total data amount and the number of parallel threads.

Further, the method further comprises:

receiving a data importing mode determining request sent by a client; the data importing mode determining request comprises information of the data source;

determining whether the data source supports an incremental data import mode according to the data structure design of the data source or the type of the stored service data;

and sending reply information to the client, wherein the reply information comprises an indication of whether the data source supports an incremental data import mode or not.

In a second aspect, an embodiment of the present disclosure provides a data access method, including:

responding to the detected data acquisition configuration operation, and displaying a data source configuration interface to a user;

acquiring a data source of data to be imported, which is provided by a user in the data source configuration interface;

when the data source does not support the incremental data import mode, displaying a full-quantity data import configuration interface to the user, and acquiring configuration data from the full-quantity data import configuration interface;

and sending a data acquisition request to a data import server, wherein the data acquisition request comprises the configuration data.

Further, the method further comprises:

sending a data import mode determining request to the data import server; the data importing mode determining request comprises information of the data source;

and receiving reply information of the data import server, wherein the reply information comprises an indication of whether the data source supports full data import.

Further, the method further comprises:

when the data source supports incremental data import, an incremental data import configuration interface is displayed for the user, and configuration data is obtained from the incremental data import configuration interface.

Further, when the data source supports incremental data import, presenting an incremental data import configuration interface to the user, comprising:

determining the last time and/or the last position of the data imported into the data source last time;

and displaying the last time and/or the last position in the incremental data importing configuration interface.

In a third aspect, an embodiment of the present disclosure provides a data access method, including:

the client side responds to the detected data acquisition configuration operation and displays a data source configuration interface to a user;

the client acquires a data source of data to be imported, which is provided by a user in the data source configuration interface;

when the data source does not support the incremental data import mode, the client displays a full data import configuration interface to the user and acquires configuration data from the full data import configuration interface;

the client sends a data acquisition request to a data import server, wherein the data acquisition request comprises the configuration data;

the data import server receives the data acquisition request; the data acquisition request comprises configuration data of data to be imported;

the data import server determines data import parameters according to the configuration data;

the data import server determines the number of parallel threads of the threads for importing data and the sub-data quantity of the data to be imported by each thread according to the data import parameters; the sum of the sub-data amounts of the data to be imported by the thread is the total data amount of the data to be imported;

and the data import server executes the threads in parallel, and the threads acquire data to be imported from the production platform.

Further, the method further comprises:

the client sends a data import mode determining request to the data import server; the data importing mode determining request comprises information of the data source;

the data import server determines whether the data source supports a full data import mode according to the data structure design of the data source or the type of the stored service data;

and the data import server sends reply information to the client, wherein the reply information comprises an indication of whether the data source supports a full data import mode.

Further, the data import server determines the data import parameters according to the configuration data, including:

the data import server determines whether the data import mode is a full data import mode or an incremental data import mode according to the configuration data;

when the data import mode is the incremental data import mode, the data import server determines an import starting position, in the production platform, of the data source of the data to be imported;

and the data import server determines a parameter in the configuration data that specifies the number of parallel threads.

Further, the data import server determines the data import parameters according to the configuration data, and further includes:

the data importing server determines a data segmentation field used for segmenting the data to be imported in the configuration data.

Further, the data importing server determines the number of parallel threads for importing data and the sub-data amount of data to be imported by each thread according to the data importing parameter, including:

when the parameter for specifying the number of parallel threads is empty, the data import server determines the number of parallel threads as a default value;

the data import server determines the total data amount of the data to be imported according to the maximum value and the minimum value of the data segmentation field in the data to be imported;

and the data import server determines the sub-data amount of the data to be imported by each thread according to the total data amount and the number of parallel threads.

Further, the method further comprises:

when the data source supports incremental data import, the client side displays an incremental data import configuration interface to the user and acquires configuration data from the incremental data import configuration interface.

Further, when the data source supports incremental data import, the client presents an incremental data import configuration interface to the user, including:

the client determines the last time and/or the last position of the data imported into the data source last time;

and the client displays the last time and/or the last position in the incremental data import configuration interface.

In a fourth aspect, in an embodiment of the present disclosure, there is provided a data access apparatus, including:

a receiving module configured to receive a data acquisition request; the data acquisition request comprises configuration data of data to be imported;

a first determining module configured to determine a data import parameter according to the configuration data;

the second determining module is configured to determine the number of parallel threads of the threads for data import and the sub-data amount of data to be imported by each thread according to the data import parameters; the sum of the sub-data amounts of the data to be imported by the thread is the total data amount of the data to be imported;

and the parallel execution module is configured to execute the threads in parallel, the number of threads executed being the number of parallel threads, and the threads acquire data to be imported from the production platform.

In a fifth aspect, in an embodiment of the present disclosure, there is provided a data access apparatus, including:

the response module is configured to respond to the detected data acquisition configuration operation and display a data source configuration interface to a user;

the acquisition module is configured to acquire a data source of data to be imported, which is provided by a user in the data source configuration interface;

The display module is configured to display a full data import configuration interface to the user when the data source does not support the incremental data import mode, and acquire configuration data from the full data import configuration interface;

and the sending module is configured to send a data acquisition request to the data import server, wherein the data acquisition request comprises the configuration data.

The functions may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.

In one possible design, the structure of the above apparatus includes a memory for storing one or more computer instructions for supporting the above apparatus to perform the corresponding method, and a processor configured to execute the computer instructions stored in the memory. The apparatus may further comprise a communication interface for the apparatus to communicate with other devices or a communication network.

In a sixth aspect, in an embodiment of the present disclosure, there is provided a data access system, including: a client and a data importing server;

In a seventh aspect, embodiments of the present disclosure provide an electronic device comprising a memory for storing one or more computer instructions supporting any of the apparatus for performing the corresponding method described above, and a processor configured to execute the computer instructions stored in the memory. Any of the above-described apparatuses may further include a communication interface for communicating with other devices or a communication network.

In an eighth aspect, embodiments of the present disclosure provide a computer-readable storage medium storing computer instructions for use by any one of the above-described apparatuses, including computer instructions for performing any one of the above-described methods.

In a ninth aspect, embodiments of the present disclosure provide a computer program product comprising computer instructions for implementing the steps of the method of any one of the above aspects when executed by a processor.

The technical scheme provided by the embodiment of the disclosure can comprise the following beneficial effects:

In the technical scheme provided by the embodiments of the present disclosure, when different service data is extracted from the production system into the big data cluster system for centralized storage, operation and maintenance personnel perform configuration through a client, and the server, according to the configuration data, splits a data extraction process with a large data access volume into multiple, for example n, concurrently executed data extraction processes. Each data extraction process extracts one part of the data, and the sum of the data amounts extracted by the n concurrent processes is the total amount of data to be extracted, while the time spent on data extraction is only 1/n of that of the original data extraction process. In this way, data extraction with a large data access volume is accelerated, and the higher extraction speed reduces the probability that the database snapshot becomes too old, which greatly improves the access success rate for large data tables and reduces the impact of access failures on downstream business.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of embodiments of the disclosure.

Drawings

Other features, objects and advantages of the embodiments of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments, taken in conjunction with the accompanying drawings. In the drawings:

fig. 1 shows a flow chart of a data access method according to an embodiment of the present disclosure;

fig. 2 shows a flow chart of a data access method according to another embodiment of the present disclosure;

fig. 3 shows a flow chart of a data access method according to another embodiment of the present disclosure;

fig. 4 illustrates an application scenario diagram of a data access method according to an embodiment of the present disclosure;

fig. 5 shows an overall flowchart of a data access method according to an embodiment of the present disclosure;

fig. 6 shows a block diagram of a data access device according to an embodiment of the present disclosure;

fig. 7 shows a block diagram of a data access system according to another embodiment of the present disclosure;

fig. 8 shows a block diagram of a data access system according to another embodiment of the present disclosure;

fig. 9 is a schematic diagram of a computer system suitable for use in implementing a data access method according to an embodiment of the present disclosure.

Detailed Description

Hereinafter, exemplary implementations of the embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. In addition, for the sake of clarity, portions irrelevant to description of the exemplary embodiments are omitted in the drawings.

In the presently disclosed embodiments, it is to be understood that the terms such as "comprises" or "comprising" and the like are intended to indicate the presence of features, numbers, steps, acts, components, portions, or combinations thereof disclosed in the present specification, and are not intended to exclude the possibility of one or more other features, numbers, steps, acts, components, portions, or combinations thereof being present or added.

In addition, it should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. Embodiments of the present disclosure will be described in detail below with reference to the attached drawings in conjunction with the embodiments.

Fig. 1 shows a flowchart of a data access method according to an embodiment of the present disclosure, as shown in fig. 1, including the following steps S101-S104:

in step S101, a data acquisition request is received; the data acquisition request comprises configuration data of data to be imported;

in step S102, determining a data import parameter according to the configuration data;

in step S103, determining the number of parallel threads of the threads for data import and the sub-data amount of the data to be imported for each thread according to the data import parameters; the sum of the sub-data amounts of the data to be imported by the thread is the total data amount of the data to be imported;

in step S104, the threads are executed in parallel, the number of threads executed being the number of parallel threads, and the threads acquire data to be imported from the production platform.

The big data cluster system needs to store the data of the production system centrally, so that the data is convenient to manage and to use downstream. In the production system, many data tables, such as a merchant table or a terminal table, hold an excessively large amount of data; a single table may exceed ten million records. Another problem concerns how the data tables of the production system are accessed: some of them can be acquired through incremental access, which reduces the amount of data extracted each time, whereas others are limited by their original design or by the actual situation of the service data and can only be accessed in full. An access whose data amount exceeds tens of millions of records easily fails; the most common cause of failure is that the database snapshot is too old. These data tables may be important service tables, and an access failure easily has a serious impact on downstream services (reports, profit sharing, accounting, etc.). Therefore, how to avoid failures caused by an excessively large amount of data in a single access is one of the technical problems to be solved at present.

In view of the above, this embodiment proposes a data access method. When different service data is extracted from the production system into the big data cluster system for centralized storage, operation and maintenance personnel perform configuration through a client, and the server, according to the configuration data, splits a data extraction process with a large data access volume into multiple, for example n, concurrently executed data extraction processes. Each data extraction process extracts one part of the data, and the sum of the data amounts extracted by the n concurrent processes is the total amount of data to be extracted, while the time spent on data extraction is only 1/n of that of the original data extraction process. In this way, data extraction with a large data access volume is accelerated, and the higher extraction speed reduces the probability that the database snapshot becomes too old, which greatly improves the access success rate for large data tables and reduces the impact of access failures on downstream business.

In one embodiment of the present disclosure, the data access method is suitable for execution on a data import server that, within a big data cluster system, extracts data from the production system.

In an embodiment of the present disclosure, when operation and maintenance personnel need to import data generated by the production platform, they may perform configuration through the client and provide the configuration data to the data import server, thereby requesting the data import server to import the corresponding data from the production platform according to the configuration data. The configuration data may include, but is not limited to, the data source of the current import (e.g., a database identifier, data table identifier, field identifier, etc. in the production platform), data import parameters, and the like. The data import parameters determine the data import mode, the number of parallel threads required for parallel import, and so on. It will be appreciated that the configuration data may also omit the data import parameters and specify only the data source, in which case the data import server determines the data import parameters according to the actual situation of the data source.
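As a rough illustration only, the configuration data described above might be modeled as follows. The disclosure does not prescribe a format, so every class and field name here (`ImportMode`, `DataSource`, `ImportConfig`, `split_field`, `parallel_threads`, `start_position`) is a hypothetical choice made for this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ImportMode(Enum):
    FULL = "full"
    INCREMENTAL = "incremental"


@dataclass
class DataSource:
    """Identifies the data in the production platform (hypothetical fields)."""
    database: str
    table: str
    split_field: Optional[str] = None  # field used to segment the data


@dataclass
class ImportConfig:
    """Configuration data carried by a data acquisition request (hypothetical shape)."""
    source: DataSource
    # The import parameters are optional: when they are omitted, the server
    # is expected to decide them from the actual situation of the data source.
    import_mode: Optional[ImportMode] = None
    parallel_threads: Optional[int] = None
    start_position: Optional[int] = None


# A request that specifies only the data source and leaves the rest to the server.
config = ImportConfig(source=DataSource("prod_db", "merchant", split_field="id"))
```

Leaving the import parameters as `None` mirrors the case in which only the data source is specified.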

After receiving a data acquisition request from the client, the data import server extracts the configuration data from the request, determines the data import parameters according to the configuration data, then determines the number of parallel threads for the current data import based on the data import parameters, and determines the sub-data amount to be imported by each parallel thread according to the number of parallel threads and the total data amount of the data source.

In an embodiment of the present disclosure, after the number of parallel threads and the sub-data amount to be imported by each parallel thread have been determined, that number of parallel threads may be started. The threads run in parallel, each acquiring its share of the data to be imported from the data source of the production platform; the data import server finally merges the data imported by the threads and stores it in the big data cluster database.
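The parallel fetch-and-merge step can be sketched with Python's standard thread pool. This is a minimal sketch, not the disclosed implementation: `run_parallel_import` and `fake_fetch` are hypothetical names, and an in-memory list stands in for the production platform.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, int]
Range = Tuple[int, int]  # inclusive (low, high) bounds on the segmentation field


def run_parallel_import(
    ranges: Sequence[Range],
    fetch: Callable[[int, int], List[Row]],
) -> List[Row]:
    """Run one fetch per range concurrently and merge the results in range order."""
    if not ranges:
        return []
    merged: List[Row] = []
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        # pool.map yields results in input order, so the merged rows
        # follow the order of the ranges regardless of completion order.
        for chunk in pool.map(lambda r: fetch(r[0], r[1]), ranges):
            merged.extend(chunk)
    return merged


# Stand-in for the production platform: a table whose ids run from 1 to 100.
_TABLE: List[Row] = [{"id": i} for i in range(1, 101)]


def fake_fetch(low: int, high: int) -> List[Row]:
    """Return the rows whose id lies in the inclusive range [low, high]."""
    return [row for row in _TABLE if low <= row["id"] <= high]


rows = run_parallel_import([(1, 50), (51, 100)], fake_fetch)
```

In a real deployment each `fetch` call would issue its own bounded query against the data source, so that no single query has to scan the whole table.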

In an embodiment of the present disclosure, step S102, that is, the step of determining the data import parameter according to the configuration data, further includes the steps of:

In this optional implementation, the data in the production platform may be imported by full import or by incremental import. Full import can be understood as importing all the data of a data source at once; incremental import can be understood as importing the data of the same data source over multiple imports, where each later import continues from the data that follows the previous import.

Considering that some data sources, such as data tables, are limited by their original design or by the actual situation of the service data and can only be accessed in full, the operation and maintenance personnel may specify the data import mode according to the actual situation of the data source. It will of course be understood that the operation and maintenance personnel may also leave the data import mode unspecified, in which case the data import server determines the data source from the data the operation and maintenance personnel want to import, and determines the data import mode that the data source can support.

When the operation and maintenance personnel specify that the data is to be imported incrementally, it may be determined whether the data source corresponding to the current import supports the incremental import mode and, if so, the starting position of the current import is determined. The starting position may be specified by the operation and maintenance personnel in the configuration data, or may be determined by the data import server based on the end position of the last import of data from the data source.
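The two ways of obtaining the starting position can be sketched as a small helper. The function name, the use of integer positions, and the precedence rule (a configured value wins over the recorded end position) are assumptions of this sketch; the disclosure only states that either source may be used.

```python
from typing import Optional


def resolve_start_position(
    configured_start: Optional[int],
    last_import_end: Optional[int],
) -> Optional[int]:
    """Pick the starting position for an incremental import.

    Assumed precedence: a value given in the configuration data wins;
    otherwise the import continues right after the end position of the
    previous import. None means there is nothing to resume from.
    """
    if configured_start is not None:
        return configured_start
    if last_import_end is not None:
        return last_import_end + 1
    return None
```

For example, with no configured value and a previous import that ended at position 100, the helper returns 101.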

In this optional implementation, the number of parallel threads for the current data import may be specified by the operation and maintenance personnel, that is, configured in the configuration data. If the operation and maintenance personnel configure the number of parallel threads in the configuration data, the configured value prevails; if they do not specify it, the number of parallel threads is set to a default value.

In this optional implementation, data in the same data source is imported by multiple parallel threads, each of which extracts part of the data from the data source and imports it into the big data cluster data warehouse. To distinguish the part of the data that each parallel thread extracts, the operation and maintenance personnel may configure, in the configuration data, a data segmentation field for segmenting the data to be imported. For example, suppose the operation and maintenance personnel designate the "ID" field of the data table as the data segmentation field in the configuration data, the import is a full import, and the data table contains data whose ID values run from 1 to 100, that is, 100 records. With 10 parallel threads, each thread needs to import 100/10 = 10 records: the 1st parallel thread extracts the data with ID = 1, 2, ..., 10 from the data source, the 2nd extracts the data with ID = 11, 12, ..., 20, and so on, until the 10th extracts the data with ID = 91, 92, ..., 100.
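The ID example above can be reproduced with a short range-splitting helper. This is a sketch under stated assumptions: `split_id_range` is a hypothetical name, the IDs are assumed to be consecutive integers, and the handling of a total that does not divide evenly (the remainder is spread over the first sub-ranges) is a choice of this sketch that the source does not address.

```python
from typing import List, Tuple


def split_id_range(min_id: int, max_id: int, threads: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [min_id, max_id] into contiguous sub-ranges."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if max_id < min_id:
        return []
    total = max_id - min_id + 1
    base, extra = divmod(total, threads)
    ranges: List[Tuple[int, int]] = []
    low = min_id
    for i in range(threads):
        # Spread any remainder over the first `extra` sub-ranges.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # more threads than records: every record is already assigned
        ranges.append((low, low + size - 1))
        low += size
    return ranges


ranges = split_id_range(1, 100, 10)
```

Here `ranges` starts with `(1, 10)` and `(11, 20)` and ends with `(91, 100)`, matching the assignment of IDs to the 10 parallel threads in the example.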

In an embodiment of the present disclosure, step S103, that is, determining, according to the data import parameter, the number of parallel threads for importing data and the sub-data amount of data that each thread needs to import, further includes the following steps:

In this optional implementation, the operation and maintenance personnel may specify, in the configuration data, the number of parallel threads to start for the current data import. If they configure the number of parallel threads, the configured value prevails; if they do not, the number of parallel threads is set to a default value, which may be preset in the data import server.

In this optional implementation, after the operation and maintenance personnel specify the data segmentation field, the data import server may first determine the maximum value and the minimum value of the data segmentation field in the data to be imported. For a full import, the maximum and minimum values are the field values of the last record and the first record in the corresponding data source; for an incremental import, they are the field values of the last record and the first record among the data that follows the end of the previous import in the corresponding data source. The data segmentation field is typically an incrementing ID, so the total data amount of the data to be imported can be determined from the maximum value and the minimum value.

Once the number of parallel threads and the total data amount have been determined, the sub-data amount to be imported by each parallel thread can be obtained by dividing the total data amount evenly among the parallel threads.
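Putting the default thread count, the total data amount and the per-thread sub-data amount together gives a sketch like the one below. The default of 4 is a placeholder (the source gives no value), `plan_sub_amounts` is a hypothetical name, and the total is computed as max - min + 1 on the assumption that the segmentation field is a gap-free incrementing integer ID.

```python
from typing import List, Optional

# Hypothetical preset; the source only says a default may be set on the server.
DEFAULT_PARALLEL_THREADS = 4


def plan_sub_amounts(
    min_value: int,
    max_value: int,
    configured_threads: Optional[int] = None,
) -> List[int]:
    """Return the sub-data amount for each thread; the amounts sum to the total."""
    # An empty thread-count parameter falls back to the default value.
    threads = DEFAULT_PARALLEL_THREADS if configured_threads is None else configured_threads
    if threads < 1:
        raise ValueError("the number of parallel threads must be at least 1")
    # Assumes a dense incrementing ID, so the count follows from the bounds.
    total = max_value - min_value + 1
    if total <= 0:
        return []
    base, extra = divmod(total, threads)
    # Any remainder is given to the first `extra` threads, one record each.
    return [base + (1 if i < extra else 0) for i in range(threads)]
```

If the IDs have gaps, max - min + 1 only bounds the record count from above, and the sub-amounts become ranges of ID values rather than exact record counts.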

In an embodiment of the disclosure, the method further comprises the steps of:

In this optional implementation, when the operation and maintenance personnel, while configuring the data, are not sure whether the data source supports the incremental import mode, a data import mode determination request may be sent to the data import server. The data import server may determine, from the data source information in the request, the data structure design of the data source or the type of the stored service data, and then determine whether that data structure design or type of service data supports the incremental import mode. In response to the data import mode determination request, the data import server sends reply information to the client; the reply information may include an indication of whether the data source supports the incremental data import mode. The operation and maintenance personnel can then configure, according to the indication, whether the current data import uses the incremental data import mode or the full data import mode.

Fig. 2 shows a flowchart of a data access method according to another embodiment of the present disclosure, which includes the following steps S201 to S204, as shown in fig. 2:

In step S201, a data source configuration interface is presented to a user in response to the detected data acquisition configuration operation;

in step S202, a data source of data to be imported, which is provided by a user in the data source configuration interface, is obtained;

in step S203, when the data source does not support the incremental data import manner, displaying a full-scale data import configuration interface to the user, and acquiring configuration data from the full-scale data import configuration interface;

in step S204, a data acquisition request is sent to the data import server, the data acquisition request including the configuration data.

In an embodiment of the present disclosure, the data access method may be adapted to run on a client on which operation and maintenance personnel perform configuration when data is extracted from a production system in a big data cluster system.

The operation and maintenance personnel can perform the configuration operation through a configuration operation interface provided on the client. After detecting the data acquisition configuration operation, the client displays a data source configuration interface to the user, on which the user can provide the data source of the data to be imported. The client can prestore the data import modes supported by each data source; when the data source provided by the user does not support the incremental data import mode, the client can display a full data import configuration interface on which the user configures the configuration data for the full data import mode. After obtaining the configuration data configured by the user, the client sends it to the data import server to request that the data import server import data from the production platform according to the configuration data; the imported data can be stored in a data warehouse in the big data cluster system.

The data in the production platform can be imported in a full import mode or an incremental import mode. The full import mode can be understood as importing all data from a data source at one time, and the incremental import mode can be understood as importing data from the same data source in multiple passes, where each later import continues from the point at which the previous import ended.

Because the incremental data import mode continues from the data imported the previous time, previously imported data does not need to be imported again; the amount of imported data is therefore small, the time consumed is short, and data import errors are less likely to occur. Accordingly, when the data source supports incremental data import, incremental data import is used preferentially, and when the data source does not support incremental data import, full data import is used.

Configuration data may include, but is not limited to, a data source (e.g., database identification, data table identification, field identification, etc. in the production platform) of the current import data, data import parameters, etc. The data import parameters can determine the data import mode, the number of parallel threads required for parallel import, and the like. It will be appreciated that the configuration data may not include the data import parameters, but only specify the data source, and the data import server determines the data import parameters according to the actual situation of the data source.
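As an illustration only, the configuration data could be modeled as a small record in which the data source is mandatory and every data import parameter is optional. All field names below are hypothetical; a value of `None` means the parameter was left for the data import server to determine.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ImportConfig:
    """Hypothetical shape of the configuration data in a data acquisition request."""
    # Data source of the current import (identifiers in the production platform).
    database: str
    table: str
    split_field: Optional[str] = None
    # Data import parameters; None leaves the decision to the data import server.
    import_mode: Optional[str] = None       # "full" or "incremental"
    parallel_threads: Optional[int] = None
    start_position: Optional[int] = None

# Configuration that only specifies the data source.
source_only = ImportConfig(database="prod_db", table="merchant")

# Configuration that also carries data import parameters.
explicit = ImportConfig(
    database="prod_db",
    table="merchant",
    split_field="id",
    import_mode="full",
    parallel_threads=8,
)
```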

Therefore, the client may store in advance the data import modes supported by each data source, or, after receiving the data source information entered by the operation and maintenance personnel during data import configuration, may request the data import mode supported by that data source from the data import server.
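The two options above (prestored modes, or a query to the data import server) and the preference for incremental import can be combined in a short sketch. The function `resolve_import_mode`, the `prestored` dictionary, and the `ask_server` callback are hypothetical names; the callback stands in for the data import mode determination request.

```python
from typing import Callable, Dict

def resolve_import_mode(
    source: str,
    prestored: Dict[str, bool],
    ask_server: Callable[[str], bool],
) -> str:
    """Pick the import mode for a data source, preferring incremental import."""
    supports = prestored.get(source)
    if supports is None:
        # Not prestored on the client: ask the data import server once
        # and remember the answer for later configuration operations.
        supports = ask_server(source)
        prestored[source] = supports
    return "incremental" if supports else "full"

# Hypothetical usage: one source is prestored, the other requires a query.
server_calls = []

def fake_server(source: str) -> bool:
    server_calls.append(source)
    return False  # pretend the server reports "incremental not supported"

known = {"order_log": True}
mode_a = resolve_import_mode("order_log", known, fake_server)
mode_b = resolve_import_mode("merchant", known, fake_server)
mode_c = resolve_import_mode("merchant", known, fake_server)
```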

In an embodiment of the present disclosure, after determining the number of parallel threads and the sub-data amount to be imported by each parallel thread, the data import server may start that number of threads. The threads run in parallel, each obtaining its share of the data to be imported from the data source of the production platform; the data import server finally merges the data imported by the threads and stores the result in the big data cluster database.
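The parallel execution and final merge can be sketched with Python's standard thread pool. This is an assumption-laden illustration: an in-memory list stands in for the production data source, `fetch_range` stands in for a range query against it, and the per-thread ID ranges are taken as given.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_range(source, lo, hi):
    """Stand-in for a production query: records with lo <= id <= hi."""
    return [row for row in source if lo <= row["id"] <= hi]

def parallel_import(source, ranges):
    """Run one thread per sub-range and merge the results in range order."""
    if not ranges:
        return []
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        futures = [pool.submit(fetch_range, source, lo, hi) for lo, hi in ranges]
        merged = []
        for future in futures:
            # result() re-raises any exception raised inside the thread.
            merged.extend(future.result())
    return merged

# Hypothetical data source with IDs 1..10, split across four threads.
source = [{"id": i, "name": f"m{i}"} for i in range(1, 11)]
imported = parallel_import(source, [(1, 3), (4, 6), (7, 8), (9, 10)])
```

Because each thread reads a disjoint range, the merged result contains every record exactly once.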

In an implementation manner of an embodiment of the present disclosure, the method further includes the following steps:

In an implementation manner of an embodiment of the present disclosure, when the data source supports incremental data import, the step of displaying an incremental data import configuration interface to the user further includes the following steps:

In this optional implementation manner, when the data source supports the incremental data import manner, the current data import can continue from the data that follows the previous data import. The client may therefore store locally the information of the last record of the previous data import, or obtain from the data import server the information of the record last imported from the data source, and then determine from that information the last time and/or the last position of the previous data import from the data source; the incremental data import is then performed according to the last time and/or the last position. After this information is determined, the client can display on the page the last time and/or the last position of the previous data import from the data source, so that the operation and maintenance personnel can configure the starting position of the data import. The starting position may be determined by the last time and/or the last position described above. For example, if the last time of the previous data import is XX, the operation and maintenance personnel can configure the import to take the data newly added since XX; for another example, if the last position of the previous data import is YY, the operation and maintenance personnel can configure the import to start from position YY+1.
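The two examples above (newly added data since time XX, or data from position YY+1) can be expressed as a small helper. The function name `start_condition` and the returned dictionary layout are hypothetical, and giving the position precedence when both values are known is an assumption.

```python
from datetime import datetime
from typing import Optional

def start_condition(
    last_position: Optional[int] = None,
    last_time: Optional[datetime] = None,
) -> dict:
    """Derive where an incremental import starts from the previous import's end."""
    if last_position is not None:
        # Previous import ended at position YY: start from YY + 1.
        return {"field": "position", "from": last_position + 1, "inclusive": True}
    if last_time is not None:
        # Previous import ended at time XX: take data added after XX.
        return {"field": "time", "from": last_time, "inclusive": False}
    raise ValueError("no information about the previous import is available")

by_position = start_condition(last_position=41)
by_time = start_condition(last_time=datetime(2021, 1, 25, 8, 0, 0))
```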

The technical terms and features of the embodiment shown in fig. 2 and related thereto are the same as or similar to those mentioned in the embodiment shown in fig. 1 and related thereto, and the explanation and description of the technical terms and features of the embodiment shown in fig. 2 and related thereto will be referred to the explanation of the embodiment shown in fig. 1 and related thereto, and will not be repeated here.

Fig. 3 shows a flowchart of a data access method according to another embodiment of the present disclosure, which includes the following steps S301 to S308, as shown in fig. 3:

in step S301, the client side presents a data source configuration interface to the user in response to the detected data acquisition configuration operation;

in step S302, the client acquires a data source of data to be imported provided by a user in the data source configuration interface;

in step S303, when the data source does not support the incremental data import manner, the client displays a full-volume data import configuration interface to the user, and obtains configuration data from the full-volume data import configuration interface;

in step S304, the client sends a data acquisition request to a data import server, where the data acquisition request includes the configuration data;

In step S305, the data import server receives a data acquisition request; the data acquisition request comprises configuration data of data to be imported;

in step S306, the data import server determines data import parameters according to the configuration data;

in step S307, the data import server determines, according to the data import parameter, the number of parallel threads of the threads that perform data import and the sub-data amount of the data that each thread needs to import; the sum of the sub-data amounts of the data to be imported by the thread is the total data amount of the data to be imported;

in step S308, the data import server executes the number of threads in parallel, and the threads obtain data to be imported from the production platform.

In an embodiment of the present disclosure, the data access method may be adapted to operate in a big data cluster system for importing data in parallel from a production platform.

In this optional implementation manner, when the operation and maintenance personnel perform data configuration and it has not been determined whether the data source supports the incremental import manner, they may send a data import mode determination request to the data import server through the client. The data import server may determine, according to the data source information in the request, the data structure design of the data source or the type of the stored service data, and then determine whether that data structure design or type of service data supports the incremental import manner. In response to the data import mode determination request, the data import server sends reply information to the client, where the reply information may include an indication of whether the data source supports the incremental data import manner. The operation and maintenance personnel can then configure the incremental data import manner or the full data import manner on the client according to the indication.

In an embodiment of the present disclosure, step S306, that is, the step of determining, by the data import server, the data import parameter according to the configuration data, further includes the steps of:

When the operation and maintenance personnel specify that the data is to be imported incrementally, the data import server can determine whether the data source corresponding to the data import supports the incremental import manner and, if so, determine the starting position of the current data import. The starting position may be specified by the operation and maintenance personnel in the configuration data, or may be determined by the data import server based on the end position of the last data import from the data source.

In an embodiment of the present disclosure, step S306, that is, the step of determining, by the data import server, the data import parameter according to the configuration data, further includes the steps of:

In this optional implementation manner, the number of parallel threads for the current data import may be specified by the operation and maintenance personnel, that is, configured in the configuration data. If the operation and maintenance personnel configure the number of parallel threads in the configuration data, the configured value prevails; if they do not specify the number of parallel threads in the configuration data, the data import server uses the default setting and determines the number of parallel threads to be a default value.
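A minimal sketch of this fallback is shown below. The default of 4 threads, the configuration key `parallel_threads`, and the function name `resolve_parallel_threads` are assumptions; the disclosure does not state the default value.

```python
DEFAULT_PARALLEL_THREADS = 4  # assumed default; not specified in the disclosure

def resolve_parallel_threads(config: dict) -> int:
    """Use the configured number of parallel threads, else the default value."""
    value = config.get("parallel_threads")
    if value is None:
        return DEFAULT_PARALLEL_THREADS
    if not isinstance(value, int) or value < 1:
        raise ValueError("parallel_threads must be a positive integer")
    return value

configured = resolve_parallel_threads({"table": "merchant", "parallel_threads": 8})
defaulted = resolve_parallel_threads({"table": "merchant"})
```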

In an embodiment of the present disclosure, step S307, that is, the step of determining, by the data import server, the number of parallel threads for data import and the sub-data amount of data to be imported by each thread according to the data import parameter, further includes the steps of:

Once the number of parallel threads and the total data amount are determined, the data import server can determine the sub-data amount to be imported by each parallel thread by dividing the total data amount evenly by the number of parallel threads.

In an embodiment of the disclosure, the method further comprises the steps of:

In this alternative implementation, the data in the production platform may be imported in a full import mode or an incremental import mode. The full import mode can be understood as importing all data from a data source at one time, and the incremental import mode can be understood as importing data from the same data source in multiple passes, where each later import continues from the point at which the previous import ended.

In an embodiment of the present disclosure, when the data source supports incremental data import, the client presents the user with an incremental data import configuration interface, and further includes the steps of:

Fig. 4 illustrates an application scenario diagram of a data access method according to an embodiment of the present disclosure. Fig. 5 shows an overall flowchart of a data access method according to an embodiment of the present disclosure. As shown in fig. 4 and 5, the big data cluster system may include a plurality of clients, which may be used by a plurality of operation and maintenance personnel, and the data import server may be a virtual machine, which may be composed of a plurality of physical machines. When data needs to be imported, the operation and maintenance personnel can configure information about the data to be imported, such as the identification of the data source, through the client, and submit the configuration data to the data import server. The data import server can start a plurality of parallel threads according to the configuration data, and each thread extracts part of the data from one or more data sources of the production platform. After the parallel threads complete data extraction, the extracted data is merged in the big data cluster data warehouse, so that the complete data in the data sources is obtained.

The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure.

Fig. 6 shows a block diagram of a data access apparatus according to an embodiment of the present disclosure, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 6, the data access device includes:

a receiving module 601 configured to receive a data acquisition request; the data acquisition request comprises configuration data of data to be imported;

a first determining module 602 configured to determine data import parameters according to the configuration data;

a second determining module 603 configured to determine, according to the data import parameter, the number of parallel threads of the threads that perform data import and the sub-data amount of the data that each thread needs to import; the sum of the sub-data amounts of the data to be imported by the thread is the total data amount of the data to be imported;

and the parallel execution module 604 is configured to execute the number of the parallel threads in parallel, and the threads acquire data to be imported from the production platform.

The data access device may be adapted to operate in a data import server that extracts data from the production system in a big data cluster system.

Fig. 7 shows a block diagram of a data access apparatus according to another embodiment of the present disclosure, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 7, the data access device includes:

A response module 701 configured to present a data source configuration interface to a user in response to the detected data acquisition configuration operation;

an obtaining module 702, configured to obtain a data source of data to be imported provided by a user in the data source configuration interface;

the display module 703 is configured to display a full-volume data import configuration interface to the user when the data source does not support the incremental data import mode, and acquire configuration data from the full-volume data import configuration interface;

a sending module 704 configured to send a data acquisition request to a data import server, the data acquisition request comprising the configuration data.

In an embodiment of the present disclosure, the data access device may be adapted to run on a client on which operation and maintenance personnel perform configuration when data is extracted from a production system in a big data cluster system.

Fig. 8 illustrates a block diagram of a data access system that may be implemented as part or all of an electronic device by software, hardware, or a combination of both, according to an embodiment of the present disclosure. As shown in fig. 8, the data access system includes: a client 801 and a data import server 802;

the client 801 responds to the detected data acquisition configuration operation and displays a data source configuration interface to a user;

The client 801 obtains a data source of data to be imported provided by a user in the data source configuration interface;

when the data source does not support the incremental data import mode, the client 801 displays a full-volume data import configuration interface to the user, and acquires configuration data from the full-volume data import configuration interface;

the client 801 sends a data acquisition request to a data import server 802, where the data acquisition request includes the configuration data;

the data import server 802 receives a data acquisition request; the data acquisition request comprises configuration data of data to be imported;

the data import server 802 determines data import parameters according to the configuration data;

the data import server 802 determines the number of parallel threads of the threads for importing data and the sub-data amount of the data to be imported by each thread according to the data import parameters; the sum of the sub-data amounts of the data to be imported by the thread is the total data amount of the data to be imported;

the data import server 802 executes the number of threads in parallel, and the threads acquire data to be imported from a production platform.

In one embodiment of the present disclosure, the data access system operates in a big data cluster system for importing data in parallel from a production platform.

The technical features and the corresponding explanations and descriptions related to the above-mentioned apparatus embodiments are the same, corresponding or similar to the technical features and the corresponding explanations and descriptions related to the above-mentioned method embodiments, and reference may be made to the technical features and the corresponding explanations and descriptions related to the above-mentioned method embodiments for the technical features and the corresponding explanations and descriptions related to the above-mentioned apparatus embodiments, which are not repeated herein.

The embodiment of the disclosure also discloses an electronic device, which comprises a memory and a processor; wherein,

the memory is used to store one or more computer instructions that are executed by the processor to perform any of the method steps described above.

As shown in fig. 9, the computer system 900 includes a processing unit 901 which can execute various processes in the above-described embodiments in accordance with a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. In the RAM903, various programs and data necessary for the operation of the computer system 900 are also stored. The processing unit 901, the ROM902, and the RAM903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.

The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output portion 907 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage portion 908 including a hard disk or the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on the drive 910 so that a computer program read out therefrom is installed into the storage section 908 as needed. The processing unit 901 may be implemented as a processing unit such as CPU, GPU, TPU, FPGA, NPU.

In particular, according to embodiments of the present disclosure, the methods described above may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the data access method. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911.

The disclosed embodiments also disclose a computer program product comprising a computer program/instructions which, when executed by a processor, implement any of the method steps described above.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units or modules described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware. The units or modules described may also be provided in a processor, the names of which in some cases do not constitute a limitation of the unit or module itself.

As another aspect, the embodiments of the present disclosure also provide a computer-readable storage medium, which may be a computer-readable storage medium included in the apparatus described in the above-described embodiment; or may be a computer-readable storage medium, alone, that is not assembled into a device. The computer-readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the embodiments of the present disclosure.

The foregoing description covers only the preferred embodiments of the present disclosure and the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above technical features, but also encompasses other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the embodiments of the present disclosure.

Claims

1. A data access method, comprising:

receiving a data acquisition request; the data acquisition request is based on the fact that when operation and maintenance personnel need to import data produced by the production platform, the data acquisition request is provided for a data import server by a client after being configured by the client, and the data acquisition request comprises configuration data of the data to be imported; the configuration data comprises a data segmentation field for segmenting the data to be imported;

determining data import parameters and data import modes according to the configuration data;

determining the number of parallel threads of a thread for data import and the sub-data quantity of data to be imported by each thread according to the data import parameters, determining whether a data source of the production platform supports the incremental import mode or not when the data import mode is the incremental data import mode, and determining the import starting position of the data source of the data to be imported in the production platform when the data source of the production platform supports the incremental import mode; the sum of the sub-data amounts of the data to be imported by the thread is the total data amount of the data to be imported; the starting position is designated in the configuration data by an operation and maintenance personnel or is determined according to the ending position of the data imported from the data source last time;

Executing the threads in parallel, and acquiring data to be imported from a production platform by the threads;

the method for determining the number of parallel threads for data import and the sub-data amount of data to be imported by each thread according to the data import parameters comprises the following steps:

determining the sub-data quantity of each thread needing to import data according to the data total quantity and the parallel thread quantity;

wherein the method further comprises:

2. The method of claim 1, wherein determining the data import parameter from the configuration data comprises:

3. The method of claim 2, wherein determining the number of parallel threads for data import and the sub-data amount of data each thread needs to import according to the data import parameters comprises:

4. A data access method, comprising:

the client sends a data acquisition request to a data import server, wherein the data acquisition request comprises the configuration data; the configuration data comprises a data segmentation field for segmenting the data to be imported;

The data import server determines data import parameters and data import modes according to the configuration data;

the data import server determines the number of parallel threads of the threads for importing data and the sub-data amount of the data to be imported by each thread according to the data import parameters, determines whether a data source of a production platform supports an incremental import mode or not when the data import mode is the incremental data import mode, and determines the import starting position of the data source of the data to be imported in the production platform when the data source of the production platform supports the incremental import mode; the sum of the sub-data amounts of the data to be imported by the thread is the total data amount of the data to be imported; the starting position is designated in the configuration data by an operation and maintenance personnel or is determined according to the ending position of the data imported from the data source last time;

the data import server executes the threads in parallel, and the threads acquire data to be imported from a production platform;

the data importing server determines the number of parallel threads for importing data and the sub-data amount of data to be imported by each thread according to the data importing parameters, and comprises the following steps:

the data import server determines the sub-data quantity of the data to be imported by each thread according to the data total quantity and the parallel thread quantity;

the method further comprises the steps of:

5. The method of claim 4, wherein the data import server determining the data import parameters from the configuration data comprises:

6. The method of claim 5, wherein the data import server determining the number of parallel threads for data import and the sub-data amount of data each thread needs to import according to the data import parameters, comprising:

7. The method of any of claims 4-6, wherein the method further comprises:

8. The method of claim 7, wherein, when the data source supports incremental data importation, the client presents an incremental data importation configuration interface to the user comprising

9. A data access device, comprising:

A receiving module configured to receive a data acquisition request; the data acquisition request is based on the fact that when operation and maintenance personnel need to import data produced by the production platform, the data acquisition request is provided for a data import server by a client after being configured by the client, and the data acquisition request comprises configuration data of the data to be imported; the configuration data comprises a data segmentation field for segmenting the data to be imported;

the first determining module is configured to determine a data import parameter and a data import mode according to the configuration data;

the second determining module is configured to determine the number of parallel threads of the threads for data import and the sub-data amount of data to be imported for each thread according to the data import parameters, determine whether the data source of the production platform supports the incremental import mode when the data import mode is the incremental data import mode, and determine the import starting position of the data source of the data to be imported in the production platform when the data source of the production platform supports the incremental import mode; the sum of the sub-data amounts of the data to be imported by the thread is the total data amount of the data to be imported; the starting position is designated in the configuration data by an operation and maintenance personnel or is determined according to the ending position of the data imported from the data source last time;

The parallel execution module is configured to execute the number of the threads in parallel, and the threads acquire data to be imported from a production platform;

the determining, in the second determining module, the number of parallel threads performing data import according to the data import parameter and the sub-data amount of data to be imported by each thread are implemented as follows:

wherein the apparatus further comprises:

the request receiving module is configured to receive a data importing mode determining request sent by the client; the data importing mode determining request comprises information of the data source;

the mode determining module is configured to determine whether the data source supports an incremental data importing mode according to the data structure design of the data source or the type of the stored service data;

and the reply module is configured to send reply information to the client, wherein the reply information comprises an indication of whether the data source supports an incremental data import mode or not.

10. A data access system, comprising: a client and a data importing server;

the client side also sends a data importing mode determining request to the data importing server; the data importing mode determining request comprises information of the data source;

11. An electronic device includes a memory and a processor; wherein,

the memory is for storing one or more computer instructions, wherein the one or more computer instructions are executable by the processor to implement the steps of the method of any one of claims 1-8.

12. A computer readable storage medium having stored thereon computer instructions, wherein the computer instructions, when executed by a processor, implement the steps of the method of any of claims 1-8.