CN113190555A - Data import method and device - Google Patents

Publication number
CN113190555A
Authority
CN
China
Prior art keywords
data
import
fragment
storage
importing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110484417.XA
Other languages
Chinese (zh)
Inventor
冯志恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202110484417.XA priority Critical patent/CN113190555A/en
Publication of CN113190555A publication Critical patent/CN113190555A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/221Column-oriented storage; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2379Updates performed during online database operations; commit processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application provide a data import method and device. The method includes: in response to a data import instruction, dividing data to be imported into a plurality of data fragments; constructing an import relation according to each data fragment and each storage fragment of a columnar database management system, wherein the import relation indicates the correspondence between the data fragments and the storage fragments; and importing each data fragment into the storage fragment corresponding to that data fragment. Because each data fragment is imported into the storage fragment assigned to it by the import relation, the method avoids the low import rate that arises in the related art when data imported under a random import policy contends for the same storage node, and the targeted import improves the import rate.

Description

Data import method and device
Technical Field
The embodiments of the application relate to the fields of computer technology and big data, and in particular to a data import method and device.
Background
Spark is a fast, general-purpose computing engine designed for large-scale data processing. It can read data in a distributed manner, apply various transformations and processing to the data, and import the processed data in a distributed manner into a storage target (such as a database management system, for example ClickHouse, a columnar database management system for real-time analysis of big data).
In the prior art, a data import method includes: obtaining a data import request that includes one or more pieces of data to be imported; randomly allocating, from the storage fragments of a database management system, storage fragments to the data to be imported; and importing the data to be imported into the allocated storage fragments.
In the process of implementing the present application, the inventor found at least the following problem in the prior art: when data to be imported is imported into the database management system by a random import method, different data to be imported may contend for the same storage node, resulting in a low import rate.
Disclosure of Invention
The embodiments of the application provide a data import method and device to solve the problem of low data import efficiency.
In a first aspect, an embodiment of the present application provides a data importing method, including:
in response to a data import instruction, dividing data to be imported into a plurality of data fragments, and constructing an import relation according to each data fragment and each storage fragment of a columnar database management system, wherein the import relation indicates the correspondence between the data fragments and the storage fragments;
and importing each data fragment into the storage fragment corresponding to that data fragment.
It should be noted that, in this embodiment, an import relation between data fragments and storage fragments is constructed, and each data fragment is imported into its corresponding storage fragment based on that relation. This avoids the technical problem in the related art of a low import rate caused by different data to be imported contending for the same storage node when data is imported under a random import policy.
In some embodiments, constructing an import relation according to each data fragment and each storage fragment of the columnar database management system includes:
generating a list including address information of each storage fragment, and broadcasting the list to a plurality of import threads that execute the import operation;
and constructing, by each import thread, the import relation according to a modulo result between the list and each data fragment.
It should be noted that, in this embodiment, because the import relation is constructed from the modulo result between the list and the data fragments, the data fragments are distributed relatively evenly across the storage fragments and the load of the storage fragments is relatively balanced. This improves data import performance, protects the columnar database management system, and makes better use of its resources.
In some embodiments, constructing, by each import thread, the import relation according to the modulo result between the list and each data fragment includes:
allocating an index value to each data fragment, and allocating the data fragments, each carrying its index value, to the import threads;
and constructing, by each import thread, the import relation according to the modulo result between the data fragments allocated to it and the list.
It should be noted that, in this embodiment, index values are allocated, the data fragments carrying the index values are allocated to the import threads, and each import thread constructs the import relation from the modulo result between its allocated data fragments and the list. This improves the efficiency of constructing the import relation, distributes the data fragments evenly across the storage fragments, and thereby improves resource utilization.
In some embodiments, constructing, by each import thread, the import relation according to the modulo result between the allocated data fragments and the list includes:
determining, by each import thread, the total number of storage fragments in the list, and constructing, by each import thread, the import relation according to the modulo result between the index values of the data fragments allocated to it and the total number of storage fragments.
It should be noted that, in this embodiment, each import thread performs the modulo operation itself. Because the import threads are independent of each other and execute in parallel, the modulo results are determined more efficiently, which in turn improves the efficiency of constructing the import relation.
In some embodiments, importing each data fragment into the storage fragment corresponding to that data fragment includes:
importing in parallel, by each import thread, the data fragments allocated to that import thread into the storage fragments with which they have a correspondence.
It should be noted that, in this embodiment, executing the import operation in parallel on a plurality of import threads improves the efficiency of importing the data fragments.
In some embodiments, if the import of any data fragment fails, the method further includes:
re-importing that data fragment, after a preset time interval, into the storage fragment with which it has a correspondence.
It should be noted that, in this embodiment, when the import of a data fragment fails, the data importing apparatus re-imports it after the time interval and still imports it based on the import relation. In the related art, a failed import is re-run from scratch (that is, the data competes for storage fragments anew), so the work already done is wasted. Retrying against the same storage fragment avoids this waste, improves resource utilization and import efficiency, and avoids the data loss caused by re-running, which improves the accuracy and reliability of data import.
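By way of non-limiting illustration, the retry behavior described above might be sketched as follows. The function `import_with_retry`, the callable `write_fragment`, and the parameters `retry_interval_s` and `max_attempts` are hypothetical names introduced here; in particular, the cap on the number of attempts is an added assumption, since the embodiment specifies only the preset time interval and that the target storage fragment stays the same.

```python
import time
from typing import Callable


def import_with_retry(
    fragment: list,
    shard: str,
    write_fragment: Callable[[list, str], None],
    retry_interval_s: float = 5.0,
    max_attempts: int = 3,  # assumption: the source does not cap retries
) -> int:
    """Write `fragment` to `shard`, retrying the SAME shard after a fixed wait.

    Returns the number of attempts used; re-raises the last error if all fail.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            write_fragment(fragment, shard)
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(retry_interval_s)  # preset interval before re-importing
    raise ValueError("max_attempts must be at least 1")


# Usage with a hypothetical writer that fails twice and then succeeds.
calls = []


def flaky_writer(fragment: list, shard: str) -> None:
    calls.append(shard)
    if len(calls) < 3:
        raise ConnectionError("storage fragment busy")


attempts = import_with_retry([1, 2, 3], "10.0.0.8", flaky_writer, retry_interval_s=0.0)
```

Every attempt in this sketch targets the same address, which reflects the point that the retry still follows the import relation rather than competing for a new storage fragment.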
In some embodiments, if the import of any data fragment fails, the method further includes:
determining the storage fragment corresponding to that data fragment, determining a replica of that storage fragment, and importing the data fragment into the replica.
It should be noted that, in this embodiment, importing a data fragment whose import failed into the corresponding replica achieves high performance and high fault tolerance and makes data import more flexible.
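For illustration only, the fallback to a replica might look like the following sketch. The function `import_with_replica_fallback`, the `replicas` mapping, the `write_fragment` callable, and the addresses are hypothetical; trying several replicas in order is an assumption, as the embodiment only states that the failed data fragment is imported into a replica of its storage fragment.

```python
from typing import Callable, Dict, List


def import_with_replica_fallback(
    fragment: list,
    shard: str,
    replicas: Dict[str, List[str]],
    write_fragment: Callable[[list, str], None],
) -> str:
    """Try the mapped storage fragment first; on failure, try its replicas.

    Returns the address that accepted the data fragment.
    """
    try:
        write_fragment(fragment, shard)
        return shard
    except Exception as primary_error:
        for replica in replicas.get(shard, []):
            try:
                write_fragment(fragment, replica)
                return replica
            except Exception:
                continue  # this replica failed too; try the next one
        raise primary_error  # no replica accepted the fragment


# Usage: the primary address is down, its replica is reachable.
down = {"10.0.0.8"}
stored: Dict[str, List[list]] = {}


def writer(fragment: list, address: str) -> None:
    if address in down:
        raise ConnectionError(f"{address} unreachable")
    stored.setdefault(address, []).append(fragment)


target = import_with_replica_fallback(
    [1, 2, 3], "10.0.0.8", {"10.0.0.8": ["10.0.0.9"]}, writer
)
```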
In some embodiments, the plurality of storage fragments are determined based on the number of acquired data import tasks and/or the load of the columnar database management system, wherein the data import tasks include the data to be imported.
It should be noted that, in this embodiment, determining the storage fragments according to the number of data import tasks and/or the load prevents the resources of the columnar database management system from being over-committed, balances its load as far as possible, and makes reasonable use of its resources.
In some embodiments, the number of data import tasks of the plurality of storage fragments is less than a data import task threshold; and/or the load of the plurality of storage fragments is less than a preset load threshold.
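One possible reading of the threshold conditions above is sketched below. The `ShardStatus` record, its fields, the load metric, and the choice to apply both thresholds per storage fragment are all assumptions made for illustration; the embodiment states the conditions as "and/or" and does not define how load is measured.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ShardStatus:
    address: str
    running_import_tasks: int
    load: float  # assumed to be a utilisation ratio in [0, 1]


def select_storage_fragments(
    statuses: List[ShardStatus], max_tasks: int, max_load: float
) -> List[str]:
    """Keep only storage fragments below both thresholds (the 'and' case)."""
    return [
        s.address
        for s in statuses
        if s.running_import_tasks < max_tasks and s.load < max_load
    ]


statuses = [
    ShardStatus("10.0.0.1", running_import_tasks=1, load=0.30),
    ShardStatus("10.0.0.2", running_import_tasks=5, load=0.20),  # too many tasks
    ShardStatus("10.0.0.3", running_import_tasks=0, load=0.95),  # too loaded
]
eligible = select_storage_fragments(statuses, max_tasks=4, max_load=0.8)
```

The resulting list would then serve as the set of storage fragments over which the import relation is constructed.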
In a second aspect, an embodiment of the present application provides a data importing apparatus, including:
a segmenting unit, configured to divide data to be imported into a plurality of data fragments in response to a data import instruction;
a construction unit, configured to construct an import relation according to each data fragment and each storage fragment of a columnar database management system, wherein the import relation indicates the correspondence between the data fragments and the storage fragments;
and a first import unit, configured to import each data fragment into the storage fragment corresponding to that data fragment.
In some embodiments, the construction unit comprises:
a generating subunit, configured to generate a list including address information of each storage fragment;
a broadcast subunit, configured to broadcast the list to a plurality of import threads that execute the import operation;
and a construction subunit, configured to construct, by each import thread, the import relation according to a modulo result between the list and each data fragment.
In some embodiments, the construction subunit comprises:
an allocation module, configured to allocate an index value to each data fragment and to allocate the data fragments, each carrying its index value, to the import threads;
and a construction module, configured to construct, by each import thread, the import relation according to the modulo result between the data fragments allocated to it and the list.
In some embodiments, the construction module is configured to determine, by each import thread, the total number of storage fragments in the list, and to construct, by each import thread, the import relation according to the modulo result between the index values of the data fragments allocated to it and the total number of storage fragments.
In some embodiments, the first import unit is configured to import in parallel, by each import thread, the data fragments allocated to that import thread into the storage fragments with which they have a correspondence.
In some embodiments, the apparatus further comprises:
a second import unit, configured to, if the import of any data fragment fails, re-import that data fragment, after a preset time interval, into the storage fragment with which it has a correspondence.
In some embodiments, the apparatus further comprises:
a determining unit, configured to, if the import of any data fragment fails, determine the storage fragment corresponding to that data fragment and determine a replica of that storage fragment;
and a third import unit, configured to import the data fragment into the replica.
In some embodiments, the plurality of storage fragments are determined based on the number of acquired data import tasks and/or the load of the columnar database management system, wherein the data import tasks include the data to be imported.
In some embodiments, the number of data import tasks of the plurality of storage fragments is less than a data import task threshold; and/or the load of the plurality of storage fragments is less than a preset load threshold.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory and a processor;
the memory is configured to store instructions executable by the processor;
wherein the processor is configured to perform the method of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored therein computer-executable instructions, which when executed by a processor, are configured to implement the method according to the first aspect.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the method according to the first aspect.
In a sixth aspect, an embodiment of the present application provides a data importing system, including: a columnar database management system, and a data importing apparatus according to the second aspect.
The data import method and device provided by the embodiments of the application include: in response to a data import instruction, dividing data to be imported into a plurality of data fragments; constructing an import relation according to each data fragment and each storage fragment of a columnar database management system, wherein the import relation indicates the correspondence between the data fragments and the storage fragments; and importing each data fragment into the storage fragment corresponding to that data fragment. Because each data fragment is imported into the storage fragment assigned to it by the import relation, the method avoids the low import rate that arises in the related art when data imported under a random import policy contends for the same storage node, and the targeted import improves the import rate.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a schematic flowchart of a data importing method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a data importing method according to another embodiment of the present application;
FIG. 3 is a flowchart illustrating a data importing method according to another embodiment of the present application;
FIG. 4 is a block diagram of a data import apparatus according to an embodiment of the present application;
FIG. 5 is a block diagram of a data import apparatus according to an embodiment of the present application;
FIG. 6 is a block diagram of a data import apparatus according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an electronic device according to an embodiment of the present application.
Certain embodiments of the disclosure are shown in the foregoing drawings and are described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terms referred to in the embodiments of the present application are explained as follows:
Data fragment: a constituent unit of the data to be imported. For example, in this embodiment, one piece of data to be imported may include multiple data fragments.
Storage fragment: a constituent unit of the columnar database management system. For example, in this embodiment, if the columnar database management system is a storage cluster that includes multiple machines, one storage fragment may be a single machine, or one machine may include multiple storage fragments.
Import thread: a thread that executes import operations, where a thread is the smallest unit that the operating system can schedule.
Modulo: also known as the remainder operation.
With the development of internet technology, and especially the rapid progress of e-commerce technology, data volume is growing markedly faster, and how to improve data import efficiency has become a problem to be solved urgently.
For example, the application scenario of the data import method may be: the data importing device imports data from an offline data warehouse into the columnar database management system; alternatively, the data importing device imports data from the Hadoop Distributed File System (HDFS) into the columnar database management system.
It should be noted that the application scenarios of the two data import methods are only used for exemplary illustration, and the data import method may be applied to the application scenarios, but is not to be construed as a limitation to the application scenarios.
In the related art, a computing engine (e.g., Spark) may be used to import the data in the offline data warehouse into the columnar database management system, usually under a random import policy: a storage fragment is selected at random from the columnar database management system, and the data to be imported is written into the selected storage fragment.
However, under the random import policy, different data to be imported may contend for the same storage node, resulting in a low import rate.
To avoid the above technical problem, the inventors of the present application, through creative effort, arrived at the inventive concept of the present application: constructing an import relation between data fragments and storage fragments, so that each data fragment is imported into the storage fragment with which it has an import relation.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a schematic flow chart illustrating a data importing method according to an embodiment of the present application.
As shown in fig. 1, the method includes:
s101: and responding to the data import instruction, and dividing the data to be imported into a plurality of data fragments.
For example, the execution subject of this embodiment is a data importing device, and the data importing device may be a server (e.g., a cloud server or a local server), a terminal device, a processor (e.g., a spare processor), a chip, and the like, which is not limited in this embodiment.
The data import instruction is used for indicating that: and the data importing device executes an instruction of importing operation on the data to be imported.
In this embodiment, the source of the data import instruction is not limited.
In one example, the data import instruction may be triggered by a user on the data importing device. For example, a worker initiates a data import instruction to the data importing device.
In another example, the data import instruction may be automatically triggered by the data importing device based on a preset trigger condition. For example, when the trigger condition is that the amount of data reaches a preset threshold, the data importing device automatically triggers the data import instruction and executes the subsequent data import operations.
In yet another example, the data import instruction may be initiated by another apparatus to the data importing device. For example, a scheduling apparatus issues a data import instruction to the data importing device based on load information of the columnar database management system.
It should be understood that the above examples are for illustrative purposes only, and that the possible sources of the data import instruction are not to be construed as limiting the data import instruction.
In this embodiment, the policy for dividing the data to be imported is not limited. For example, a splitting parameter may be preset, and the data to be imported is divided according to the splitting parameter to obtain a plurality of data fragments.
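As one non-limiting example of such a splitting parameter, the sketch below assumes the parameter is a maximum number of rows per data fragment. The function name `split_into_fragments` and the parameter `fragment_size` are hypothetical; the embodiment does not fix the meaning of the splitting parameter.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_fragments(rows: Sequence[T], fragment_size: int) -> List[List[T]]:
    """Split `rows` into consecutive data fragments of at most `fragment_size` rows."""
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    return [
        list(rows[start:start + fragment_size])
        for start in range(0, len(rows), fragment_size)
    ]


# Ten rows with a splitting parameter of 4 yield three data fragments.
fragments = split_into_fragments(list(range(10)), fragment_size=4)
```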
S102: Construct an import relation according to each data fragment and each storage fragment of the columnar database management system.
The import relation indicates the correspondence between the data fragments and the storage fragments.
The columnar database management system includes a plurality of storage fragments, and one storage fragment can be understood as one storage node. Alternatively, the columnar database management system may be understood as a storage cluster including a plurality of machines, where one storage fragment is one machine, although one machine may also hold a plurality of storage fragments.
S103: Import each data fragment into the storage fragment with which it has an import relation.
As analyzed above, this embodiment constructs the correspondence between data fragments and storage fragments. Therefore, for any data fragment, the data importing apparatus can import that data fragment into the storage node with which it has a correspondence, which realizes targeted import.
Based on the above analysis, this embodiment introduces the following feature: an import relation between data fragments and storage fragments is constructed, the import relation indicates the correspondence between the data fragments and the storage fragments, and each data fragment is imported into its corresponding storage fragment based on that relation. This avoids the technical problem in the related art of a low import rate caused by different data to be imported contending for the same storage node when data is imported under a random import policy.
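The flow of S101 to S103 can be sketched end to end as follows, with in-memory lists standing in for storage fragments. This is a minimal sketch under stated assumptions: the function `run_import`, the shard names, and the row-count splitting are hypothetical, and the round-robin rule used for S102 is only one possible way to build the import relation (this embodiment does not prescribe one).

```python
from typing import Dict, List


def run_import(
    rows: List[int], fragment_size: int, shard_addresses: List[str]
) -> Dict[str, List[List[int]]]:
    # S101: divide the data to be imported into data fragments.
    fragments = [
        rows[start:start + fragment_size]
        for start in range(0, len(rows), fragment_size)
    ]

    # S102: construct the import relation (fragment position -> storage fragment).
    # Assumption: a round-robin rule is used here for illustration.
    relation = {
        position: shard_addresses[position % len(shard_addresses)]
        for position in range(len(fragments))
    }

    # S103: import each data fragment into its corresponding storage fragment.
    shards: Dict[str, List[List[int]]] = {address: [] for address in shard_addresses}
    for position, fragment in enumerate(fragments):
        shards[relation[position]].append(fragment)
    return shards


result = run_import(list(range(6)), fragment_size=2, shard_addresses=["shard-a", "shard-b"])
```

Because the target of each data fragment is fixed before any write happens, no two data fragments pick a storage fragment at random at import time.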
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating a data importing method according to another embodiment of the present application.
As shown in fig. 2, the method includes:
s201: and responding to the data import instruction, and dividing the data to be imported into a plurality of data fragments.
For example, the description about S201 may refer to S101, and is not described herein again.
S202: Generate a list including address information of each storage fragment, and broadcast the list to a plurality of import threads that execute the import operation.
For example, the data importing apparatus may determine each storage fragment included in the columnar database management system based on a connection attribute between the storage fragments (e.g., a connection relationship between them), and store the Internet Protocol (IP) address of each storage fragment in a list (host). As analyzed above, a machine may be understood as a node, as a storage fragment, or as a node that includes multiple storage fragments; this embodiment takes the storage fragment as the example.
The data importing apparatus may be a Spark processing device, in which case an import thread may be understood as an executor of the Spark processing device.
The data importing apparatus includes a plurality of import threads, which import the data fragments into their corresponding storage fragments.
S203: Perform, by each import thread, modulo processing on the list and each data fragment, and construct the import relation according to the modulo result.
It should be noted that, in this embodiment, because the import relation is constructed from the modulo result between the list and the data fragments, the data fragments are distributed relatively evenly across the storage fragments and the load of the storage fragments is relatively balanced. This improves data import performance, protects the columnar database management system, and makes better use of its resources.
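The balancing effect can be checked with a short sketch. It assumes, for illustration, 18 data fragments with consecutive index values starting at 1 and 10 storage fragments; the function name `fragments_per_shard` is hypothetical.

```python
from collections import Counter


def fragments_per_shard(num_fragments: int, num_shards: int) -> Counter:
    """Count how many data fragments each modulo result (storage fragment) receives."""
    return Counter(index % num_shards for index in range(1, num_fragments + 1))


counts = fragments_per_shard(num_fragments=18, num_shards=10)
# Difference between the busiest and the least busy storage fragment.
spread = max(counts.values()) - min(counts.values())
```

With consecutive index values, the counts received by any two storage fragments differ by at most one, which is the sense in which the load is relatively balanced.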
In some embodiments, S203 may include the steps of:
The first step: allocate an index value to each data fragment, and allocate the data fragments, each carrying its index value, to the import threads.
In some embodiments, the data import apparatus may allocate an index value to each data fragment based on the total number of data fragments.
For example, the data importing apparatus determines the total number of data fragments and allocates an index value to each data fragment according to that total.
Illustratively, the data importing apparatus determines that the total number of data fragments is 18, that is, there are 18 data fragments in total.
The data importing apparatus allocates an index value to each data fragment according to the total number of data fragments. The index value may represent the ordinal position of the data fragment: the first data fragment has index value 1, and so on; the remaining values are not listed one by one.
The second step: construct, by each import thread, the import relation according to the modulo result between the data fragments allocated to it and the list.
It should be noted that, in this embodiment, index values are allocated, the data fragments carrying the index values are allocated to the import threads, and each import thread constructs the import relation from the modulo result between its allocated data fragments and the list. This improves the efficiency of constructing the import relation, distributes the data fragments evenly across the storage fragments, and thereby improves resource utilization.
In some embodiments, the second step may comprise the sub-steps of:
the first substep: and determining the total quantity of the memory fragments in the list through each import thread.
For example, the total number of the memory fragments is determined by any thread, and the total number of the memory fragments is 10, that is, 10 memory fragments are obtained.
The second substep: and constructing an import relation according to each import thread and a modulo result between the index value of the distributed data fragments and the total number of the storage fragments.
For example, for a data slice with an index value of 18, the modulo processing result of the data import apparatus is 8, and accordingly, the data slice with the index value of 18 is allocated to the 8 th memory slice, that is, there is a correspondence between the data slice with the index value of 18 and the 8 th memory slice.
It should be noted that, in this embodiment, each import thread performs a modulo process to obtain a modulo result, and since the import threads are independent of each other and are executed in parallel, the efficiency of determining the modulo result can be improved, and the technical effect of the efficiency of constructing the import relationship can be improved.
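As a minimal sketch of the modulo-based construction described above (not the actual implementation of the embodiment), the following Python snippet maps data-fragment index values onto a list of storage-fragment addresses. The function name `build_import_relation`, the `shard-N` address strings, and the convention that a modulo result `r` selects position `r` of the list (counting from 0) are assumptions made for illustration.

```python
def build_import_relation(num_data_fragments, storage_fragments):
    """Map each data-fragment index value to a storage-fragment address."""
    total = len(storage_fragments)  # total number of storage fragments in the list
    if total == 0:
        raise ValueError("the list must contain at least one storage fragment")
    relation = {}
    # Index values start at 1, as in the example of the first data fragment.
    for index in range(1, num_data_fragments + 1):
        relation[index] = storage_fragments[index % total]
    return relation


# Hypothetical address list: 10 storage fragments, numbered 0 to 9.
storage_list = [f"shard-{n}" for n in range(10)]
relation = build_import_relation(18, storage_list)
print(relation[18])  # 18 % 10 == 8, so this prints "shard-8"
```

With 18 data fragments and 10 storage fragments, every storage fragment in this sketch receives either one or two data fragments, which illustrates the even distribution mentioned above.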
It should be noted that, based on S202 and S203, in this embodiment, the import relationship is constructed by a modulo processing, and in other embodiments, the import relationship may also be constructed based on a bucket partitioning policy.
In other embodiments, the data importing apparatus may also directly determine a modulus result, construct an import relationship according to the modulus result, and allocate a data slice to each import thread, so that each import thread imports the allocated data slice into a storage slice having a corresponding relationship.
Based on the two different strategies for constructing the import relationship, in some embodiments, the data import apparatus may construct the import relationship based on the modulo processing results of the multiple import threads, or may directly determine the modulo processing results and construct the import relationship based on the modulo processing results.
By comparison, for a scenario in which the import is performed by a single independent thread, it may be preferable for the data importing apparatus to construct the import relationship directly from the modulo processing result; for a scenario in which a plurality of threads perform the import, it may be preferable to construct the import relationship based on the modulo processing result of each thread.
S204: importing each data fragment into the storage fragment that has a correspondence with the data fragment.
For a description of S204, reference may be made to S103; details are not repeated here.
As can be seen from the foregoing embodiments, the data importing apparatus may allocate each data fragment to a corresponding import thread. Therefore, in some embodiments, S204 may include: importing in parallel, through each import thread, the data fragments allocated to that import thread into the storage fragments that have a correspondence with them.
It should be noted that, in this embodiment, by executing the import operation in parallel based on a plurality of import threads, the technical effect of improving the efficiency of importing each data slice can be achieved.
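The parallel import step can be sketched as follows. This is an illustrative example only: it uses Python's standard `concurrent.futures.ThreadPoolExecutor` as a stand-in for the import threads, and an in-memory dictionary as a stand-in for the columnar database. The names `parallel_import`, `import_one`, and the `shard-N` and `rows-N` strings are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def parallel_import(fragments, relation, storage, num_threads=4):
    """Import every data fragment into its mapped storage fragment in parallel."""
    lock = threading.Lock()  # guards the in-memory stand-in for the database

    def import_one(index):
        target = relation[index]  # storage fragment given by the import relationship
        with lock:
            storage.setdefault(target, []).append(fragments[index])
        return index, target

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map iterates over the index values (the dictionary keys).
        return dict(pool.map(import_one, fragments))


fragments = {i: f"rows-{i}" for i in range(1, 19)}    # 18 data fragments
relation = {i: f"shard-{i % 10}" for i in fragments}  # modulo-based relationship
storage = {}
imported = parallel_import(fragments, relation, storage)
```

Because the import relationship is fixed before the threads start, no thread has to compete for a storage fragment at import time in this sketch.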
In some embodiments, the import of one or more data fragments may fail. For the case where the import of any data fragment fails, in one example, the data importing method of this embodiment may further include: re-importing the failed data fragment, according to a preset time interval, into the storage fragment that has a correspondence with it.
That is, when the import of any data fragment fails, the data importing apparatus may import it again after a time interval, still based on the import relationship. This avoids the resource waste in the related art, where a re-run (that is, competing for storage fragments anew) discards all of the earlier work; it improves resource utilization and import efficiency, avoids the data loss that a re-run may cause, and improves the accuracy and reliability of the data import.
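The retry behaviour can be illustrated with the short sketch below. It is an assumption-laden example rather than the embodiment's implementation: the `ImportFailed` exception, the `import_with_retry` helper, and the `max_attempts` limit are all hypothetical (the text above only specifies a preset time interval and that the target storage fragment stays the same).

```python
import time


class ImportFailed(Exception):
    """Hypothetical error raised when a data fragment cannot be imported."""


def import_with_retry(do_import, fragment, target,
                      interval_seconds=5.0, max_attempts=3, sleep=time.sleep):
    """Re-import a failed fragment into the same storage fragment after a wait."""
    for attempt in range(1, max_attempts + 1):
        try:
            do_import(fragment, target)  # the target never changes between attempts
            return attempt
        except ImportFailed:
            if attempt == max_attempts:
                raise
            sleep(interval_seconds)  # the preset time interval


# Usage with a stand-in importer that fails twice before succeeding.
calls = []


def flaky_import(fragment, target):
    calls.append(target)
    if len(calls) < 3:
        raise ImportFailed(target)


waits = []  # records the waits instead of actually sleeping
attempts = import_with_retry(flaky_import, "rows-18", "shard-8",
                             interval_seconds=5.0, sleep=waits.append)
```

Only the failed data fragment is retried here; the fragments that were already imported are left untouched.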
In another example, the data importing method of this embodiment may further include: determining the storage fragment that has a correspondence with the failed data fragment, determining a copy of that storage fragment, and importing the failed data fragment into the copy.
In this embodiment, any data fragment that fails to be imported is imported into the corresponding copy, which can achieve high performance and high fault tolerance and can improve the flexibility of the data import.
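A possible shape for the copy-based fallback is sketched below. The `replicas` mapping, the `ImportFailed` exception, and the helper name `import_with_replica_fallback` are hypothetical, and trying several copies in order is an assumption; the text above only states that the failed data fragment is imported into a copy of its storage fragment.

```python
class ImportFailed(Exception):
    """Hypothetical error raised when a data fragment cannot be imported."""


def import_with_replica_fallback(do_import, fragment, target, replicas):
    """Try the mapped storage fragment first, then its copies, in order."""
    destinations = [target] + list(replicas.get(target, []))
    for destination in destinations:
        try:
            do_import(fragment, destination)
            return destination
        except ImportFailed:
            continue  # fall through to the next copy, if any
    raise ImportFailed(f"no destination accepted the fragment for {target}")


# Usage with a stand-in importer in which the primary storage fragment is down.
replicas = {"shard-8": ["shard-8-replica"]}
unavailable = {"shard-8"}
written = {}


def do_import(fragment, destination):
    if destination in unavailable:
        raise ImportFailed(destination)
    written[destination] = fragment


used = import_with_replica_fallback(do_import, "rows-18", "shard-8", replicas)
print(used)  # prints "shard-8-replica"
```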
Referring to fig. 3, fig. 3 is a schematic flowchart illustrating a data importing method according to another embodiment of the present application.
As shown in fig. 3, the method includes:
S301: the scheduling processor receives data import tasks sent by users and allocates a task queue to each user.
The number of users may be one or more. Similarly, the number of the data import tasks may be one or multiple, and the number of the data to be imported in the data import tasks may be one or multiple.
In some embodiments, the scheduling processor may set a task threshold for the data import tasks in a task queue. For example, if the scheduling processor sets the task threshold to 5, a task queue may include at most 5 data import tasks.
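The per-user task queue with a task threshold might look like the following sketch. The class name `TaskQueues` and the choice to reject a task once the threshold is reached are assumptions; the text does not say what happens to tasks beyond the threshold.

```python
from collections import defaultdict, deque


class TaskQueues:
    """One bounded queue of data import tasks per user (illustrative only)."""

    def __init__(self, task_threshold=5):
        self.task_threshold = task_threshold
        self.queues = defaultdict(deque)

    def submit(self, user, task):
        """Queue the task, or return False if the user's queue is full."""
        queue = self.queues[user]
        if len(queue) >= self.task_threshold:
            return False  # assumed behaviour: reject tasks beyond the threshold
        queue.append(task)
        return True


queues = TaskQueues(task_threshold=5)
results = [queues.submit("user-a", f"task-{n}") for n in range(6)]
other = queues.submit("user-b", "task-0")
```

In this usage, the first five tasks of `user-a` are accepted, the sixth is rejected, and the queue of `user-b` is unaffected.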
S302: the scheduling processor determines a number threshold of data import tasks of each memory slice of the columnar database management system; and/or determining a load of the columnar database management system.
The threshold of the number of data import tasks per memory slice may be understood as a maximum value of the number of data import tasks that can be stored per memory slice, that is, a maximum value of the capacity of data that can be stored per memory slice.
S303: if the scheduling processor determines, according to the number threshold of data import tasks of each storage fragment and/or the load of the columnar database management system, that the columnar database management system meets the import requirement of the data import task, the scheduling processor initiates a data import instruction to the Spark processor.
For example, the scheduling processor may determine the load of the columnar database management system according to the CPU resources, memory resources, and the like of the processors of the columnar database management system. When the load reaches a preset load threshold (that is, it is determined that the columnar database management system does not meet the import requirement of the data import task), the scheduling processor does not execute the scheduling task, that is, it does not initiate a data import instruction to the Spark processor. Otherwise, if the load does not reach the load threshold (that is, it is determined that the columnar database management system meets the import requirement of the data import task), the scheduling processor initiates a data import instruction to the Spark processor. The scheduling processor may also select storage fragments with a relatively low load (a load is considered relatively low when it is smaller than a threshold, which the scheduling processor may set based on requirements, history, experiments, and the like) and carry the selected storage fragments in the import instruction, so that the Spark processor imports the data to be imported into the storage fragments selected by the scheduling processor.
In some embodiments, if the scheduling processor determines that the columnar database management system does not meet the import requirement of the data import task, the scheduling operation may be performed again after a preset time period.
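The scheduling decision in S303 can be sketched as a small admission check. All names below (`meets_import_requirement`, `schedule`, the dictionary-shaped instruction) are hypothetical, loads are modelled as plain numbers between 0 and 1, and the sketch applies both the task-count condition and the load condition, which is only one of the "and/or" variants described above.

```python
def meets_import_requirement(task_count, task_threshold, load, load_threshold):
    """Return True when the database can accept another data import task."""
    return task_count < task_threshold and load < load_threshold


def schedule(task_count, task_threshold, system_load, load_threshold,
             fragment_loads, low_load_threshold):
    """Return an import instruction, or None if scheduling should be retried later."""
    if not meets_import_requirement(task_count, task_threshold,
                                    system_load, load_threshold):
        return None  # the caller may schedule again after a preset time period
    # Carry only the storage fragments whose load is relatively low.
    selected = sorted(name for name, load in fragment_loads.items()
                      if load < low_load_threshold)
    return {"storage_fragments": selected}


loads = {"shard-0": 0.2, "shard-1": 0.9, "shard-2": 0.4}
instruction = schedule(3, 5, 0.5, 0.8, loads, 0.5)
rejected = schedule(3, 5, 0.85, 0.8, loads, 0.5)
```

Here `instruction` lists `shard-0` and `shard-2`, while `rejected` is `None` because the system load has reached the load threshold.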
If the number of the users is multiple, the scheduling processor may execute the corresponding scheduling operation according to the preset priority.
For example, the scheduling processor may prioritize execution of higher priority user-initiated data import tasks where the resources of the columnar database management system are limited.
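Priority-based scheduling across users could be sketched with a heap, as below. The class `PriorityScheduler`, the numeric priorities (larger means more important), and the first-come-first-served tie-break are assumptions; the text only states that higher-priority users' tasks may be executed first when resources are limited.

```python
import heapq
import itertools


class PriorityScheduler:
    """Pick the next data import task by user priority (illustrative only)."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-break: earlier submissions first

    def submit(self, priority, user, task):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._order), user, task))

    def next_task(self):
        if not self._heap:
            return None
        _, _, user, task = heapq.heappop(self._heap)
        return user, task


scheduler = PriorityScheduler()
scheduler.submit(1, "user-a", "import-1")
scheduler.submit(5, "user-b", "import-2")
scheduler.submit(5, "user-c", "import-3")
order = [scheduler.next_task() for _ in range(4)]
```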
It should be noted that, in this embodiment, the scheduling processor dynamically decides whether to execute the data import operation, which prevents the resources of the columnar database management system from being overstretched, improves the load balance of the columnar database management system as far as possible, and makes more reasonable use of resources.
S304: the Spark processor divides the data to be imported into a plurality of data fragments, and constructs import relations according to the data fragments and the storage fragments of the columnar database management system.
The import relation is used for indicating the corresponding relation between the data fragment and the memory fragment.
S305: the Spark processor imports each data fragment into the storage fragment that has a correspondence with the data fragment.
For example, regarding the principle of S304 to S305, the principle described in S101 to S103 may be referred to, and the principle described in S201 to S204 may also be referred to, which is not described herein again.
In some embodiments, after the data to be imported is imported into the columnar database management system, import operation information may be generated, including time information (such as the scheduling time information and waiting-for-scheduling time information generated by the scheduling processor, and the import time information generated by the Spark processor) and data size information. A corresponding load graph may also be generated, so as to improve the performance of subsequent data imports and avoid wasting resources.
In some embodiments, the scheduling processor and the Spark processor may be deployed in the same server or in different servers, which is not limited in this embodiment.
Referring to fig. 4, fig. 4 is a diagram illustrating a data importing device according to an embodiment of the present application.
As shown in fig. 4, the data importing apparatus 400 includes:
a dividing unit 401, configured to, in response to a data import instruction, divide data to be imported into a plurality of data slices.
A constructing unit 402, configured to construct an import relationship according to each data slice and each storage slice of the columnar database management system, where the import relationship is used to indicate a corresponding relationship between the data slice and the storage slice.
The first import unit 403 imports each data segment into a storage segment having a corresponding relationship with the data segment.
Referring to fig. 5, fig. 5 is a diagram illustrating a data importing device according to an embodiment of the present application.
As shown in fig. 5, the data importing apparatus 500 includes:
a dividing unit 501, configured to, in response to a data import instruction, divide data to be imported into a plurality of data slices.
In some embodiments, the plurality of storage fragments are determined based on the number of acquired data import tasks and/or the load of the columnar database management system, where the data import tasks include the data to be imported.
In some embodiments, the number of data import tasks is less than a threshold of data import tasks of the plurality of memory slices; and/or the load of the plurality of memory slices is less than a preset load threshold.
A constructing unit 502, configured to construct an import relationship according to each data fragment and each storage fragment of the columnar database management system, where the import relationship is used to indicate a corresponding relationship between the data fragment and the storage fragment.
As can be seen in conjunction with fig. 5, in some embodiments, the building unit 502 includes:
a generating subunit 5021, configured to generate a list including address information of each memory slice.
A broadcast subunit 5022, configured to broadcast the list in a plurality of import threads for performing import operations.
A constructing subunit 5023, configured to construct the import relationship according to each import thread and a modulo processing result between the list and each data slice.
In some embodiments, building subunit 5023 comprises:
an allocation module, configured to allocate an index value to each data fragment and to allocate the data fragments including the index values to the import threads.
A construction module, configured to construct the import relationship, through each import thread, according to the modulo result between the data fragments allocated to that import thread and the list.
In some embodiments, the construction module is configured to determine, through each import thread, the total number of storage fragments in the list, and to construct the import relationship, through each import thread, according to the modulo result between the index value of the allocated data fragment and the total number of storage fragments.
The first import unit 503 imports each data fragment into a storage fragment having a corresponding relationship with the data fragment.
In some embodiments, the first import unit 503 is configured to import, through each import thread, the data fragment allocated by each import thread into the memory fragment having a corresponding relationship in parallel.
A second importing unit 504, configured to, if any data fragment fails to be imported, re-import the any data fragment into a storage fragment having a corresponding relationship according to a preset time interval.
Referring to fig. 6, fig. 6 is a diagram illustrating a data importing apparatus according to an embodiment of the present application.
As shown in fig. 6, the data importing apparatus 600 includes:
a dividing unit 601, configured to divide the data to be imported into a plurality of data slices in response to the data import instruction.
A constructing unit 602, configured to construct an import relationship according to each data fragment and each storage fragment of the columnar database management system, where the import relationship is used to indicate a corresponding relationship between a data fragment and a storage fragment.
The first import unit 603 imports each data slice into a storage slice having a corresponding relationship with the data slice.
A determining unit 604, configured to determine, if any data fragment fails to be imported, a storage fragment having a correspondence with the any data fragment, and determine a copy of the storage fragment having a correspondence with the any data fragment.
A third importing unit 605, configured to import the arbitrary data slice into the copy.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
There is also provided, in accordance with an embodiment of the present application, a computer program product, including: a computer program, stored in a readable storage medium, from which at least one processor of the electronic device can read the computer program, the at least one processor executing the computer program causing the electronic device to perform the solution provided by any of the embodiments described above.
Fig. 7 is a block diagram of an electronic device for the data importing method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 7, the electronic device includes: one or more processors 701, a memory 702, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 7, one processor 701 is taken as an example.
The memory 702 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the data import method provided by the present application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the data import method provided herein.
The memory 702, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the data import method in the embodiments of the present application. The processor 701 executes various functional applications of the server and data processing, i.e., implements the data import method in the above-described method embodiment, by executing the non-transitory software programs, instructions, and modules stored in the memory 702.
The memory 702 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the electronic device of the data import method, and the like. Further, the memory 702 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 702 may optionally include memory located remotely from the processor 701, and such remote memory may be connected to the electronic device of the data import method through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the data import method may further include: an input device 703 and an output device 704. The processor 701, the memory 702, the input device 703 and the output device 704 may be connected by a bus or other means, and fig. 7 illustrates an example of a connection by a bus.
The input device 703 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus of the data import method, such as an input device of a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or the like. The output devices 704 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to another aspect of the embodiments of the present application, there is also provided a data importing system, including: a columnar database management system and a data importing device as described in any one of the above embodiments.
In some embodiments, the data importing device may be a Spark processor.
In some embodiments, the system may further comprise:
a scheduling processor, configured to: receive data import tasks sent by users and allocate a task queue to each user; determine a number threshold of data import tasks of each storage fragment of the columnar database management system, and/or determine the load of the columnar database management system; and, if it is determined, according to the number threshold of data import tasks of each storage fragment and/or the load of the columnar database management system, that the columnar database management system meets the import requirement of the data import task, initiate a data import instruction to the Spark processor.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (14)

1. A data import method, comprising:
in response to a data import instruction, dividing data to be imported into a plurality of data fragments, and constructing an import relationship according to each data fragment and each storage fragment of a columnar database management system, wherein the import relationship is used to indicate a correspondence between the data fragments and the storage fragments; and
importing each data fragment into the storage fragment that has a correspondence with the data fragment.
2. The method of claim 1, wherein constructing the import relationship according to each data fragment and each storage fragment of the columnar database management system comprises:
generating a list including address information of each storage fragment, and broadcasting the list to a plurality of import threads for executing import operations; and
constructing the import relationship, through each import thread, according to a modulo processing result between the list and each data fragment.
3. The method of claim 2, wherein constructing the import relationship, through each import thread, according to the modulo processing result between the list and each data fragment comprises:
allocating an index value to each data fragment, and allocating the data fragments including the index values to the import threads; and
constructing the import relationship, through each import thread, according to a modulo result between the data fragments allocated to that import thread and the list.
4. The method of claim 3, wherein constructing the import relationship, through each import thread, according to the modulo result between the allocated data fragments and the list comprises:
determining, through each import thread, the total number of storage fragments in the list, and constructing the import relationship, through each import thread, according to a modulo result between the index value of the allocated data fragment and the total number of storage fragments.
5. The method of claim 3, wherein importing each data fragment into the storage fragment that has a correspondence with the data fragment comprises:
importing in parallel, through each import thread, the data fragments allocated to that import thread into the storage fragments that have a correspondence with them.
6. The method according to any one of claims 1 to 5, wherein, if the import of any data fragment fails, the method further comprises:
re-importing, according to a preset time interval, the data fragment into the storage fragment that has a correspondence with it.
7. The method according to any one of claims 1 to 5, wherein, if the import of any data fragment fails, the method further comprises:
determining the storage fragment that has a correspondence with the data fragment, determining a copy of that storage fragment, and importing the data fragment into the copy.
8. The method of any one of claims 1 to 5, wherein each of the storage fragments is determined based on the number of acquired data import tasks and/or the load of the columnar database management system, the data import tasks including the data to be imported.
9. The method of claim 8, wherein the number of data import tasks is less than a threshold number of data import tasks for the plurality of storage fragments; and/or the load of the plurality of storage fragments is less than a preset load threshold.
10. A data import apparatus, comprising:
a dividing unit, configured to, in response to a data import instruction, divide data to be imported into a plurality of data fragments;
a constructing unit, configured to construct an import relationship according to each data fragment and each storage fragment of a columnar database management system, wherein the import relationship is used to indicate a correspondence between the data fragments and the storage fragments; and
an importing unit, configured to import each data fragment into the storage fragment that has a correspondence with the data fragment.
11. An electronic device, comprising: a memory and a processor;
the memory being configured to store instructions executable by the processor;
wherein the processor is configured to perform the method of any one of claims 1 to 9.
12. A computer readable storage medium having stored therein computer executable instructions for implementing the method of any one of claims 1 to 9 when executed by a processor.
13. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 9.
14. A data import system, comprising: a columnar database management system, and the data importing apparatus according to claim 10.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110484417.XA CN113190555A (en) 2021-04-30 2021-04-30 Data import method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110484417.XA CN113190555A (en) 2021-04-30 2021-04-30 Data import method and device

Publications (1)

Publication Number Publication Date
CN113190555A true CN113190555A (en) 2021-07-30

Family

ID=76983700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110484417.XA Pending CN113190555A (en) 2021-04-30 2021-04-30 Data import method and device

Country Status (1)

Country Link
CN (1) CN113190555A (en)

Similar Documents

Publication Publication Date Title
EP3896569A1 (en) Method and apparatus for allocating server resource, electronic device and storage medium
CN112486648A (en) Task scheduling method, device, system, electronic equipment and storage medium
US20160306680A1 (en) Thread creation method, service request processing method, and related device
JP7214786B2 (en) Scheduling method, device, device and medium for deep learning inference engine
JP7170768B2 (en) Development machine operation task processing method, electronic device, computer readable storage medium and computer program
CN111259205B (en) Graph database traversal method, device, equipment and storage medium
CN111506401B (en) Automatic driving simulation task scheduling method and device, electronic equipment and storage medium
CN111694646A (en) Resource scheduling method and device, electronic equipment and computer readable storage medium
CN111913670B (en) Processing method and device for load balancing, electronic equipment and storage medium
US10664278B2 (en) Method and apparatus for hardware acceleration in heterogeneous distributed computing
CN111158909B (en) Cluster resource allocation processing method, device, equipment and storage medium
CN114356547B (en) Low-priority blocking method and device based on processor virtualization environment
CN112905342A (en) Resource scheduling method, device, equipment and computer readable storage medium
CN110688229B (en) Task processing method and device
CN115039091A (en) Multi-key-value command processing method and device, electronic equipment and storage medium
CN112527451B (en) Method, device, equipment and storage medium for managing container resource pool
US9672073B2 (en) Non-periodic check-pointing for fine granular retry of work in a distributed computing environment
CN111176838B (en) Method and device for distributing embedded vector to node in bipartite graph
CN116157778A (en) System and method for hybrid centralized and distributed scheduling on shared physical hosts
CN113190555A (en) Data import method and device
CN111290744A (en) Stream computing job processing method, stream computing system and electronic device
CN109478151B (en) Network accessible data volume modification
US9176910B2 (en) Sending a next request to a resource before a completion interrupt for a previous request
JP2011215812A (en) Virtual computer management method, computer system, and resource management program
CN113760968A (en) Data query method, device, system, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination