CN113688159B - Data extraction method and device - Google Patents

Data extraction method and device

Info

Publication number
CN113688159B
Authority
CN
China
Prior art keywords
data
data extraction
task
time
starting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111050390.XA
Other languages
Chinese (zh)
Other versions
CN113688159A (en)
Inventor
钞娜娜
李启坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Holding Co Ltd
Original Assignee
Jingdong Technology Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Holding Co Ltd
Priority to CN202111050390.XA
Publication of CN113688159A
Application granted
Publication of CN113688159B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G06F16/24566 Recursive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a data extraction method and device. The data extraction method comprises: setting a data extraction task, wherein the data extraction task is used for extracting target data of a source data system and comprises: the start time and end time at which the source data system generates the target data, the start condition of the data extraction task, a data scanning operation, and a data extraction operation; performing a dynamic slicing operation on the data extraction time period according to data distribution information to obtain a plurality of sub-time periods, wherein the data distribution information comprises the distribution of the data volume of the target data over the data extraction time period, and the data extraction time period is the time period between the start time and the end time; performing the data scanning operation on the target data to determine whether the data extraction task meets the start condition; and, when the data extraction task meets the start condition, executing the data extraction operation according to the plurality of sub-time periods.

Description

Data extraction method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data extraction method and apparatus.
Background
With the development of computer technology, industries are developing faster and becoming more intelligent, and this is particularly true in the technical field of data extraction.
Traditional data extraction operations are usually completed manually, and the process is tedious and error-prone. At present, the common extraction mode in the prior art is offline extraction: extraction starts after the early morning of the next day and extracts the data generated from the early morning of the previous day to the early morning of the current day. However, offline data extraction is slow and can yield inaccurate data. For example, if the data volume of the previous day is extremely large, an extraction started in the early morning takes a very long time; and if data is updated after the extraction starts in the early morning, the previous day's version of that data is lost, so the extracted data is inaccurate. Existing ABS (asset securitization) business has high real-time and accuracy requirements for data, and in serious cases the existing offline extraction scheme can affect that business.
Therefore, a data extraction method and device are needed that can meet the real-time and accuracy requirements of data extraction.
Disclosure of Invention
The application provides a data extraction method and device, which are used to solve the problems of low extraction speed and inaccurate extracted data in the prior art and to improve the timeliness and accuracy of data extraction.
In a first aspect, the present application provides a data extraction method, including the steps of:
setting a data extraction task, wherein the data extraction task is used for extracting target data of a source data system, and the data extraction task comprises: the start time and end time at which the source data system generates the target data, the start condition of the data extraction task, a data scanning operation, and a data extraction operation;
performing a dynamic slicing operation on a data extraction time period according to data distribution information to obtain a plurality of sub-time periods, wherein the data distribution information comprises the distribution of the data volume of the target data over the data extraction time period, and the data extraction time period is the time period between the start time and the end time;
performing the data scanning operation on the target data to determine whether the data extraction task meets the start condition;
and when the data extraction task meets the start condition, executing the data extraction operation according to the plurality of sub-time periods.
In one possible implementation, performing the dynamic slicing operation on the data extraction time period according to the data distribution information to obtain the plurality of sub-time periods includes:
determining a sub-time-period data volume according to the data distribution information, wherein the sub-time-period data volume indicates the data volume of the target data in each sub-time period;
and performing the dynamic slicing operation on the data extraction time period according to the sub-time-period data volume and a preset quantity to obtain the plurality of sub-time periods.
In one possible implementation, the method further includes: when the data extraction task does not meet the start condition and the end time of the data extraction task is reached, starting the data extraction operation of the data extraction task at the next start time of the data extraction task.
In one possible implementation, the start condition includes: the data volume of the data extraction task is not smaller than a preset value.
In one possible implementation, executing the data extraction operation of the data extraction task includes:
acquiring, at a query time point, data of the i-th round of the data extraction operation of the data extraction task, wherein i is a natural number; and comparing the data with the data at the previous query time point of the i-th round to determine whether the data has changed:
if yes, continuing the i-th round of the data extraction operation of the data extraction task;
if not, ending the i-th round of the data extraction operation of the data extraction task, and starting the (i+1)-th round of the data extraction operation of the data extraction task.
In one possible implementation, executing the data extraction operation of the data extraction task further includes:
determining whether all data of the data extraction task has been extracted; if not, continuing to execute the data extraction operation of the data extraction task; if so, ending the data extraction task.
In one possible implementation, the method further includes: acquiring, according to the data update time, the latest data extracted by the data extraction task in the corresponding time period.
In a second aspect, the present application provides a data extraction apparatus, including:
a task establishment module, used for setting a data extraction task, wherein the data extraction task is used for extracting target data of a source data system, and the data extraction task comprises: the start time and end time at which the source data system generates the target data, the start condition of the data extraction task, a data scanning operation, and a data extraction operation;
a slicing module, used for performing a dynamic slicing operation on a data extraction time period according to data distribution information to obtain a plurality of sub-time periods, wherein the data distribution information comprises the distribution of the data volume of the target data over the data extraction time period, and the data extraction time period is the time period between the start time and the end time;
a judging module, used for performing the data scanning operation on the target data to determine whether the data extraction task meets the start condition;
and a data extraction module, used for executing the data extraction operation according to the plurality of sub-time periods when the data extraction task meets the start condition.
In a third aspect, embodiments of the present disclosure provide an electronic device. The electronic device comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus; the memory is used for storing a computer program; and the processor is used for implementing the data extraction method described above when executing the program stored in the memory.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the data extraction method as described above.
Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:
the method provided by the embodiment of the application uses the divide-and-recurse idea: offline data is dynamically divided into a plurality of extraction rounds that are executed recursively in sequence, and extraction can start on the same day the data is generated. This solves the problem that an extraction started in the early morning takes a long time when the previous day's data volume is extremely large, and thereby improves data extraction efficiency.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 schematically illustrates a system architecture of a data extraction method and apparatus according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a data extraction method according to an embodiment of the disclosure;
FIG. 3 schematically shows a block diagram of a data extraction apparatus according to an embodiment of the present disclosure; and
FIG. 4 schematically illustrates a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present application based on the embodiments herein.
Fig. 1 schematically illustrates a system architecture suitable for use in the data extraction methods and apparatus of embodiments of the present disclosure.
Referring to fig. 1, a system architecture 100 suitable for a data extraction method and apparatus according to an embodiment of the present disclosure includes: terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. The terminal devices 101, 102, 103 may have image capturing means, picture/video playback applications, etc. installed thereon. Other communication client applications may also be installed, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, and the like (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices that have display screens and support picture/video playback, and may further include image capturing means; such electronic devices include, but are not limited to, smartphones, tablet computers, notebook computers, desktop computers, drones, etc.
The server 105 may be a server providing various services, such as a background management server (merely an example) providing service support for data processing of images or videos captured by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the image/video processing request, and feed back the processing result (e.g., a web page, information, or data acquired or generated according to the user request) to the terminal device.
It should be noted that, the data extraction method provided by the embodiments of the present disclosure may be generally performed by the server 105 or a terminal device having a certain computing capability. Accordingly, the data extraction device provided in the embodiments of the present disclosure may be generally disposed in the server 105 or the terminal device having a certain computing capability. The method of data extraction provided by the embodiments of the present disclosure may also be performed by a server or a cluster of servers other than the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the apparatus for data extraction provided by the embodiments of the present disclosure may also be provided in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
A first exemplary embodiment of the present disclosure provides a method of data extraction. As shown in FIG. 2, an embodiment of the present application provides a data extraction method, including steps S100 to S400.
The start time, end time, start condition and the like of the data extraction task are set according to the characteristics of the data. After the data extraction task meets the start condition, a first round of extraction starts; the first round ends once the data of the data extraction task no longer changes, and a second round of extraction starts. The rounds recurse in sequence until all data of the data extraction task has been extracted. The extracted data of the data extraction task can be used once the extraction completes on the next day. The characteristic of the data may be the data volume.
Step S100, setting a data extraction task, wherein the data extraction task is used for extracting target data of a source data system, and the data extraction task comprises: the start time and end time at which the source data system generates the target data, the start condition of the data extraction task, a data scanning operation, and a data extraction operation;
Step S200, performing a dynamic slicing operation on a data extraction time period according to data distribution information to obtain a plurality of sub-time periods, wherein the data distribution information comprises the distribution of the data volume of the target data over the data extraction time period, and the data extraction time period is the time period between the start time and the end time;
Step S300, performing the data scanning operation on the target data to determine whether the data extraction task meets the start condition;
Step S400, when the data extraction task meets the start condition, executing the data extraction operation according to the plurality of sub-time periods.
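The four steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation described by the patent: `ExtractionTask`, `run_task`, and the `scan`, `extract` and `slicer` callables are hypothetical names introduced here, and the scan is assumed to return a row count.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Tuple

Period = Tuple[datetime, datetime]


@dataclass
class ExtractionTask:
    """Step S100: what a data extraction task carries (field names are assumptions)."""
    start_time: datetime              # when the source system starts generating target data
    end_time: datetime                # when it stops generating target data
    min_rows: int                     # start condition: preset data volume
    scan: Callable[[], int]           # data scanning operation, returns current row count
    extract: Callable[[Period], int]  # data extraction operation for one sub-time period


def run_task(task: ExtractionTask,
             slicer: Callable[[datetime, datetime], List[Period]]) -> int:
    """Steps S200 to S400; returns the number of rows extracted."""
    sub_periods = slicer(task.start_time, task.end_time)   # S200: dynamic slicing
    if task.scan() < task.min_rows:                        # S300: check start condition
        return 0                                           # not met, nothing extracted yet
    return sum(task.extract(p) for p in sub_periods)       # S400: extract per sub-period
```

Passing the slicer in as a parameter keeps the sketch independent of how the sub-time periods are actually derived from the data distribution.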
It should be noted that the data extraction task may be set by a user or by the data extraction system. The source data system is the system that generates the target data, and the target data is the data that the data extraction system needs to extract.
In general, offline extraction extracts the data from the early morning of one day to the early morning of the next. In the embodiment of the invention, a start-extraction rule is set: a timed task scans all data extraction tasks that have not started executing, and the data extraction operation is performed according to the start-extraction rule of each task, that is, according to whether its start condition is met.
The start condition includes: the data volume of the data extraction task is not smaller than a preset value. The start condition may be set according to the specific data volume; for example, data extraction starts when the data volume is greater than or equal to a certain preset value. The preset value may be 2 million (200w) and can be set according to actual service conditions. For example, if one table has 10 million (1000w) changed records on a given day and another table has 1 million (100w), different preset values are set for them, and each preset value can be determined by the metrics of the respective service line.
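As a minimal sketch of this start condition, assuming the data volume is measured as a row count: the table names and the second threshold below are hypothetical, and only the 2 million figure comes from the text.

```python
# Default preset value from the text: 200w, read as 2,000,000 rows.
DEFAULT_THRESHOLD = 2_000_000

# Hypothetical per-table presets, one per service line.
THRESHOLDS = {
    "busy_table": 2_000_000,
    "quiet_table": 200_000,
}


def meets_start_condition(table: str, row_count: int) -> bool:
    """True when the data volume is not smaller than the table's preset value."""
    return row_count >= THRESHOLDS.get(table, DEFAULT_THRESHOLD)
```

Keeping the presets in a lookup table lets each service line tune its own threshold without touching the check itself.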
In one embodiment of the present invention, performing the dynamic slicing operation on the data extraction time period according to the data distribution information to obtain the plurality of sub-time periods includes: determining a sub-time-period data volume according to the data distribution information, wherein the sub-time-period data volume indicates the data volume of the target data in each sub-time period; and performing the dynamic slicing operation on the data extraction time period according to the sub-time-period data volume and a preset quantity to obtain the plurality of sub-time periods.
The data distribution information of the target data is acquired, and the sub-time-period data volume is determined from it; the sub-time-period data volume is the approximate volume of data extracted each time. The dynamic slicing operation is then performed on the data extraction time period according to the sub-time-period data volume and the preset quantity, so that the data volume of each sub-time period is approximately the same. The data volumes of the sub-time periods may differ by up to the preset quantity, which is used to adjust the data volume across the plurality of sub-time periods.
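One way to read this volume-based slicing is a greedy pass over a histogram of the target data. The sketch below assumes the data distribution information is available as time-ordered buckets with row counts; the function name, the bucket layout, and the way `tolerance` stands in for the "preset quantity" are all assumptions, not details given in the text.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Bucket = Tuple[datetime, datetime, int]   # (start, end, row count), in time order
Period = Tuple[datetime, datetime]


def slice_by_volume(buckets: List[Bucket], target: int, tolerance: int = 0) -> List[Period]:
    """Greedily merge adjacent buckets into sub-periods of roughly `target` rows.

    A sub-period is closed as soon as its accumulated volume reaches
    target - tolerance, so dense hours yield short sub-periods and
    sparse hours yield long ones.
    """
    periods: List[Period] = []
    slice_start: Optional[datetime] = None
    accumulated = 0
    for start, end, count in buckets:
        if slice_start is None:
            slice_start = start
        accumulated += count
        if accumulated >= target - tolerance:
            periods.append((slice_start, end))
            slice_start, accumulated = None, 0
    if slice_start is not None:           # leftover buckets form the last sub-period
        periods.append((slice_start, buckets[-1][1]))
    return periods
```

With hourly counts of 100, 900, 500 and 500 rows and a target of 1000, this produces two two-hour sub-periods of 1000 rows each.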
Optionally, performing the dynamic slicing operation on the data extraction time period according to the data distribution information to obtain the plurality of sub-time periods includes: determining the number of slices according to the data distribution information; and performing the dynamic slicing operation on the data extraction time period according to the number of slices to obtain the plurality of sub-time periods.
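This count-based variant can be sketched as follows, assuming the distribution is a list of per-bucket row counts and that the number of slices is derived from a maximum volume per slice. Both `slice_count` and `cut_points`, and the `max_rows_per_slice` parameter, are hypothetical; the text does not say how the number of slices is computed.

```python
import math
from typing import List, Tuple


def slice_count(counts: List[int], max_rows_per_slice: int) -> int:
    """Derive the number of slices from the total volume (assumed rule)."""
    return max(1, math.ceil(sum(counts) / max_rows_per_slice))


def cut_points(counts: List[int], n: int) -> List[Tuple[int, int]]:
    """Split bucket indices [0, len(counts)) into at most n contiguous ranges
    of similar volume by cutting at cumulative-volume quantiles."""
    total = sum(counts)
    ranges: List[Tuple[int, int]] = []
    start, cumulative = 0, 0
    for i, count in enumerate(counts):
        cumulative += count
        # Close the current range once the running total reaches the next quantile.
        if len(ranges) < n - 1 and cumulative * n >= total * (len(ranges) + 1):
            ranges.append((start, i + 1))
            start = i + 1
    if start < len(counts):
        ranges.append((start, len(counts)))
    return ranges
```

Each returned pair is a half-open range of bucket indices, which maps back to a sub-time period through the bucket boundaries.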
According to the embodiment of the disclosure, the dynamic slicing operation is performed on the data extraction time period to obtain a plurality of sub-time periods, and the data is extracted in multiple passes, one for each sub-time period.
In one embodiment of the present invention, executing the data extraction operation according to the plurality of sub-time periods when the data extraction task meets the start condition includes executing the data extraction operation in the plurality of sub-time periods in sequence.
In one embodiment of the present invention, when the data extraction task does not meet the start condition and the end time of the data extraction task is reached, the data extraction operation of the data extraction task is started at the next start time of the data extraction task. If the data volume of the data extraction task is smaller than the preset value, the data extraction operation can be started automatically after the early morning of that day.
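The fallback described here amounts to a three-way decision on each scan. The sketch below is an illustration only; the `Action` enum and `decide` function are hypothetical names, and the data volume is assumed to be a row count.

```python
from datetime import datetime
from enum import Enum


class Action(Enum):
    EXTRACT_NOW = "extract_now"                      # start condition met
    KEEP_SCANNING = "keep_scanning"                  # not met, task window still open
    EXTRACT_AT_NEXT_START = "extract_at_next_start"  # not met and end time reached


def decide(now: datetime, end_time: datetime, row_count: int, threshold: int) -> Action:
    """Decide what the timed scan should do with a task that has not started."""
    if row_count >= threshold:
        return Action.EXTRACT_NOW
    if now >= end_time:
        return Action.EXTRACT_AT_NEXT_START
    return Action.KEEP_SCANNING
```

The last branch is what keeps small tables from being skipped: a task that never reaches its preset value is still extracted, just at its next start time.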
Before a data extraction task is established, the data source information of the corresponding MySQL database is first connected. When a data extraction task is newly created, the database and table information of the MySQL data source corresponding to the current task is set. When the offline extraction actually runs, the task connects directly to MySQL, queries the corresponding data with a SELECT statement, first stores the data in a local file, and finally loads the file into a Hive table. Through the configured database and table information, the MySQL interface in use maps to different data sources, so that each extraction task in the embodiment of the invention corresponds to one table, and a plurality of extraction tasks correspond to a plurality of tables. Here, SELECT is a query statement, MySQL is a database, and Hive is a data warehouse tool.
In one embodiment of the present invention, executing the data extraction operation of the data extraction task includes:
acquiring, at a query time point, data of the i-th round of the data extraction operation of the data extraction task, wherein i is a natural number; and comparing the data with the data at the previous query time point of the i-th round to determine whether the data has changed:
if yes, continuing the i-th round of the data extraction operation of the data extraction task;
if not, ending the i-th round of the data extraction operation of the data extraction task, and starting the (i+1)-th round of the data extraction operation of the data extraction task.
Specifically, when the data extraction task meets the start condition, that is, the data volume of the data extraction task is not less than 2 million (200w), the first round of data extraction can start, and it does not end until the data no longer changes. After the first round ends, the system automatically starts the second round, whose start time is the end time of the first round; the second round likewise ends once the data no longer changes. The third round, the fourth round and so on follow in the same way until all data after the early morning of that day has been extracted.
According to the embodiment of the invention, whether the data of the data extraction task has changed can be determined by comparing the data at two query time points; if the data has not changed, the current round of data extraction ends and the next round starts. When the current round obtains the data of one query time point, the next round may obtain the data of another query time point. For example, when the first round obtains the data at query time point 1:10, the data at 1:10 is compared with the data at the previous query time point 1:00 to determine whether it has changed; if the data of the data extraction task has changed, the data extraction operation for query time point 1:10 is performed. The second round may obtain the data at query time point 1:20, which is compared with the data at the previous query time point 1:10; if the data of the data extraction task has changed, the data extraction operation for query time point 1:20 is performed. Similarly, the third round, the fourth round and so on perform the query and extraction operations in sequence.
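The change-detection loop of a single round can be sketched as below. It follows the reading that a round keeps extracting while consecutive query points differ and ends at the first unchanged point. Query points are modelled as integer ticks (for instance one tick per ten minutes), and `fetch`, `run_round` and `max_points` are hypothetical names; how the snapshot is computed (row count, checksum, and so on) is not specified in the text.

```python
from typing import Callable, Hashable, List


def run_round(fetch: Callable[[int], Hashable],
              first_point: int,
              max_points: int = 1000) -> List[int]:
    """Run one extraction round over successive query points.

    fetch(point) returns a comparable snapshot of the task's data at that
    query point. The round extracts at every point whose snapshot differs
    from the previous one and ends at the first point with no change.
    Returns the query points at which an extraction was performed.
    """
    extracted_points: List[int] = []
    previous = fetch(first_point)
    for point in range(first_point + 1, first_point + 1 + max_points):
        current = fetch(point)
        if current == previous:          # no change: end this round
            break
        extracted_points.append(point)   # change detected: extract for this point
        previous = current
    return extracted_points
```

The next round would then be started from the point at which this one stopped, matching the rule that each round starts at the end time of the previous one.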
In one embodiment of the present invention, executing the data extraction operation of the data extraction task further includes:
determining whether all data of the data extraction task has been extracted; if not, continuing to execute the data extraction operation of the data extraction task; if so, ending the data extraction task. When one of a plurality of data extraction tasks is found not to be fully extracted, the extraction operation of that task continues; after all data extraction tasks have been extracted, the extraction operation ends.
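The completion check across several tasks reduces to a loop that keeps going until no task has data left. In this sketch each task is represented only by its remaining row count and a fixed batch size per pass; both are simplifying assumptions, and `extract_all` is a hypothetical name.

```python
from typing import Dict


def extract_all(remaining: Dict[str, int], batch: int) -> int:
    """Keep extracting until every task is fully extracted.

    `remaining` maps a task name to the rows it still has to extract and is
    updated in place. Returns the number of passes that were needed.
    """
    if batch <= 0:
        raise ValueError("batch must be positive")
    passes = 0
    while any(rows > 0 for rows in remaining.values()):
        for name, rows in remaining.items():
            if rows > 0:                         # task not finished: continue it
                remaining[name] = max(0, rows - batch)
        passes += 1
    return passes
```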
In one embodiment of the present invention, the method further includes: acquiring, according to the data update time, the latest data extracted by the data extraction task in the corresponding time period. Since one piece of data may be updated multiple times in a day, the day may be divided into multiple time slices; for example, 2021.04.20 is divided into multiple time slices, and the partition dt=20210420 may then contain multiple versions of the same data. In that case, a window over the data update time is needed to obtain the latest data.
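The windowing step keeps, for each record, only the version with the most recent update time. A plain-Python sketch of that idea follows; the column names `id` and `update_time` are assumptions, since the text does not name the key or timestamp columns.

```python
from typing import Dict, Hashable, Iterable, List


def latest_per_key(rows: Iterable[dict],
                   key: str = "id",
                   ts: str = "update_time") -> List[dict]:
    """Keep only the most recently updated version of each record."""
    latest: Dict[Hashable, dict] = {}
    for row in rows:
        kept = latest.get(row[key])
        if kept is None or row[ts] > kept[ts]:
            latest[row[key]] = row
    return list(latest.values())


# Rows as they might sit in partition dt=20210420 after several extraction rounds.
partition = [
    {"id": 1, "status": "A", "update_time": "2021-04-20 01:10:00"},
    {"id": 2, "status": "X", "update_time": "2021-04-20 03:00:00"},
    {"id": 1, "status": "B", "update_time": "2021-04-20 09:30:00"},
]
deduplicated = latest_per_key(partition)
```

In a warehouse this is typically expressed as a row-number window partitioned by the key and ordered by update time descending, keeping the first row of each group.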
The embodiment of the invention uses the divide-and-recurse idea: offline data is dynamically divided into a plurality of rounds that are extracted recursively in sequence, and extraction can start on the same day. This solves the problem that an extraction started in the early morning takes a long time when the previous day's data volume is particularly large, and thereby improves data extraction efficiency. Because the same day's data is dynamically divided into a plurality of extraction rounds, the data changed earlier in the day is already obtained and the data near the early morning can be extracted quickly. This solves the problem that update operations in the early morning cause the previous day's data to be lost and the extracted data to be inaccurate, further solves the problem of missed extraction caused by data modification in the early morning, and greatly improves the accuracy of offline data extraction.
In the embodiment of the invention, a data extraction task is newly created; a dynamic slicing operation is performed on the data extraction time period according to the data distribution information to obtain a plurality of sub-time periods; the data scanning operation is performed on the target data to determine whether the data extraction task meets the start condition; and when the data extraction task meets the start condition, the data extraction operation is executed according to the plurality of sub-time periods. The data extraction method provided by the embodiment of the invention improves not only the efficiency and timeliness of extraction but also the accuracy of the extracted data.
A second exemplary embodiment of the present disclosure provides an apparatus for data extraction. As shown in fig. 3, an embodiment of the present application provides a data extraction device, including:
a task establishment module, configured to set a data extraction task, wherein the data extraction task is used for extracting target data of a source data system, and the data extraction task includes: a starting time and an ending time at which the source data system generates the target data, a starting condition of the data extraction task, a data scanning operation, and a data extraction operation;
a slicing module, configured to perform a dynamic slicing operation on a data extraction time period according to data distribution information to obtain a plurality of sub-time periods, wherein the data distribution information comprises the distribution of the data volume of the target data over the data extraction time period, and the data extraction time period is the time period between the starting time and the ending time;
a judging module, configured to perform the data scanning operation on the target data to judge whether the data extraction task meets the starting condition;
and a data extraction module, configured to execute the data extraction operation according to the plurality of sub-time periods when the data extraction task meets the starting condition. The data extraction device provided by the embodiment of the invention improves both the extraction efficiency and the accuracy of the extracted data.
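The cooperation of the modules listed above might be sketched as follows. This is a simplified illustration under several assumptions: times are plain integers (hours), the starting condition is a minimum row count (as in claim 4), and the same clock is used for the data generation period and for stopping the task. The names `ExtractionTask`, `run_task`, `min_rows`, `scan`, `extract`, and `now` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ExtractionTask:
    """What the task establishment module sets up (hypothetical fields)."""
    start_time: int                      # start of the data generation period
    end_time: int                        # end of the period; extraction stops here
    min_rows: int                        # starting condition: minimum data volume
    scan: Callable[[int, int], int]      # data scanning operation: row count in [start, end)
    extract: Callable[[int, int], list]  # data extraction operation for [start, end)


def run_task(task: ExtractionTask,
             sub_periods: List[Tuple[int, int]],
             now: Callable[[], int]) -> list:
    """Judge the starting condition, then extract round by round."""
    # Judging module: scan the target data and check the starting condition.
    if task.scan(task.start_time, task.end_time) < task.min_rows:
        return []
    extracted = []
    # Data extraction module: one round per sub-time period, in order.
    for start, end in sub_periods:
        if now() >= task.end_time:
            break  # the ending time has been reached, stop extracting
        extracted.extend(task.extract(start, end))
    return extracted


# Hypothetical in-memory source system: (hour, value) pairs.
source = [(h, f"row-{h}") for h in (1, 2, 2, 6, 7)]


def scan(start, end):
    return sum(1 for h, _ in source if start <= h < end)


def extract(start, end):
    return [v for h, v in source if start <= h < end]


task = ExtractionTask(start_time=0, end_time=8, min_rows=3, scan=scan, extract=extract)
rows = run_task(task, [(0, 4), (4, 8)], now=lambda: 5)
```

Passing the clock in as a callable keeps the sketch testable; a real scheduler would presumably read the system time instead.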
In the second embodiment described above, any of the task establishment module and the data extraction module may be combined and implemented in one module, or any one of the modules may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module. At least one of the task establishment module and the data extraction module may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC); or may be implemented in hardware or firmware by any other reasonable way of integrating or packaging circuits; or may be implemented in any one of the three implementation forms of software, hardware, and firmware, or in a suitable combination of them. Alternatively, at least one of the task establishment module and the data extraction module may be at least partially implemented as a computer program module which, when executed, may perform the corresponding functions.
A fourth exemplary embodiment of the present disclosure provides an electronic device. As shown in fig. 4, an electronic device 400 provided in an embodiment of the present disclosure includes a processor 401, a communication interface 402, a memory 403, and a communication bus 404, where the processor 401, the communication interface 402, and the memory 403 communicate with each other through the communication bus 404; the memory 403 is configured to store a computer program; and the processor 401 is configured to implement the data extraction method described above when executing the program stored in the memory.
The fifth exemplary embodiment of the present disclosure also provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the data extraction method as described above.
The computer-readable storage medium may be embodied in the apparatus/means described in the above embodiments; or may exist alone without being assembled into the apparatus/device. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
It should be noted that in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A data extraction method, comprising the steps of:
setting a data extraction task, wherein the data extraction task is used for extracting target data of a source data system, and the data extraction task includes: a starting time and an ending time at which the source data system generates the target data, a starting condition of the data extraction task, a data scanning operation, and a data extraction operation;
performing dynamic slicing operation on a data extraction time period according to data distribution information to obtain a plurality of sub-time periods, wherein the data distribution information comprises the distribution condition of the data quantity of the target data on the data extraction time period, and the data extraction time period is a time period between the starting time and the ending time;
performing the data scanning operation on the target data to judge whether the data extraction task meets the starting condition;
and when the data extraction task meets the starting condition, starting a first round of data extraction, sequentially executing a plurality of rounds of data extraction operations according to the plurality of sub-time periods, stopping the data extraction operation when the ending time of the data extraction task is reached, and starting the data extraction operation of the data extraction task at the starting time of the next round of the data extraction task.
2. The method for extracting data according to claim 1, wherein the dynamically slicing operation is performed on the data extraction period according to the data distribution information to obtain a plurality of sub-periods, including:
determining the data quantity of the sub-time period according to the data distribution information, wherein the data quantity of the sub-time period is used for indicating the data quantity of the target data in each sub-time period;
and carrying out dynamic slicing operation on the data extraction time period according to the sub-time period data quantity and the preset quantity to obtain a plurality of sub-time periods.
3. The data extraction method according to claim 1, further comprising:
and when the data extraction task does not meet the starting condition and reaches the ending time of the data extraction task, starting to execute the data extraction operation of the data extraction task at the next starting time of the data extraction task.
4. The data extraction method according to claim 1, wherein the starting condition includes: the data volume of the data extraction task is not smaller than a preset value.
5. The data extraction method according to claim 1, wherein the data extraction operation for performing the data extraction task includes:
acquiring data of an ith round of data extraction operation of the data extraction task at a query time point, wherein i is a natural number; comparing the data with the data of the last query time point of the ith round of data extraction operation to judge whether the data change or not:
if yes, continuing the ith round of data extraction operation of the data extraction task;
if not, ending the ith round of data extraction operation of the data extraction task, and starting to execute the (i+1) th round of data extraction operation of the data extraction task.
6. The data extraction method according to claim 1, wherein the data extraction operation of performing the data extraction task further comprises:
and judging whether the data of the data extraction task is extracted, if not, returning to continue to execute the data extraction operation of the data extraction task, and if so, ending the data extraction task.
7. The data extraction method according to claim 1, further comprising: acquiring, according to the data update time, the latest data extracted by the data extraction task in the corresponding time period.
8. A data extraction apparatus, comprising:
a task establishment module, configured to set a data extraction task, wherein the data extraction task is used for extracting target data of a source data system, and the data extraction task includes: a starting time and an ending time at which the source data system generates the target data, a starting condition of the data extraction task, a data scanning operation, and a data extraction operation;
a slicing module, configured to perform a dynamic slicing operation on a data extraction time period according to data distribution information to obtain a plurality of sub-time periods, wherein the data distribution information comprises the distribution of the data volume of the target data over the data extraction time period, and the data extraction time period is the time period between the starting time and the ending time;
a judging module, configured to perform the data scanning operation on the target data to judge whether the data extraction task meets the starting condition;
and a data extraction module, configured to start a first round of data extraction when the data extraction task meets the starting condition, sequentially execute a plurality of rounds of data extraction operations according to the plurality of sub-time periods, stop the data extraction operation when the ending time of the data extraction task is reached, and start the data extraction operation of the data extraction task at the starting time of the next round of the data extraction task.
9. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the data extraction method according to any one of claims 1-7 when executing the program stored in the memory.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the data extraction method of any one of claims 1-7.
CN202111050390.XA 2021-09-08 2021-09-08 Data extraction method and device Active CN113688159B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111050390.XA CN113688159B (en) 2021-09-08 2021-09-08 Data extraction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111050390.XA CN113688159B (en) 2021-09-08 2021-09-08 Data extraction method and device

Publications (2)

Publication Number Publication Date
CN113688159A CN113688159A (en) 2021-11-23
CN113688159B true CN113688159B (en) 2024-04-05

Family

ID=78585973

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111050390.XA Active CN113688159B (en) 2021-09-08 2021-09-08 Data extraction method and device

Country Status (1)

Country Link
CN (1) CN113688159B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105630934A (en) * 2015-12-23 2016-06-01 浪潮电子信息产业股份有限公司 Data statistic method and system
CN106126612A (en) * 2016-06-22 2016-11-16 重庆秒银科技有限公司 A kind of big ETL process dynamically divides the data pick-up method of timeslice
CN107436883A (en) * 2016-05-26 2017-12-05 北京京东尚科信息技术有限公司 The method, apparatus and system of data pick-up based on complementation
CN108628889A (en) * 2017-03-21 2018-10-09 北京京东尚科信息技术有限公司 Sampling of data mthods, systems and devices based on timeslice
CN109271435A (en) * 2018-09-14 2019-01-25 南威软件股份有限公司 A kind of data pick-up method and system for supporting breakpoint transmission
CN110928941A (en) * 2019-11-28 2020-03-27 杭州数梦工场科技有限公司 Data fragment extraction method and device
CN113360558A (en) * 2021-06-04 2021-09-07 北京京东振世信息技术有限公司 Data processing method, data processing device, electronic device, and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11436231B2 (en) * 2020-01-13 2022-09-06 EMC IP Holding Company LLC Continuous query scheduling and splitting in a cluster-based data storage system

Also Published As

Publication number Publication date
CN113688159A (en) 2021-11-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant