CN110633280A

CN110633280A - Batch data acquisition method and device, readable storage medium and computing equipment

Info

Publication number: CN110633280A
Application number: CN201910860041.0A
Authority: CN
Inventors: 张斌; 蔡云山; 陈志辉; 杨秋亮; 龚平
Original assignee: Data Co Ltd Of Beijing Asiainfo
Current assignee: Data Co Ltd Of Beijing Asiainfo
Priority date: 2019-09-11
Filing date: 2019-09-11
Publication date: 2019-12-31

Abstract

The embodiment of the invention provides a batch data acquisition method, a batch data acquisition device, a readable storage medium and computing equipment, which are used for automatically acquiring and processing a plurality of data sources in batches and solving the problem of low efficiency of a manual acquisition mode, and the method comprises the following steps: acquiring information of data tables of a plurality of first databases; determining a data table of a second database for entering data tables of the plurality of first databases; determining an acquisition strategy for acquiring a data table of a first database by an agent cluster; and instructing the agent cluster to write the data tables of the plurality of first databases into the data table of the second database according to the acquisition strategy.

Description

Batch data acquisition method and device, readable storage medium and computing equipment

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a batch data acquisition method and apparatus, a readable storage medium, and a computing device.

Background

As the types of medical data services and the volume of medical data continue to grow, data mining and data analysis of large amounts of medical data have become increasingly significant.

Medical data acquisition is a prerequisite for data mining and analysis. Interfacing with various medical institutions to acquire medical data often requires a large number of acquisition tasks to be established. At present these tasks are mainly configured one by one by hand, which incurs high labor cost and is inefficient.

Disclosure of Invention

To this end, the present disclosure provides a batch data collection method, apparatus, readable storage medium and computing device in an effort to solve or at least mitigate at least one of the problems identified above.

According to an aspect of an embodiment of the present disclosure, there is provided a batch data acquisition method, including:

acquiring information of data tables of a plurality of first databases;

determining a data table of a second database for entering data tables of the plurality of first databases;

determining an acquisition strategy for acquiring a data table of a first database by an agent cluster;

and instructing the agent cluster to write the data tables of the plurality of first databases into the data table of the second database according to the acquisition strategy.

Optionally, before instructing the agent cluster to write the data tables of the plurality of first databases into the data table of the second database according to the acquisition strategy, the method further includes:

determining mapping rules of the data tables of the plurality of first databases and the data table of the second database according to the information of the data tables of the plurality of first databases and the information of the data table of the second database;

and instructing the agent cluster to write the data tables of the plurality of first databases into the data table of the second database according to the acquisition strategy includes:

instructing the agent cluster to collect data tables of a plurality of first databases according to a collection strategy;

and writing the data tables of the plurality of first databases into the data table of the second database according to the data tables of the plurality of first databases and the mapping rule.

Optionally, the method further comprises:

checking the data table of the second database according to the data tables of the plurality of first databases;

when the data table of the second database is determined to have errors, repairing the data table of the second database;

and updating the mapping rule according to the error and the repairing mode existing in the data table of the second database.

Optionally, the collection strategy comprises:

triggering time, execution period and validity period of each acquisition task;

the trigger time is used for indicating the trigger time point of each acquisition task in each execution cycle;

the execution period is used for indicating the period of executing the acquisition task by the agent cluster;

and a validity period indicating a time period for executing the collection task according to the execution cycle.

Optionally, the collecting strategy further comprises:

the redo interval, the maximum run length, the number of failed redos, and the overwrite option of each acquisition task;

the redo interval is used for indicating the time interval after which the agent cluster restarts an acquisition task that it failed to execute;

the maximum run length is used for indicating the longest time any acquisition task may run, and when the execution time of an acquisition task exceeds the maximum run length, that acquisition task is determined to have failed and is terminated;

the number of failed redos is used for indicating how many times an acquisition task is re-executed after it fails;

and the overwrite option is used for indicating whether, after an acquisition task fails and is re-executed, the newly acquired data overwrites the previously acquired data.

Optionally, the collecting strategy further comprises:

the starting type and priority of each acquisition task;

the starting type is used for indicating the starting sequence of a plurality of acquisition tasks with the same priority;

and the priority is used for indicating the agent cluster to execute the acquisition tasks according to the high-low sequence of the priority.

Optionally, determining a data table of a second database for entering a data table of the plurality of first databases comprises:

determining the service classification information of the data tables of the plurality of first databases according to the information of the data tables of the plurality of first databases;

and determining the data table of the second database corresponding to the service classification information according to the service classification information of the data tables of the plurality of first databases.

According to still another aspect of an embodiment of the present disclosure, there is provided a batch data acquisition apparatus including:

a first database acquisition unit configured to acquire information of data tables of a plurality of first databases;

a second database determination unit configured to determine a data table of a second database for entering data tables of the plurality of first databases;

the acquisition strategy making unit is used for determining an acquisition strategy for acquiring the data table of the first database by the agent cluster;

and the acquisition task execution unit is used for indicating the agent cluster to write the data tables of the plurality of first databases into the data table of the second database according to the acquisition strategy.

According to yet another aspect of embodiments of the present disclosure, there is provided a readable storage medium having executable instructions thereon that, when executed, cause a computing device to perform operations included in the batch data collection method described above.

According to yet another aspect of embodiments of the present disclosure, there is provided a computing device including: a processor; and a memory storing executable instructions that, when executed, cause the processor to perform the operations included in the batch data collection method.

In the embodiment of the disclosure, the collection tasks are automatically executed based on the uniformly configured collection strategy, so that the data tables of the databases from different sources are collected, and compared with a mode of manually configuring each collection task, the data collection efficiency is greatly improved.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the disclosure and together with the description serve to explain the principles of the disclosure.

FIG. 1 is a schematic block diagram of an exemplary computing device 100;

FIG. 2 is a schematic flow chart diagram of a batch data collection method according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a batch data acquisition device according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

FIG. 1 is a block diagram of an example computing device 100 arranged to implement a batch data acquisition method according to the present disclosure. In a basic configuration 102, computing device 100 typically includes system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processor 104 and the system memory 106.

Depending on the desired configuration, the processor 104 may be any type of processor. The processor 104 may include one or more levels of cache, such as a level one cache 110 and a level two cache 112, a processor core 114, and registers 116. The example processor core 114 may include an Arithmetic Logic Unit (ALU), a Floating Point Unit (FPU), a digital signal processing core (DSP core), or any combination thereof.

Depending on the desired configuration, system memory 106 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. System memory 106 may include an operating system 120, one or more programs 122, and program data 124. In some implementations, the program 122 can be configured to execute instructions on an operating system by one or more processors 104 using program data 124.

Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via the bus/interface controller 130. The example output device 142 includes a graphics processing unit 148 and an audio processing unit 150. They may be configured to facilitate communication with various external devices, such as a display terminal or speakers, via one or more A/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 158. An example communication device 146 may include a network controller 160, which may be arranged to facilitate communications with one or more other computing devices 162 over a network communication link via one or more communication ports 164.

A network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, or program modules in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or dedicated-line network, and various wireless media such as acoustic, Radio Frequency (RF), microwave, Infrared (IR), or other wireless media. The term computer readable media as used herein may include both storage media and communication media.

Computing device 100 may be implemented as part of a small-form factor portable (or mobile) electronic device such as a cellular telephone, a Personal Digital Assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that includes any of the above functions. Computing device 100 may also be implemented as a personal computer including both desktop and notebook computer configurations.

Among other things, one or more programs 122 of computing device 100 include instructions for performing a batch data collection method according to the present disclosure.

FIG. 2 illustrates a flow chart of a batch data collection method 200 according to the present disclosure, the method 200 beginning at step S210.

S210, acquiring information of data tables of a plurality of first databases;

S220, determining a data table of a second database for entering the data tables of the plurality of first databases;

S230, determining an acquisition strategy for acquiring the data table of the first database by the agent cluster;

and S240, instructing the agent cluster to write the data tables of the plurality of first databases into the data table of the second database according to the acquisition strategy.

In step S210, the first database refers to databases of different information sources, for example, databases of hospitals in different regions. The acquired information of the data table of the first database comprises: information of the database, for example, a database access address, a database type; and data table information, e.g., name, range, field information of the data table.

In step S220, the determined data table of the second database for entering the data tables of the plurality of first databases is a data table of a local database, or a data table of a preset database of a storage cluster, so as to uniformly collect data from different sources to a designated database.

In step S230, the acquisition strategy determined for the agent cluster to acquire the data tables of the plurality of first databases applies to all data table acquisition tasks, so that the acquisition strategy is managed in a unified manner.

In step S240, the agent cluster is instructed to write the data tables of the plurality of first databases into the data table of the second database according to the acquisition strategy. Through a unified acquisition strategy, databases from different sources are collected into a designated database, so that all data acquisition tasks can be completed efficiently and automatically, separate configuration of each acquisition task is avoided, and data acquisition efficiency is improved.
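The flow of steps S210 to S240 can be sketched roughly as follows. This is only an illustrative outline, not the implementation of the disclosure: the class names `SourceTable`, `AcquisitionStrategy`, and `InMemoryAgentCluster`, the function `run_batch_acquisition`, and all addresses and table names are hypothetical, and the agent cluster is replaced by an in-memory stand-in that merely records what it is asked to write.

```python
from dataclasses import dataclass


@dataclass
class SourceTable:
    """Information about one data table of a first (source) database (S210)."""
    db_address: str       # database access address (hypothetical value below)
    db_type: str          # database type, e.g. "MySQL"
    table_name: str
    fields: list[str]


@dataclass
class AcquisitionStrategy:
    """A single strategy shared by every acquisition task (S230)."""
    trigger_time: str = "00:00"
    execution_period: str = "day"


class InMemoryAgentCluster:
    """Hypothetical stand-in for the agent cluster; it only records requests."""

    def __init__(self) -> None:
        self.written = []

    def collect_and_write(self, source, target_table, strategy) -> None:
        self.written.append((f"{source.db_address}/{source.table_name}", target_table))


def run_batch_acquisition(sources, choose_target, strategy, cluster) -> None:
    """Pick a target table per source (S220), then hand each task to the cluster (S240)."""
    for source in sources:
        target_table = choose_target(source)
        cluster.collect_and_write(source, target_table, strategy)


sources = [
    SourceTable("hospital-a:3306", "MySQL", "outpatient", ["id", "name"]),
    SourceTable("hospital-b:1521", "Oracle", "OUTPATIENT_REC", ["ID", "PATIENT_NAME"]),
]
cluster = InMemoryAgentCluster()
run_batch_acquisition(sources, lambda s: "outpatient_unified", AcquisitionStrategy(), cluster)
```

The point of the sketch is that one strategy object is passed to every task, so no task needs its own configuration.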

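Further, before step S240, the method further includes the step of:

determining mapping rules of the data tables of the plurality of first databases and the data table of the second database according to the information of the data tables of the plurality of first databases and the information of the data table of the second database.

In step S240, instructing the agent cluster to write the data tables of the plurality of first databases into the data table of the second database according to the acquisition strategy includes:

instructing the agent cluster to collect the data tables of the plurality of first databases according to the acquisition strategy;

and writing the data tables of the plurality of first databases into the data table of the second database according to the data tables of the plurality of first databases and the mapping rules.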

In the embodiment of the present disclosure, the mapping rule mainly includes two aspects: first, a field mapping rule, which converts fields that express the same meaning in databases from different sources into a unified field; and second, a data mapping rule, which processes the field data and uniformly converts data of different database types (databases such as MySQL, MongoDB, and Oracle) and different parameter definitions into data of the specified parameter type under the specified database type, thereby solving the problem of inconsistent field definitions across different databases.
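A minimal sketch of the two kinds of mapping might look like the following. The tables `FIELD_MAP` and `DATA_MAP`, the function `map_record`, and every database and field name are assumptions made for illustration; the disclosure does not specify how the rules are stored.

```python
from decimal import Decimal

# Hypothetical field mapping: (source database, source field) -> unified field.
FIELD_MAP = {
    ("hospital_a", "pat_name"): "patient_name",
    ("hospital_b", "NAME"): "patient_name",
    ("hospital_a", "fee"): "fee",
    ("hospital_b", "FEE_AMT"): "fee",
}

# Hypothetical data mapping: unified field -> converter to the target type.
DATA_MAP = {
    "patient_name": str,
    "fee": lambda value: Decimal(str(value)),
}


def map_record(source_db: str, record: dict) -> dict:
    """Rename each field to its unified name, then convert its value."""
    unified = {}
    for source_field, value in record.items():
        target_field = FIELD_MAP[(source_db, source_field)]
        unified[target_field] = DATA_MAP[target_field](value)
    return unified


row_a = map_record("hospital_a", {"pat_name": "Zhang", "fee": 12.5})
row_b = map_record("hospital_b", {"NAME": "Li", "FEE_AMT": "30"})
```

Both rows end up with the same field names and the same value types even though the sources name and type the fee differently.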

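Further, an embodiment of the present disclosure provides a step of updating the mapping rule, including:

checking the data table of the second database against the data tables of the plurality of first databases;

when the data table of the second database is determined to contain errors, repairing the data table of the second database;

and updating the mapping rule according to the errors found in the data table of the second database and the manner in which they were repaired.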

Specifically, the mapping rule that is updated is the data mapping rule. Errors in the data table of the second database are determined by verifying the characters or numerical values of a field of the second database against the same field of the first database, where the comparison algorithm is set according to the underlying data representation logic of the databases. This avoids garbled characters, overflow, numerical errors, and character errors caused by differences in how different databases define data types.
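The check-and-repair idea can be illustrated with a small sketch. It assumes rows are held as dictionaries keyed by a record identifier and compares values by their string form, which is a simplification: the disclosure says the comparison algorithm depends on the underlying data representation of each database. The function names `find_mismatches` and `repair` and the sample rows are hypothetical.

```python
def find_mismatches(source_rows: dict, target_rows: dict, field_name: str) -> list:
    """Return the record keys whose field value differs between source and target."""
    return [
        key
        for key, source_row in source_rows.items()
        if str(target_rows.get(key, {}).get(field_name)) != str(source_row[field_name])
    ]


def repair(source_rows: dict, target_rows: dict, field_name: str, keys: list) -> None:
    """Overwrite each mismatched target value with the value from the source."""
    for key in keys:
        target_rows.setdefault(key, {})[field_name] = source_rows[key][field_name]


source = {1: {"name": "患者甲"}, 2: {"name": "Li"}}
target = {1: {"name": "???"}, 2: {"name": "Li"}}   # record 1 was garbled on write

bad_keys = find_mismatches(source, target, "name")
repair(source, target, "name", bad_keys)
remaining = find_mismatches(source, target, "name")
```

In a fuller version, the list of bad keys and the repair applied would feed back into the data mapping rule so the same error does not recur.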

Optionally, the collection strategy comprises:

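the trigger time, execution period, and validity period of each acquisition task;

the trigger time is used for indicating the trigger time point of each acquisition task in each execution period;

the execution period is used for indicating the period at which the agent cluster executes the acquisition task;

and the validity period is used for indicating the time span during which the acquisition task is executed according to the execution period.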

This acquisition strategy enables data from the plurality of databases to be acquired automatically, periodically, and on schedule.
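How a trigger time, an execution period, and a validity period combine into concrete trigger points can be shown with a short sketch. The function `trigger_points`, the assumption that the validity period includes its last day, and the sample dates are all illustrative choices rather than details from the disclosure.

```python
from datetime import date, datetime, time, timedelta


def trigger_points(trigger_time: time, period: timedelta,
                   valid_from: date, valid_to: date, limit: int) -> list[datetime]:
    """Return up to `limit` trigger points that fall inside the validity period."""
    points = []
    current = datetime.combine(valid_from, trigger_time)
    end = datetime.combine(valid_to, time.max)   # assumed: last day is included
    while current <= end and len(points) < limit:
        points.append(current)
        current += period
    return points


# A daily task triggered at 00:00 and valid for three days.
points = trigger_points(time(0, 0), timedelta(days=1),
                        date(2019, 5, 10), date(2019, 5, 12), limit=10)
```

Here the task fires once per day at midnight and stops firing once the validity period has passed.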

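Optionally, the acquisition strategy further comprises:

the redo interval, the maximum run length, the number of failed redos, and the overwrite option of each acquisition task.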

This acquisition strategy provides automatic management of acquisition faults: when a network or server failure occurs, acquisition can be retried or terminated.
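The redo behavior can be sketched as a small retry loop. This is a simplified illustration under stated assumptions: the function `run_with_redo` and the task signature are hypothetical, the overwrite option is interpreted as discarding partially collected rows before a redo, and enforcement of the maximum run length is not modeled because terminating a running task needs process-level control that is outside the scope of a short example.

```python
import time


def run_with_redo(task, redo_interval_s: float, max_redos: int,
                  overwrite: bool, sleep=time.sleep) -> list:
    """Run `task` once, then redo it up to `max_redos` times after failures.

    `task(collected)` appends rows to `collected` and raises on failure.
    """
    collected = []
    for attempt in range(1 + max_redos):
        if attempt > 0:
            sleep(redo_interval_s)      # wait the redo interval before restarting
            if overwrite:
                collected.clear()       # newly acquired data replaces earlier data
        try:
            task(collected)
            return collected
        except Exception:
            continue                    # failed: fall through to the next redo
    raise RuntimeError("acquisition task failed after all redo attempts")


calls = {"count": 0}


def flaky_task(collected: list) -> None:
    """Hypothetical task that fails twice (after writing a row) and then succeeds."""
    calls["count"] += 1
    collected.append(f"row-from-attempt-{calls['count']}")
    if calls["count"] < 3:
        raise ConnectionError("network unavailable")


# The sleep function is replaced so the example does not actually wait 10 minutes.
result = run_with_redo(flaky_task, redo_interval_s=600, max_redos=3,
                       overwrite=True, sleep=lambda seconds: None)
```

With `overwrite=False` the same run would instead keep the rows written by the two failed attempts alongside the rows from the successful one.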

Optionally, the collecting strategy further comprises:

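the starting type and priority of each acquisition task;

the starting type is used for indicating the starting order of a plurality of acquisition tasks with the same priority;

and the priority is used for instructing the agent cluster to execute the acquisition tasks in order of priority from high to low.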

Through this acquisition strategy, a plurality of acquisition tasks are managed by priority: acquisition tasks of high importance are set to high priority and acquisition tasks of low importance are set to low priority, so that the usefulness of the acquired data is maximized under non-ideal conditions.
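One way to picture priority and starting type is as a sort key. The sketch below assumes three priority levels and models only a sequential starting type, in which tasks of equal priority start in their configured order; the names `Task`, `PRIORITY_RANK`, and `execution_order` are hypothetical.

```python
from dataclasses import dataclass

# Assumed priority levels; a lower rank is executed earlier.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Task:
    name: str
    priority: str
    sequence: int   # configured position, used by the sequential starting type


def execution_order(tasks: list) -> list:
    """Order tasks by priority, then by configured sequence within a priority."""
    return sorted(tasks, key=lambda task: (PRIORITY_RANK[task.priority], task.sequence))


tasks = [
    Task("lab_results", "low", 1),
    Task("outpatient", "high", 2),
    Task("inpatient", "high", 1),
]
ordered_names = [task.name for task in execution_order(tasks)]
```

If resources run out partway through, the tasks left unexecuted are the ones that were marked least important.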

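Optionally, step S220 specifically includes:

determining the service classification information of the data tables of the plurality of first databases according to the information of the data tables of the plurality of first databases;

and determining the data table of the second database corresponding to the service classification information according to the service classification information of the data tables of the plurality of first databases.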

Specifically, the service classification information may be information items such as an attribution theme, an acquisition mode, an attribution department, an attribution system, and a source channel of an acquisition task, and is used to perform classification management on the acquired data.

According to the embodiment of the disclosure, the database is managed based on the service classification information, and the data table of the database is divided into different service classifications, so that the user can conveniently search and maintain the database.
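Selecting a target table from service classification information can be sketched as a lookup. The lookup table `TARGET_TABLES`, the function `target_table_for`, the choice of topic and department as the key, and the table names are all assumptions for illustration.

```python
# Hypothetical lookup: (attribution topic, attribution department) -> target table.
TARGET_TABLES = {
    ("outpatient", "test department"): "ods_outpatient_test",
    ("inpatient", "surgery"): "ods_inpatient_surgery",
}


def target_table_for(classification: dict) -> str:
    """Return the second-database table that matches a service classification."""
    key = (classification["topic"], classification["department"])
    return TARGET_TABLES[key]


# Two source tables from different hospitals share one classification.
classification_a = {"topic": "outpatient", "department": "test department", "source": "hospital_a"}
classification_b = {"topic": "outpatient", "department": "test department", "source": "hospital_b"}
table_a = target_table_for(classification_a)
table_b = target_table_for(classification_b)
```

Because both sources carry the same classification, their data lands in the same target table.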

Specific examples of the present invention are given below.

Step one, the user configures batch acquisition tasks;

On the batch acquisition task configuration page, the user enters the acquisition task name: acquisition task; number: 2019051015245410000; source database: digital_china; target database: phedda; attribution topic: outpatient service; attribution department: test department; and acquisition mode: incremental. The user can create a plurality of acquisition task entries on the page and, after selecting a data table of the source database, can fill in the target table name, the Chinese table name, and the acquisition mode next to that data table.

Step two, the scheduling task is configured;

On the scheduling task configuration interface, the user selects the agent cluster that will execute the acquisition tasks and fills in the starting type: sequential start; priority: medium; execution period: day; redo interval: 10 minutes; maximum run length: 3 hours; trigger type: time-triggered; validity period: 5/10/2019 to 5/10/2029; trigger time: 00:00; number of failed redos: 3; and whether to overwrite on execution: yes.

Step three, acquisition task management;

In the acquisition task management interface, the user can view the configured information of each acquisition task and can also add, suspend, release, and cancel tasks.

Referring to fig. 3, a batch data collecting apparatus 300 provided in an embodiment of the present disclosure includes:

a first database acquisition unit 310 configured to acquire information of data tables of a plurality of first databases;

a second database determination unit 320 for determining a data table of a second database for entering data tables of the plurality of first databases;

the acquisition strategy making unit 330 is configured to determine an acquisition strategy for acquiring the data table of the first database by the agent cluster;

and the collection task execution unit 340 is configured to instruct the agent cluster to write the data tables of the plurality of first databases into the data table of the second database according to the collection policy.

Optionally, the batch data collecting apparatus 300 further includes:

the mapping rule management unit is used for determining the mapping rules of the data tables of the plurality of first databases and the data tables of the second database according to the information of the data tables of the plurality of first databases and the information of the data tables of the second database;

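the collection task execution unit 340 is specifically configured to: instruct the agent cluster to collect the data tables of the plurality of first databases according to the acquisition strategy; and write the data tables of the plurality of first databases into the data table of the second database according to the data tables of the plurality of first databases and the mapping rules.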

Optionally, the batch data collecting apparatus 300 further includes:

the verification unit is used for verifying the data table of the second database according to the data tables of the plurality of first databases;

Optionally, the acquisition policy formulated by the acquisition policy formulation unit 330 includes:

triggering time, execution period and validity period of each acquisition task;

and the effective period is used for indicating the time period for executing the acquisition task according to the execution cycle.

Optionally, the acquisition policy formulated by the acquisition policy formulation unit 330 further includes:

the starting type and priority of each acquisition task;

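Optionally, the second database determining unit 320 is specifically configured to: determine the service classification information of the data tables of the plurality of first databases according to the information of the data tables of the plurality of first databases; and determine the data table of the second database corresponding to the service classification information according to the service classification information of the data tables of the plurality of first databases.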

It should be understood that the various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present disclosure, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the disclosure.

In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Wherein the memory is configured to store program code; the processor is configured to perform the various methods of the present disclosure according to instructions in the program code stored in the memory.

By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media store information such as computer readable instructions, data structures, program modules or other data. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. Combinations of any of the above are also included within the scope of computer readable media.

It should be appreciated that in the foregoing description of exemplary embodiments of the disclosure, various features of the disclosure are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed disclosure requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this disclosure.

Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Moreover, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the disclosure and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

Furthermore, some of the described embodiments are described herein as a method or combination of method elements that can be performed by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Further, an element of an apparatus embodiment described herein is an example of an apparatus for implementing the function performed by that element for the purpose of carrying out the invention.

As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.

While the disclosure has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the disclosure as described herein. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the disclosed subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present disclosure is intended to be illustrative, but not limiting, of the scope of the disclosure, which is set forth in the following claims.

Claims

1. A method for batch data acquisition, comprising:

acquiring information of data tables of a plurality of first databases;

determining an acquisition strategy for acquiring the data tables of the plurality of first databases by the agent cluster;

2. The method of claim 1, wherein, before instructing the agent cluster to write the data tables of the plurality of first databases into the data table of the second database according to the acquisition strategy, the method further comprises:

instructing the agent cluster to write the data tables of the plurality of first databases into the data table of the second database according to the acquisition policy, including:

instructing the agent cluster to collect the data tables of the plurality of first databases according to the collection strategy;

3. The method of claim 2, further comprising:

4. The method of claim 1, wherein the acquisition strategy comprises:

triggering time, execution period and validity period of each acquisition task;

the execution period is used for indicating the period of executing the collection task by the agent cluster;

the validity period is used for indicating a time period for executing the acquisition task according to the execution cycle.

5. The method of claim 4, wherein the acquisition strategy further comprises:

the redo interval is a time interval for restarting the collection task when the agent cluster fails to execute any collection task;

the maximum running time length is used for indicating the maximum running time length of any acquisition task, and when the execution time length of any acquisition task is longer than the maximum running time length, the failure of any acquisition task is determined and any acquisition task is terminated;

the selection of whether to execute the data coverage is used for indicating whether the newly acquired data covers the acquired data after any acquisition task fails and the task is executed again.

6. The method of claim 5, wherein the acquisition strategy further comprises:

the starting type and priority of each acquisition task;

7. The method of claim 1, wherein determining a data table for a second database for entering data tables of the plurality of first databases comprises:

8. A batch data acquisition device, comprising:

a second database determination unit configured to determine a data table of a second database used for entering the data tables of the plurality of first databases;

the acquisition strategy making unit is used for determining an acquisition strategy for acquiring the data tables of the plurality of first databases by the agent cluster;

9. A readable storage medium having executable instructions thereon that, when executed, cause a computing device to perform the operations included in any of claims 1-7.

10. A computing device, comprising:

a processor; and

a memory storing executable instructions that, when executed, cause the processor to perform the operations included in any of claims 1-7.