CN111858387B - Data preparation method and device - Google Patents

Info

Publication number
CN111858387B
Authority
CN
China
Prior art keywords
instruction
execution
scheme
data
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010778835.5A
Other languages
Chinese (zh)
Other versions
CN111858387A (en)
Inventor
崔东晓
张林林
高宏波
陈耀文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202010778835.5A priority Critical patent/CN111858387B/en
Publication of CN111858387A publication Critical patent/CN111858387A/en
Application granted granted Critical
Publication of CN111858387B publication Critical patent/CN111858387B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3696 Methods or tools to render software testable
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/80 Database-specific techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of big data and provides a data preparation method and device. The data preparation method comprises the following steps: acquiring a data preparation task; retrieving a previous execution record of the data preparation task, wherein the execution record includes a scheme of the data preparation task; acquiring an execution scheme of the data preparation task, wherein, when an execution record is retrieved for the data preparation task, the scheme in the execution record is called as the execution scheme, and when no execution record is retrieved, a scheme edited by a user is acquired as the execution scheme; and selecting a corresponding execution mode according to the execution scheme of the data preparation task to complete the data preparation. The data preparation method and device provided by the embodiments of the invention can reduce repetitive work in the data preparation process and improve data preparation efficiency.

Description

Data preparation method and device
Technical Field
The present invention relates to the field of big data technologies, and in particular, to a data preparation method and apparatus.
Background
Testing must be performed in a test environment, and before each test, data in the production environment must be synchronized into the test environment to prepare the test data.
Existing data preparation methods have a long preparation period, and the complete data preparation process has to be carried out again before each test, which reduces test efficiency. In addition, the preparation flow contains a large number of manually executed steps, which makes it difficult to meet the data preparation requirements of large-scale test projects.
Disclosure of Invention
The present invention has been made in view of the above problems, and has as its object to provide a data preparation method and apparatus based on the big data field which overcomes or at least partially solves the above problems.
According to an aspect of an embodiment of the present invention, there is provided a data preparation method including: acquiring a data preparation task; retrieving a previous execution record of the data preparation task, wherein the execution record includes a scheme of the data preparation task; acquiring an execution scheme of the data preparation task, wherein, for a data preparation task for which the execution record is retrieved, the scheme in the execution record is called as the execution scheme, and for a data preparation task for which the execution record is not retrieved, a scheme edited by a user is acquired as the execution scheme; and selecting a corresponding execution mode according to the execution scheme of the data preparation task to complete the data preparation.
Optionally, calling the scheme in the execution record as the execution scheme includes: judging whether the scheme in the execution record can be executed; when it is determined that the scheme can be executed, calling the scheme as the execution scheme; and when it is determined that the scheme cannot be executed, modifying the scheme so that it can be executed, calling the modified scheme as the execution scheme, and replacing the scheme in the execution record with the modified scheme.
Optionally, the method further comprises: establishing an execution record of the data preparation task according to the scheme edited by the user.
Optionally, the method further comprises: preprocessing the execution scheme to obtain the instructions included in the execution scheme.
Optionally, the method further comprises: converting all data acquired by the data preparation task into a unified format.
Optionally, selecting a corresponding execution mode according to the execution scheme of the data preparation task includes: receiving the execution scheme; judging whether the execution scheme includes a database instruction, wherein the database instruction includes a backup instruction and a recovery instruction; executing an execution scheme that includes the database instruction in a database mode; and executing an execution scheme that does not include the database instruction in a processor mode.
Optionally, executing in the database mode includes: judging the category of the database instruction; when the database instruction is determined to be a recovery instruction, judging whether the data pointed to by the database instruction can be recovered; when the data cannot be recovered, feeding back execution error information; when the data can be recovered, generating and executing a first instruction, wherein the first instruction is used for recovering the data pointed to by the database instruction; when the database instruction is determined to be a backup instruction, judging the total amount of data pointed to by the database instruction; when the total amount of data is greater than a preset threshold, connecting to a server for execution; and when the total amount of data is less than or equal to the preset threshold, generating and executing a second instruction according to the database instruction, wherein the second instruction is used for querying the data pointed to by the database instruction and backing it up.
Optionally, generating and executing the second instruction according to the database instruction includes: judging, according to the content of the database instruction, whether the existing data corresponding to the data pointed to by the database instruction needs to be overwritten; when it is determined that overwriting is not needed, generating and executing the second instruction; and when it is determined that overwriting is needed, generating and executing a third instruction, wherein the third instruction is used for deleting the existing data corresponding to the data pointed to by the database instruction, and generating and executing the second instruction after the third instruction has been executed.
According to another aspect of an embodiment of the present invention, there is provided a data preparation device, including: an acquisition module for acquiring a data preparation task; a retrieval module for retrieving a previous execution record of the data preparation task, wherein the execution record includes a scheme of the data preparation task; a scheme acquisition module for acquiring an execution scheme of the data preparation task, comprising: a first unit configured to call, for a data preparation task for which the execution record is retrieved, the scheme in the execution record as the execution scheme; and a second unit configured to acquire, for a data preparation task for which the execution record is not retrieved, a scheme edited by a user as the execution scheme; and an execution module for selecting a corresponding execution mode according to the execution scheme of each data preparation task to complete the data preparation.
According to still another aspect of an embodiment of the present invention, there is provided an electronic device including: one or more processors; and a storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the method described in any of the above.
According to yet another aspect of embodiments of the present invention, there is provided a computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform a method as described in any of the above.
According to the data preparation method and device provided by the embodiment of the invention, repeated work in the data preparation process can be reduced, and the data preparation efficiency is improved.
Drawings
FIG. 1 is a schematic diagram of a data preparation method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a data preparation method according to another embodiment of the present invention;
FIG. 3 is a schematic diagram of calling a scheme in an execution record according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating execution mode selection according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of execution in the database mode according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating backup instruction execution according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a data preparation device according to an embodiment of the present invention;
FIG. 8 is a schematic view of a usage scenario of a data preparation device according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of an electronic device according to an embodiment of the invention;
FIG. 10 is a schematic diagram of a computer-readable storage medium according to an embodiment of the invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Furthermore, in the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details.
First, according to an embodiment of the present invention, a data preparation method is provided. Referring to FIG. 1, the method includes:
Step S102: A data preparation task is acquired.
In step S102, after a data preparation task is obtained, the information of all data preparation tasks may be stored in a database table to facilitate subsequent calls.
Sources of data preparation tasks may include: receiving a data preparation task directly input by a user, receiving a data preparation task sent by a server, and the like.
It will be appreciated that data preparation tasks are generated according to specific test projects, and a single test project usually generates a large number of data preparation tasks; that is, one or more data preparation tasks are usually acquired in this step. Where there are a plurality of data preparation tasks, each may be executed according to the following steps, and the tasks may be processed in parallel. A person skilled in the art may select a suitable parallel processing manner to schedule the data preparation tasks so as to improve their overall completion efficiency.
Step S104: A previous execution record of the data preparation task is retrieved.
In step S104, the execution record of each data preparation task is retrieved, where the execution record includes a scheme of the data preparation task. A person skilled in the art may choose, according to the actual situation, how to obtain and retrieve the execution records of data preparation tasks, for example by establishing an execution record database. The execution record database may be established in a local storage center or on a cloud database platform, which is not specifically limited.
The execution records in the execution record database may come from various sources. For example, they may be execution records of data preparation tasks that have been executed historically, or execution records of data preparation tasks acquired from other databases of various types. A person skilled in the art may also reasonably expand the execution record database by other technical means, so that schemes for more repetitive data preparation tasks can be obtained more efficiently.
The execution record may be retrieved by first obtaining the basic attribute information of the data preparation task and then searching the execution record database. The search may be an exact search, that is, a search on specific information in the basic attributes of the data preparation task, which ensures that the retrieved execution record closely fits the current data preparation task. The search may also be a fuzzy search or a range search, for example a search performed at the same time on the task type in the basic attributes of the data preparation task, so that the retrieved execution records are those associated with the data preparation task. In some cases, a plurality of execution records may be retrieved.
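As a minimal sketch of the retrieval described above, the following Python example tries an exact search first and falls back to a range search on the task type. The `ExecutionRecord` fields, the `search_records` function, and the sample data are hypothetical illustrations and are not specified in the original text.

```python
from dataclasses import dataclass


@dataclass
class ExecutionRecord:
    task_name: str   # hypothetical basic attribute used for the exact search
    task_type: str   # hypothetical basic attribute used for the range search
    scheme: str


def search_records(records, task_name, task_type):
    """Return exact matches on the task name, else all records of the same task type."""
    exact = [r for r in records if r.task_name == task_name]
    if exact:
        return exact
    # Range search: several associated records may be returned.
    return [r for r in records if r.task_type == task_type]


records = [
    ExecutionRecord("sync_accounts", "backup", "scheme-A"),
    ExecutionRecord("sync_cards", "backup", "scheme-B"),
]
exact_hits = search_records(records, "sync_accounts", "backup")
related_hits = search_records(records, "sync_loans", "backup")
```

Here the exact search yields a single closely fitting record, while the fallback search yields every record associated by task type.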
Step S106: an execution scheme of the data preparation task is acquired.
In step S106, for a data preparation task for which a previous execution record is retrieved, the scheme recorded in the execution record may be called as the execution scheme. For a data preparation task for which no previous execution record is retrieved, a scheme edited by the user is acquired as the execution scheme.
Specifically, when a corresponding execution record is retrieved for the data preparation task, the scheme in the execution record may be called directly as the execution scheme; when no corresponding execution record is retrieved, a scheme edited by the user is acquired as the execution scheme.
The user-edited scheme may be obtained by providing a scheme editing interface that receives a scheme input by the user. Alternatively, candidate schemes may be offered, based on preset rules, according to the basic attributes of the specific data preparation task, and a selection instruction of the user is received; the user may select one of the candidate schemes directly, edit a candidate scheme further after selecting it, or decline to select and perform custom editing instead. A person skilled in the art may select a suitable manner of receiving the user-edited scheme in this step, which is not limited herein.
It can be understood that retrieving a previous execution record means that the data preparation task, or a data preparation task of the same type, has been executed before. Calling the scheme in the execution record as the execution scheme minimizes the number of data preparation tasks for which the user must edit a scheme personally, avoids repetitive work, and improves data preparation efficiency.
Step S108: An execution mode is selected according to the execution scheme to complete the data preparation.
After the execution scheme of the data preparation task is obtained, a suitable execution mode is selected according to the content recorded in the execution scheme, and the data preparation task is executed. The data obtained after the data preparation task is executed may be stored in a local storage center or a cloud storage center; a person skilled in the art may select the storage manner according to the actual situation, which is not specifically limited. The data preparation is complete once all data preparation tasks have been completed.
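Steps S102 to S108 can be sketched for a single task as follows. This is only an illustrative outline under simplifying assumptions: the execution record database is modeled as a plain dictionary, and `prepare`, `ask_user`, and `execute` are hypothetical names that stand in for the retrieval, user editing, and execution stages.

```python
def prepare(task, record_store, ask_user, execute):
    """Run one data preparation task and report where its scheme came from."""
    scheme = record_store.get(task)   # S104: retrieve a previous execution record
    source = "record"
    if scheme is None:                # S106: no record, so use a user-edited scheme
        scheme = ask_user(task)
        source = "user"
    return source, execute(scheme)    # S108: execute according to the scheme


# Hypothetical stand-ins for the record database, the editor, and the executor.
record_store = {"sync_accounts": "restore ACCOUNTS"}
ask_user = lambda task: f"backup for {task}"
execute = lambda scheme: f"done: {scheme}"

reused = prepare("sync_accounts", record_store, ask_user, execute)
edited = prepare("sync_loans", record_store, ask_user, execute)
```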
In some embodiments, the above steps still apply when a plurality of data preparation tasks are acquired. For example, a plurality of data preparation tasks may be processed in parallel in the following manner.
First, a plurality of data preparation tasks are acquired. For example, 20 data preparation tasks are acquired in this step.
Next, a previous execution record of each of the plurality of data preparation tasks is retrieved.
Then, the respective execution schemes of the plurality of data preparation tasks are acquired: for those data preparation tasks for which an execution record is retrieved, the scheme in the execution record is called as the execution scheme, and for those for which no execution record is retrieved, a scheme edited by the user is acquired as the execution scheme.
For example, if execution records are retrieved for 15 of the data preparation tasks in the previous step, those 15 data preparation tasks may directly call the scheme in their execution records as the execution scheme, while for the remaining 5 data preparation tasks, for which no execution record is retrieved, a scheme edited by the user is acquired as the execution scheme.
Finally, an execution mode is selected according to the execution scheme of each of the plurality of data preparation tasks to complete the data preparation.
It will be appreciated that, in the above example, the execution schemes of the 20 data preparation tasks comprise 15 execution schemes obtained by calling the scheme in an execution record and 5 execution schemes obtained from user-edited schemes.
A person skilled in the art may also use other reasonable parallel processing methods to process data preparation tasks in parallel. For example, as soon as one data preparation task has acquired its execution scheme, an execution mode may be selected and the task executed, without waiting for all data preparation tasks to acquire their execution schemes.
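One possible way to realize this kind of parallel processing is a thread pool in which each worker acquires the scheme for its own task and proceeds immediately, so that no task waits for the others. The sketch below reproduces the 20-task example with 15 stored records; the function names and the placeholder for the user-edited scheme are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(task, record_store):
    """Acquire the scheme for one task and proceed without waiting for other tasks."""
    scheme = record_store.get(task)
    source = "record"
    if scheme is None:
        scheme = f"user-scheme-for-{task}"  # placeholder for a user-edited scheme
        source = "user"
    # Selecting the execution mode and executing `scheme` would happen here.
    return task, source


def run_all(tasks, record_store, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: run_task(t, record_store), tasks))


tasks = [f"task-{i}" for i in range(20)]
record_store = {f"task-{i}": f"scheme-{i}" for i in range(15)}
results = run_all(tasks, record_store)
reused = sum(1 for _, source in results if source == "record")
edited = sum(1 for _, source in results if source == "user")
```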
Further, it will be appreciated that although the data preparation method according to the embodiment of the present invention is proposed for the data preparation of test tasks, it may in fact be applied to any data preparation scenario that requires a large amount of data to be prepared and has a certain repetitive nature, such as preparing the data of test sets and training sets before a machine learning model is built. Here, the repetitive nature means that certain specific data preparation tasks may be performed repeatedly across multiple batches of data preparation work.
In some embodiments, the method further establishes an execution record of the data preparation task according to the user-edited scheme.
Specifically, referring to steps S201 to S207 in FIG. 2, after a data preparation task is acquired and a previous execution record is searched for, for a data preparation task for which no execution record is retrieved, an execution record may be established after the scheme edited by the user is acquired.
It will be appreciated that the execution record established in step S206 is of the same kind as the previous execution records and is stored in the same execution record database, so that when the data preparation task is acquired again later, its execution record can be called directly from the execution record database. In such an embodiment, the execution record database is continuously expanded, so that in subsequent data preparation work more and more data preparation tasks can quickly obtain a scheme from an execution record, forming a virtuous cycle. In some embodiments, the execution record may instead be established after the scheme of the data preparation task has finally been executed, which further ensures that the scheme is executable and avoids, as far as possible, execution records containing schemes that cannot be executed.
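The effect of establishing an execution record from a user-edited scheme can be illustrated with a small sketch in which the record database is again modeled as a dictionary. The names `get_scheme` and `ask_user` are hypothetical; the point is only that the user is asked once and the second acquisition of the same task is served from the record.

```python
def get_scheme(task, record_store, ask_user):
    """Return the stored scheme, or store and return a newly edited one."""
    if task in record_store:
        return record_store[task]
    scheme = ask_user(task)
    record_store[task] = scheme   # establish the execution record for later reuse
    return scheme


edit_requests = []


def ask_user(task):
    # Hypothetical stand-in for the scheme editing interface.
    edit_requests.append(task)
    return f"edited-{task}"


store = {}
first = get_scheme("sync_accounts", store, ask_user)
second = get_scheme("sync_accounts", store, ask_user)
```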
In some embodiments, calling the scheme in the execution record as the execution scheme comprises: judging whether the scheme in the execution record can be executed; when it is determined that the scheme can be executed, calling the scheme as the execution scheme; and when it is determined that the scheme cannot be executed, modifying the scheme so that it can be executed, calling the modified scheme as the execution scheme, and replacing the scheme in the execution record with the modified scheme.
Specifically, referring to steps S301 to S305 in FIG. 3, when a previous execution record is retrieved, the judgment in step S302 is performed to determine whether the scheme in the execution record can be executed. The judgment may be made manually, or by a judgment algorithm with preset judgment rules, a machine learning model, or the like, which is not limited in the present invention. Judging whether the scheme can be executed in this step includes judging whether the execution mode corresponding to the scheme can be carried out: if any of the processors, servers, data, and so on required by that execution mode is missing, the scheme cannot be executed. It also includes judging whether the scheme yields the expected data after execution, that is, whether the scheme can effectively complete the data preparation task.
In step S302, if it is determined that the scheme can be executed, the scheme is called directly as the execution scheme. If it is determined that the scheme cannot be executed, the scheme needs to be modified. Similarly, the modification may be made manually, or the scheme may be repaired automatically by an algorithm or the like according to the specific reason for non-executability found during the judgment, so that the modified scheme can be executed; the modified scheme is then called as the execution scheme. Further, the original scheme in the execution record also needs to be replaced with the modified scheme, so that the scheme can be called directly, as far as possible, when the data preparation task is encountered again in subsequent data preparation work.
It will be appreciated that the above judging and modifying steps, even if performed entirely manually, save a significant amount of time compared with the prior art, in which the schemes of all data preparation tasks are edited manually. A person skilled in the art may also perform the judging and modifying steps of this embodiment using the above-mentioned algorithms, machine learning methods, or any other suitable technical means, to further reduce the use of manpower.
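The judge, modify, and replace sequence might look like the following sketch. It assumes, purely for illustration, that a scheme is a dictionary naming the server it needs and that a scheme is non-executable when that server is missing; `AVAILABLE_SERVERS`, `is_executable`, and `repair` are hypothetical and represent only one of many possible judgment and repair rules.

```python
AVAILABLE_SERVERS = {"host-b"}  # hypothetical set of servers currently reachable


def is_executable(scheme):
    """A scheme is treated as executable only if its required server exists."""
    return scheme["server"] in AVAILABLE_SERVERS


def repair(scheme):
    """Automatic repair: point the scheme at an available server."""
    return {**scheme, "server": sorted(AVAILABLE_SERVERS)[0]}


def call_scheme(task, record_store):
    scheme = record_store[task]
    if not is_executable(scheme):
        scheme = repair(scheme)
        record_store[task] = scheme   # replace the scheme in the execution record
    return scheme


record_store = {"sync_accounts": {"op": "backup", "server": "host-a"}}
execution_scheme = call_scheme("sync_accounts", record_store)
```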
In some embodiments, the method further comprises preprocessing the execution scheme, after it is acquired, to extract the instructions included in the execution scheme. Those skilled in the art will appreciate that, where there are a plurality of data preparation tasks, the preprocessing may be performed on a plurality of execution schemes. The preprocessing may consist of formatting each execution scheme into a JSON object and then extracting the instructions from the JSON object. This gives the plurality of data preparation tasks a uniform format during parallel processing, which makes it convenient to select the corresponding execution mode later.
In some embodiments, after the data preparation task is executed and the data is obtained, all the data may be formatted and converted into a uniform format to facilitate subsequent testing or other work.
In some embodiments, to better carry out the step of preprocessing the execution scheme and the step of formatting the data, a Kafka message queue may be used, with an uplink Topic and a downlink Topic established. The messages in the uplink Topic are execution schemes; after a message is formatted into JSON, its instructions are packaged and sent to a database or a processor according to the execution mode corresponding to the execution scheme, and the sending format may be selected as needed. Similarly, the data acquired by the data preparation task is received over HTTP and placed in the downlink Topic after uniform formatting. It will be appreciated that other distributed message queues, such as ActiveMQ, RabbitMQ, ZeroMQ, and the like, may also be selected to accomplish parallel processing where a plurality of data preparation tasks are acquired.
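A minimal sketch of such preprocessing is shown below, using Python's standard `json` module. The layout of the scheme, including the `task` and `instructions` keys and the fields of each instruction, is an assumption made for this example; the original text does not define the JSON structure.

```python
import json


def extract_instructions(raw_scheme):
    """Parse an execution scheme into a JSON object and return its instructions."""
    scheme = json.loads(raw_scheme)
    return scheme.get("instructions", [])


# Hypothetical scheme layout; the key names are illustrative only.
raw_scheme = (
    '{"task": "sync_accounts", '
    '"instructions": [{"op": "backup", "db_type": "DB2", "table": "ACCOUNTS"}]}'
)
instructions = extract_instructions(raw_scheme)
```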
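The uplink and downlink flow can be illustrated without a running message broker. The sketch below uses two in-process `queue.Queue` objects as stand-ins for the uplink and downlink Topics; it does not use the Kafka client API, and the message layout and the `dispatch_one` function are hypothetical.

```python
import json
import queue

uplink = queue.Queue()    # stand-in for the uplink Topic (execution schemes)
downlink = queue.Queue()  # stand-in for the downlink Topic (formatted data)


def dispatch_one(send_to_database, send_to_processor):
    """Take one scheme from the uplink, route it, and publish the formatted result."""
    scheme = json.loads(uplink.get())
    ops = {instruction["op"] for instruction in scheme["instructions"]}
    target = send_to_database if ops & {"backup", "restore"} else send_to_processor
    result = target(scheme)
    downlink.put(json.dumps(result, sort_keys=True))  # uniform output format


uplink.put('{"instructions": [{"op": "backup", "table": "ACCOUNTS"}]}')
dispatch_one(
    lambda scheme: {"via": "database", "rows": 3},   # hypothetical database executor
    lambda scheme: {"via": "processor", "rows": 0},  # hypothetical processor executor
)
message = json.loads(downlink.get())
```

In a real deployment the two queues would be replaced by topics on the chosen message queue system, and the executors by calls to the database platform or the script-based processor.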
In some embodiments, referring to steps S401 to S404 in FIG. 4, selecting a corresponding execution mode according to the execution scheme includes:
After the execution scheme is received, it is first judged whether the execution scheme includes a database instruction. An execution scheme that includes a database instruction is executed in the database mode, and an execution scheme that does not include a database instruction is executed in the processor mode.
Specifically, database instructions generally include backup instructions and recovery instructions. In practical applications, the data to be backed up or recovered may be located on different database platforms, including but not limited to a host DB2 database, a MySQL database, an Oracle database, and so on. The backup instruction and the recovery instruction therefore generally also carry database type information, and when the database mode is selected, the backup or recovery instruction is processed on the corresponding database platform.
Backup instructions and recovery instructions are both operations performed on the database platform. For a specific data preparation task, if a backup has already been performed once, then either backup or recovery may be chosen when the task is executed again later. Therefore, according to the actual working situation, for data preparation tasks for which recovery is an option, either a backup instruction or a recovery instruction may be written into the execution scheme; that is, the choice between backup and recovery may be made when the execution scheme is acquired or when the execution scheme is preprocessed and its instructions are extracted. This allows the corresponding resources to be allocated reasonably when data preparation tasks are executed in the database mode, further improving data preparation efficiency.
The processor mode is used when the data cannot be obtained directly from the database and must instead be produced by script simulation. A person skilled in the art may select a specific processor mode according to the actual field of application; for example, when test data is prepared in the banking field, the processor mode prepares the test data by using scripts to simulate the operations of clients in the production environment.
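The mode selection reduces to checking whether any instruction in the execution scheme is a database instruction. The sketch below assumes each instruction is a dictionary with an `op` field; the field name, the `run_script` operation, and the `select_mode` function are hypothetical.

```python
DATABASE_OPS = {"backup", "restore"}  # database instructions named in the text


def select_mode(instructions):
    """Return 'database' if any instruction is a database instruction, else 'processor'."""
    if any(instruction["op"] in DATABASE_OPS for instruction in instructions):
        return "database"
    return "processor"


db_mode = select_mode([{"op": "restore", "db_type": "DB2", "table": "ACCOUNTS"}])
script_mode = select_mode([{"op": "run_script", "script": "simulate_client.py"}])
```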
In some embodiments, referring to steps S501 to S504 in FIG. 5, the specific steps of execution in the database mode include:
After the database instruction is acquired, the category of the database instruction is determined.
For a recovery instruction, it is necessary to judge whether the data pointed to by the database instruction can be recovered. If so, a first instruction for recovering the data is generated and executed; if not, execution error information is fed back.
For a backup instruction, it is necessary to judge the total amount of data pointed to by the database instruction. When the total amount of data is greater than a preset threshold, the instruction is executed by connecting to a server; when the total amount of data is less than or equal to the preset threshold, a second instruction is generated according to the database instruction and executed, wherein the second instruction is used for querying the data pointed to by the database instruction and backing it up.
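The branching described in these steps can be sketched as a single dispatch function. The function returns a label for the chosen path rather than touching a real database; `PRESET_THRESHOLD`, `can_recover`, and `total_rows` are hypothetical stand-ins for the preset threshold, the recoverability check, and the data volume check.

```python
PRESET_THRESHOLD = 10_000  # hypothetical value; chosen according to actual conditions


def run_database_instruction(instruction, can_recover, total_rows):
    """Choose the execution path for one database instruction."""
    if instruction["op"] == "restore":
        if not can_recover(instruction):
            return "error"                # feed back execution error information
        return "first_instruction"        # recover the data
    if instruction["op"] == "backup":
        if total_rows(instruction) > PRESET_THRESHOLD:
            return "server_job"           # connect to a server for execution
        return "second_instruction"       # query the data and back it up
    raise ValueError(f"unknown database instruction: {instruction['op']}")


restore = {"op": "restore", "table": "ACCOUNTS"}
backup = {"op": "backup", "table": "ACCOUNTS"}
missing = run_database_instruction(restore, lambda i: False, lambda i: 0)
small = run_database_instruction(backup, lambda i: True, lambda i: 10_000)
large = run_database_instruction(backup, lambda i: True, lambda i: 10_001)
```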
The above steps will be described in detail below using the host DB2 database as an example, and it will be understood that the same steps may be used for other databases such as MySq1 database and Oracle database, and only the adaptation of some specific instructions may be required according to the rules of the specific database.
For the recovery instruction, whether the data pointed by the database instruction can be recovered is finished by checking whether the recovery file in the database exists or not, if the corresponding recovery file is not detected, the execution error information is fed back, in some embodiments, the execution error information can be manually processed, an execution scheme is modified or re-edited to select other executable execution schemes to finish the data preparation task, in some embodiments, an algorithm or the like can be used to analyze the execution error information and perform proper remediation processing.
If it is determined that the data can be recovered, the first instruction is generated and executed. The first instruction may be an Unload instruction or an Insert instruction. For the Unload instruction, a corresponding host DB2 Unload job is generated and executed by connecting to the host. For the Insert instruction, a connection to the DB2 database is established through JDBC, and the recovery SQL to be executed is generated and executed.
For the backup instruction, the total amount of data pointed to by the database instruction is determined. If the total amount of data exceeds a preset threshold, a host DB2 load job is generated and executed by connecting to the host. If the total amount of data does not exceed the preset threshold, a second instruction is generated and executed. The preset threshold may be 10,000 (1w) or another value determined according to actual conditions.
The specific steps for generating the second instruction include: first, all fields of the data to be backed up and their field types are acquired; then, a corresponding query statement is generated for each field type, namely character, non-character, or EBCDIC.
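One way to picture the per-type query generation is sketched below. The text does not say how each field type is handled, so the treatment here is an assumption: character fields are trimmed with `RTRIM`, EBCDIC fields are read as hexadecimal with `HEX`, and non-character fields are selected unchanged. The function and enum names are hypothetical.

```python
from enum import Enum


class FieldKind(Enum):
    CHARACTER = "character"
    NON_CHARACTER = "non-character"
    EBCDIC = "ebcdic"


def select_expression(name, kind):
    """Build the SELECT-list expression for one field (illustrative rules only)."""
    if kind is FieldKind.CHARACTER:
        return f"RTRIM({name}) AS {name}"   # assumed: strip trailing padding
    if kind is FieldKind.EBCDIC:
        return f"HEX({name}) AS {name}"     # assumed: read raw bytes as hex
    return name                             # non-character fields pass through


def build_backup_query(table, fields):
    """Build one query statement from (field name, field kind) pairs.

    Identifiers are interpolated directly, so they must come from trusted
    catalog metadata rather than from user input.
    """
    columns = ", ".join(select_expression(n, k) for n, k in fields)
    return f"SELECT {columns} FROM {table}"
```

For a hypothetical `ACCOUNT` table with a character `NAME`, a numeric `BALANCE`, and an EBCDIC `RAW_CODE` field, the sketch yields `SELECT RTRIM(NAME) AS NAME, BALANCE, HEX(RAW_CODE) AS RAW_CODE FROM ACCOUNT`.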
Through the above steps, a plurality of execution modes are provided in the database manner, which facilitates reasonable allocation of the corresponding resources. A person skilled in the art can set the selection rules for the Unload instruction and the Insert instruction according to actual conditions, and can adjust the preset threshold to balance the number of tasks executed by the host against the number of tasks executed through generated query statements, so that the various resources are utilized to the greatest extent and working efficiency is further improved.
In some embodiments, referring to steps S601 to S606 in Fig. 6, when the total amount of data is less than or equal to the preset threshold, the step of generating and executing the second instruction further includes: determining, according to the content of the database instruction, whether the data in the existing data that corresponds to the data pointed to by the database instruction needs to be overwritten; when it is determined that overwriting is not needed, generating and executing the second instruction; and when it is determined that overwriting is needed, generating and executing a third instruction, wherein the third instruction is used for deleting the data in the existing data that corresponds to the data pointed to by the database instruction, and, after the third instruction is executed, generating and executing the second instruction.
It will be appreciated that, in practice, it may be preset that all data is overwritten or that no data is overwritten. However, such a blanket setting may cause some important original data to be lost, so in some embodiments the step of determining whether the existing data needs to be overwritten is added.
Still taking the host DB2 database as an example, when overwriting is required, the steps of generating and executing the third instruction include: first, obtaining index information of the data to be backed up according to the DB2 category and obtaining the unique index from the index information; then, obtaining all fields of the unique index and their values for each piece of data according to the scope of the data backup; and finally, generating a Delete statement conditioned on the unique index from this information and executing the Delete statement.
The subsequent steps of generating and executing the second instruction are as described above and are not repeated here. When overwriting is not required, the second instruction is generated and executed directly.
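The overwrite handling can be sketched as below, assuming the unique-index columns and the rows in the backup scope have already been obtained. The function names, the step labels, and the use of `?` parameter markers are assumptions made for illustration, not details given in the text.

```python
def build_delete_statements(table, unique_index_columns, rows):
    """Build one parameterised DELETE per row, conditioned on the unique index.

    `rows` is a list of dicts mapping column name to value; only the
    unique-index columns are used in the WHERE clause.
    """
    where = " AND ".join(f"{col} = ?" for col in unique_index_columns)
    sql = f"DELETE FROM {table} WHERE {where}"
    return [(sql, tuple(row[col] for col in unique_index_columns)) for row in rows]


def plan_backup(overwrite, table, unique_index_columns, rows):
    """Return the ordered steps: an optional third instruction, then the second."""
    steps = []
    if overwrite:
        deletes = build_delete_statements(table, unique_index_columns, rows)
        steps.append(("third_instruction", deletes))
    # The second instruction always runs last, whether or not data was deleted.
    steps.append(("second_instruction", None))
    return steps
```

Keeping the values separate from the SQL text lets a database driver bind them as parameters instead of concatenating them into the statement.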
The instructions in the specific steps exemplified by the host DB2 database can be applied to a MySQL database or an Oracle database with only adaptive modification. For example, in a MySQL database, the mysqldump instruction or the Insert instruction can be selected to complete data recovery, and in an Oracle database, the impdp instruction or the Insert instruction can be selected to complete data recovery.
There is also provided, in accordance with an embodiment of the present invention, an apparatus 100 for data preparation, referring to fig. 7, including:
an acquisition module 10 for acquiring a data preparation task;
a retrieving module 20, configured to retrieve a previous execution record of the data preparation task, where the execution record includes a scheme of the data preparation task;
a scheme obtaining module 30, configured to obtain an execution scheme of the data preparation task, including: a first unit 31 for calling the scheme in the execution record as the execution scheme for a data preparation task for which the execution record is retrieved; and a second unit 32 for acquiring a user-edited scheme as the execution scheme for a data preparation task for which the execution record is not retrieved;
and the execution module 40 is configured to select a corresponding execution mode according to the execution scheme of each data preparation task to complete data preparation.
In some embodiments, the first unit 31 is further configured to: determine whether the scheme in the execution record can be executed; when it is determined that the scheme can be executed, call the scheme as the execution scheme; and when it is determined that the scheme cannot be executed, modify the scheme so that it can be executed, call the modified scheme as the execution scheme, and replace the scheme in the execution record with the modified scheme.
In some embodiments, the second unit 32 is further configured to establish an execution record of the data preparation task according to the execution scheme edited by the user.
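The behaviour of the first unit 31 and the second unit 32 can be sketched together as follows. This is a minimal sketch under stated assumptions: the execution records are modelled as a plain dictionary keyed by a task identifier, and `edit_scheme`, `is_executable`, and `repair` are hypothetical callbacks standing in for user editing, the executability check, and the scheme modification.

```python
def obtain_execution_scheme(task_id, records, edit_scheme, is_executable, repair):
    """Return the execution scheme for a task, updating `records` as needed."""
    scheme = records.get(task_id)
    if scheme is None:
        # No prior execution record: take the user-edited scheme and record it.
        scheme = edit_scheme(task_id)
        records[task_id] = scheme
        return scheme
    if not is_executable(scheme):
        # Recorded scheme no longer runs: modify it and replace the record.
        scheme = repair(scheme)
        records[task_id] = scheme
    return scheme
```

Because the record is updated in both branches, a later run of the same task can reuse the stored scheme without further editing.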
In some embodiments, the apparatus further comprises a first processing module 50 for preprocessing the execution scheme to obtain instructions included in the execution scheme.
In some embodiments, the apparatus further comprises a second processing module 60 for converting all data acquired by the data preparation task into a unified format.
In some embodiments, the execution module 40 comprises an allocation unit 41 configured to: receive the execution scheme; determine whether the execution scheme includes a database instruction, wherein the database instruction includes a backup instruction and a recovery instruction; execute an execution scheme that includes the database instruction in a database manner; and execute an execution scheme that does not include the database instruction in a processor manner.
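The routing performed by the allocation unit 41 can be sketched as below. The representation of a scheme as a list of instruction names, and the string labels used for them, are assumptions for illustration only.

```python
# Hypothetical labels for the two kinds of database instruction.
DATABASE_INSTRUCTIONS = {"backup", "recovery"}


def choose_execution_mode(scheme_instructions):
    """Return "database" if the scheme contains a database instruction, else "processor"."""
    if any(name in DATABASE_INSTRUCTIONS for name in scheme_instructions):
        return "database"
    return "processor"
```

For example, a scheme containing a backup step would be routed to the database manner, while a scheme consisting only of script steps would be routed to the processor manner.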
In some embodiments, the execution module 40 further includes a first determination unit 42 configured to: determine the category of the database instruction; when the database instruction is determined to be a recovery instruction, determine whether the data pointed to by the database instruction can be recovered; when the data cannot be recovered, feed back execution error information; when the data can be recovered, generate and execute a first instruction, wherein the first instruction is used for recovering the data pointed to by the database instruction; when the database instruction is determined to be a backup instruction, determine the total amount of data pointed to by the database instruction; when the total amount of data is greater than a preset threshold, connect to a server for execution; and when the total amount of data is less than or equal to the preset threshold, generate a second instruction according to the database instruction and execute the second instruction, wherein the second instruction is used for querying the data pointed to by the database instruction and backing it up.
In some embodiments, the execution module 40 further includes a second determination unit 43 configured to: determine, according to the content of the database instruction, whether the data in the existing data that corresponds to the data pointed to by the database instruction needs to be overwritten; when it is determined that overwriting is not needed, generate and execute the second instruction; and when it is determined that overwriting is needed, generate and execute a third instruction, wherein the third instruction is used for deleting the data in the existing data that corresponds to the data pointed to by the database instruction, and generate and execute the second instruction after the third instruction is executed.
It will be appreciated that the specific method of each module in the apparatus 100 when performing the corresponding function may refer to the data preparation method described above, and will not be described herein.
Fig. 8 shows an application scenario of a device for data preparation according to an embodiment of the present invention, where a user may select a terminal, such as a mobile phone, a tablet, a notebook, or a local server, a cloud server, or the like, to operate as a main operation terminal.
The execution record database may be a cloud database or may reside in local storage of the operation terminal. The operation terminal can send a data preparation task executed in the database manner to a cloud database to complete the task. For a data preparation task executed in the processor manner, the operation terminal can use a cloud processor to complete the corresponding script execution and other operations, or can use a local processor.
There is also provided, in accordance with an embodiment of the present invention, an electronic device, referring to fig. 9, including: one or more processors; and a storage means for storing one or more programs, which when executed by the one or more processors cause the one or more processors to perform the method as claimed in any preceding claim.
There is also provided, in accordance with an embodiment of the present invention, a computer-readable storage medium, with reference to fig. 10, having stored thereon executable instructions that, when executed by a processor, cause the processor to perform a method as described in any of the above.
In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," "third," and the like are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first", "a second", and "a third" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
In addition, descriptions such as "greater than a preset threshold" and "less than or equal to a preset threshold" only indicate that a preset threshold exists and is used to determine whether a characteristic of a numerical value meets expectations. Depending on the specific calculation formula used in a practical application, comparison operators such as "greater than" and "less than or equal to" may be changed accordingly to express the intended meaning.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order from that shown or discussed, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present invention.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium may even be paper or other suitable medium upon which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it may be implemented using any one or combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product. The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
While the embodiments of the present invention have been illustrated and described, it will be appreciated that what is disclosed is merely a preferred embodiment of the invention and, of course, not as a definition of the limits of the invention, and equivalent variations on the appended claims are therefore intended to be encompassed by the present invention.

Claims (9)

1. A data preparation method, comprising:
acquiring a data preparation task;
retrieving a prior execution record of the data preparation task, wherein the execution record includes a scheme of the data preparation task;
acquiring an execution scheme of the data preparation task, wherein the scheme in the execution record is called as the execution scheme for the data preparation task of which the execution record is retrieved; obtaining a scheme edited by a user as the execution scheme for the data preparation task for which the execution record is not retrieved;
selecting a corresponding execution mode to finish data preparation according to the execution scheme of the data preparation task,
wherein the execution scheme is received;
judging whether the execution scheme comprises a database instruction or not, wherein the database instruction comprises a backup instruction and a recovery instruction;
the execution scheme including the database instructions is executed in a database manner,
judging the category of the database instruction;
when the database instruction is determined to be a recovery instruction, judging whether the data pointed by the database instruction can be recovered or not; when the data cannot be recovered, feeding back execution error information; when the data can be recovered, generating a first instruction and executing the first instruction, wherein the first instruction is used for recovering the data pointed by the database instruction;
when the database instruction is determined to be a backup instruction, judging the total data pointed by the database instruction; when the total data amount is larger than a preset threshold value, connecting to a server for execution; when the total data amount is smaller than or equal to a preset threshold value, generating a second instruction according to the database instruction and executing the second instruction, wherein the second instruction is used for searching the data pointed by the database instruction and backing up the data pointed by the database instruction;
the execution scheme that does not include the database instructions is executed in a processor manner.
2. The data preparation method according to claim 1, wherein invoking a scheme in the execution record as the execution scheme comprises:
judging whether the scheme in the execution record can be executed or not;
when the scheme is determined to be executable, calling the scheme as the execution scheme;
and when the scheme can not be executed, modifying the scheme to enable the scheme to be executed, calling the modified scheme as the executing scheme, and replacing the scheme in the executing record with the modified scheme.
3. The data preparation method according to claim 1, further comprising: and establishing an execution record of the data preparation task according to the scheme edited by the user.
4. The data preparation method according to claim 1, further comprising: preprocessing the execution scheme to obtain instructions included in the execution scheme.
5. The data preparation method according to claim 1, further comprising: all data acquired by the data preparation task are converted into a unified format.
6. The method of claim 1, wherein generating a second instruction from the database instruction and executing comprises:
judging, according to the content of the database instruction, whether the data in the existing data that corresponds to the data pointed to by the database instruction needs to be overwritten;
generating and executing the second instruction when it is determined that overwriting is not needed;
and when it is determined that overwriting is needed, generating a third instruction and executing the third instruction, wherein the third instruction is used for deleting the data in the existing data that corresponds to the data pointed to by the database instruction, and after the third instruction is executed, generating the second instruction and executing the second instruction.
7. An apparatus for data preparation, comprising:
the acquisition module is used for acquiring a data preparation task;
the searching module is used for searching a previous execution record of the data preparation task, wherein the execution record comprises a scheme of the data preparation task;
the scheme obtaining module is used for obtaining an execution scheme of the data preparation task and comprises the following steps: a first unit configured to call, for the data preparation task for which the execution record is retrieved, a scheme in the execution record as the execution scheme; a second unit configured to obtain, as the execution scheme, a scheme edited by a user for the data preparation task for which the execution record is not retrieved;
the execution module is used for selecting a corresponding execution mode to finish data preparation according to the execution scheme of each data preparation task, and comprises the following steps: an allocation unit for receiving the execution scheme; judging whether the execution scheme comprises a database instruction or not, wherein the database instruction comprises a backup instruction and a recovery instruction; the execution scheme including the database instructions is executed in a database manner; the execution scheme excluding the database instructions is executed in a processor manner; a first determining unit, configured to determine a class of the database instruction; when the database instruction is determined to be a recovery instruction, judging whether the data pointed by the database instruction can be recovered or not; when the data cannot be recovered, feeding back execution error information; when the data can be recovered, generating a first instruction and executing the first instruction, wherein the first instruction is used for recovering the data pointed by the database instruction; when the database instruction is determined to be a backup instruction, judging the total data pointed by the database instruction; when the total data amount is larger than a preset threshold value, connecting to a server for execution; and when the total data amount is smaller than or equal to a preset threshold value, generating a second instruction according to the database instruction and executing the second instruction, wherein the second instruction is used for searching the data pointed by the database instruction and backing up the data pointed by the database instruction.
8. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-6.
9. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-6.
CN202010778835.5A 2020-08-05 2020-08-05 Data preparation method and device Active CN111858387B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010778835.5A CN111858387B (en) 2020-08-05 2020-08-05 Data preparation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010778835.5A CN111858387B (en) 2020-08-05 2020-08-05 Data preparation method and device

Publications (2)

Publication Number Publication Date
CN111858387A CN111858387A (en) 2020-10-30
CN111858387B true CN111858387B (en) 2023-08-15

Family

ID=72971439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010778835.5A Active CN111858387B (en) 2020-08-05 2020-08-05 Data preparation method and device

Country Status (1)

Country Link
CN (1) CN111858387B (en)

Also Published As

Publication number Publication date
CN111858387A (en) 2020-10-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant