CN110399209A - Data processing method, system, electronic equipment and storage medium - Google Patents

Publication number
CN110399209A
Authority
CN
China
Prior art keywords
data
cluster device
sandbox area
source
source cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910688165.5A
Other languages
Chinese (zh)
Other versions
CN110399209B (en)
Inventor
张世瑛
曹伟
梁杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN201910688165.5A
Publication of CN110399209A
Application granted
Publication of CN110399209B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/52 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow
    • G06F21/53 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow by executing in a restricted environment, e.g. sandbox or secure virtual machine
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45579 I/O management, e.g. providing access to device drivers or storage

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Bioethics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure provides a data processing method applied to controlling equipment. The method includes: obtaining configuration information, wherein the configuration information includes object information of a target object to be sampled and a sampling rule, the target object is stored in a source cluster device, the source cluster device includes a sandbox area and a non-sandbox area that are independent of each other, and the target object is stored in the non-sandbox area; generating a control instruction based on the sampling rule and the object information; and sending the control instruction to the source cluster device, so that the source cluster device samples the source data in the target object, stores the sampled data in the sandbox area of the source cluster device, and copies the sampled data from the sandbox area to a target cluster device. The present disclosure also provides a data processing system, electronic equipment and a computer-readable storage medium.

Description

Data processing method, system, electronic equipment and storage medium
Technical field
The present disclosure relates to the field of computer technology, and more particularly to a data processing method, system, electronic equipment and storage medium.
Background
Data preparation on a big data platform usually means copying data from a source cluster to a target cluster. In the prior art, this copying operation is complex, its processing flow is long, and it requires large amounts of hardware resources, network resources and the like.
As a result, prior-art data preparation suffers from a long processing flow, a long completion time, and heavy consumption of hardware and network resources.
Summary of the invention
In view of this, the present disclosure provides a data processing method, system, electronic equipment and storage medium.
One aspect of the present disclosure provides a data processing method applied to controlling equipment. The method includes: obtaining configuration information, wherein the configuration information includes object information of a target object to be sampled and a sampling rule, the target object is stored in a source cluster device, the source cluster device includes a sandbox area and a non-sandbox area that are independent of each other, and the target object is stored in the non-sandbox area; generating a control instruction based on the sampling rule and the object information; and sending the control instruction to the source cluster device, so that the source cluster device samples the source data in the target object, stores the sampled data in the sandbox area of the source cluster device, and copies the sampled data from the sandbox area to a target cluster device.
According to an embodiment of the present disclosure, the configuration information further includes a desensitization configuration. Generating the control instruction based on the sampling rule and the object information includes: determining metadata of the target object based on the object information; establishing a data table based on the metadata; determining a desensitization function according to the desensitization configuration, the desensitization function being used to desensitize the sampled data; and generating the control instruction according to the sampling rule, the data table and the desensitization function.
According to an embodiment of the present disclosure, the control instruction is generated so as to perform the following operations: obtaining sampled data from the target object according to the sampling rule and the object information; desensitizing the sampled data to obtain desensitized data; saving the desensitized data to the sandbox area; and copying the desensitized data from the sandbox area to the target cluster device.
According to an embodiment of the present disclosure, the method further includes: obtaining a concurrency configuration parameter when there are multiple control instructions, each used to execute a different task; determining, based on the concurrency configuration parameter, the number of tasks that the source cluster device executes simultaneously; and controlling, based on the number of tasks, the source cluster device to execute the multiple control instructions.
According to an embodiment of the present disclosure, controlling the source cluster device to execute the multiple control instructions based on the number of tasks includes: obtaining the currently available resources in the source cluster device; and determining, based on the currently available resources and the number of tasks, the currently available resources allocated to each task, so that the control instruction of each task runs on the resources allocated to it.
According to an embodiment of the present disclosure, the method further includes generating acquisition records of the controlling equipment obtaining the currently available resources in the source cluster device, so that the acquisition records can be queried for abnormal acquisition records.
According to an embodiment of the present disclosure, the method further includes: verifying whether the data volume copied to the target cluster is consistent with the original data volume stored in the sandbox area; and issuing warning information when the data volume and the original data volume are inconsistent.
Another aspect of the present disclosure provides a data processing system, including: a source cluster device, which includes a sandbox area and a non-sandbox area that are independent of each other, a target object being stored in the non-sandbox area; a target cluster device; and controlling equipment configured to execute the above method, wherein the source cluster device is configured, in response to the control instruction, to sample the target object stored in the non-sandbox area to obtain sampled data, store the sampled data in the sandbox area, and copy the sampled data from the sandbox area to the target cluster device.
Another aspect of the present disclosure provides electronic equipment, including: one or more processors; and a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to execute the above method.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed, implement the method described above.
Another aspect of the present disclosure provides a computer program including computer-executable instructions which, when executed, implement the method described above.
According to embodiments of the present disclosure, the problems that copying data from a source cluster to a target cluster involves a long processing flow, takes a lot of time, and requires large amounts of hardware resources can be at least partially solved, thereby achieving the technical effects of reducing the processing steps needed to copy data from the source cluster to the target cluster and reducing resource consumption.
Brief description of the drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
Fig. 1 schematically shows a data processing method;
Fig. 2 schematically shows an exemplary system architecture for the data processing method according to an embodiment of the present disclosure;
Fig. 3 schematically shows a flowchart of the data processing method according to an embodiment of the present disclosure;
Fig. 4 schematically shows a flowchart of a method for generating a control instruction according to an embodiment of the present disclosure;
Fig. 5 schematically shows a data processing method according to another embodiment of the present disclosure;
Fig. 6 schematically shows the operating principle of the controlling equipment 230 according to an embodiment of the present disclosure;
Fig. 7 schematically shows a data processing apparatus according to an embodiment of the present disclosure; and
Fig. 8 schematically shows a block diagram of electronic equipment according to an embodiment of the present disclosure.
Detailed description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood, however, that these descriptions are only exemplary and are not intended to limit the scope of the present disclosure. In the following detailed description, for ease of explanation, many specific details are set forth to provide a comprehensive understanding of the embodiments of the present disclosure. It is evident, however, that one or more embodiments may also be practiced without these specific details. In addition, descriptions of well-known structures and technologies are omitted below to avoid unnecessarily obscuring the concepts of the present disclosure.
The terms used herein are only for describing specific embodiments and are not intended to limit the present disclosure. The terms "include", "comprise" and the like used herein indicate the presence of the stated features, steps, operations and/or components, but do not exclude the presence or addition of one or more other features, steps, operations or components.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the meanings commonly understood by those skilled in the art. It should be noted that the terms used herein should be interpreted as having meanings consistent with the context of this specification, and should not be interpreted in an idealized or overly rigid manner.
Where an expression similar to "at least one of A, B and C, etc." is used, it should generally be interpreted according to the meaning commonly understood by those skilled in the art (for example, "a system having at least one of A, B and C" shall include but not be limited to systems having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B and C). Where an expression similar to "at least one of A, B or C, etc." is used, it should likewise be interpreted according to the meaning commonly understood by those skilled in the art (for example, "a system having at least one of A, B or C" shall include but not be limited to systems having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B and C).
Fig. 1 schematically shows a data processing method. As shown in Fig. 1, the data preparation process may include the following steps: 1) sampling the source data stored in the source cluster and storing the sampled data in a temporary data storage unit; 2) desensitizing the sampled data; 3) exporting the desensitized data from the source cluster device 110 to a general storage unit A corresponding to the source cluster, such as a SAN (Storage Area Network) or DAS (Direct-Attached Storage); 4) importing the data in general storage unit A onto tape; 5) transferring the tape to the target cluster; 6) restoring the tape data to a general storage unit B corresponding to the target cluster; 7) importing the data from general storage unit B into the target cluster.
This data preparation process therefore has a long processing flow and requires large amounts of hardware resources, network resources and the like.
Embodiments of the present disclosure provide a data processing method applied to controlling equipment. The method includes obtaining configuration information, generating a control instruction, and sending the control instruction to a source cluster device. In response to the control instruction, the source cluster device samples the source data in a target object, stores the sampled data in the sandbox area of the source cluster device, and copies the sampled data from the sandbox area to a target cluster device. The configuration information includes object information of the target object to be sampled and a sampling rule, and the control instruction is generated according to the sampling rule and the object information. The target object is stored in the source cluster device, which includes a sandbox area and a non-sandbox area that are independent of each other; the target object is stored in the non-sandbox area.
Fig. 2 schematically shows an exemplary system architecture 200 for the data processing method according to an embodiment of the present disclosure. It should be noted that Fig. 2 is only an example of a system architecture to which embodiments of the present disclosure can be applied, intended to help those skilled in the art understand the technical content of the present disclosure; it does not mean that embodiments of the present disclosure cannot be used in other devices, systems, environments or scenarios.
As shown in Fig. 2, the system architecture 200 according to this embodiment may include a source cluster 210, a target cluster 220 and controlling equipment 230.
According to an embodiment of the present disclosure, the source cluster 210 may include, for example, multiple node devices that jointly maintain one or more databases.
According to an embodiment of the present disclosure, a sandbox area 211 is created in the storage region of the source cluster 210, and the running environment of the sandbox area 211 is isolated from the non-sandbox area 212 of the source cluster 210.
According to an embodiment of the present disclosure, the controlling equipment 230 is used to generate a control instruction and send it to the source cluster 210, so that the source cluster 210 executes it. The operations that the source cluster 210 performs according to the control instruction include sampling the data in the non-sandbox area 212 of the source cluster 210, storing the sampled data in the sandbox area 211, and copying the data in the sandbox area 211 to the target cluster 220, so that the data in the target cluster 220 can be used for software testing, model training, analysis and mining, and similar work.
Fig. 3 schematically shows a flowchart of the data processing method according to an embodiment of the present disclosure. The data processing method may be executed, for example, by the controlling equipment 230 shown in Fig. 2.
As shown in Fig. 3, the method may include operations S310 to S330.
In operation S310, configuration information is obtained. The configuration information includes object information of a target object to be sampled and a sampling rule. The target object is stored in the source cluster device. The storage region of the source cluster device may include a sandbox area and a non-sandbox area that are independent of each other, and the target object is stored in the non-sandbox area.
According to an embodiment of the present disclosure, the configuration information may be, for example, input by a user on a terminal device. The configuration information may include the object information of the target object and the sampling rule.
According to an embodiment of the present disclosure, the target object may be, for example, a data table stored in the non-sandbox area 212. The object information of the target object may be, for example, the table name of the data table. The data table may take various forms, which the embodiments of the present disclosure do not limit.
The sampling rule may be, for example, to extract from the data table the data containing the characters "Beijing", or to extract from the data table the data dated May 30.
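As a minimal sketch of what such configuration information might look like, the Python below models the object information and a sampling rule as plain data classes. The class names, field names, and the table and column names are all hypothetical; the disclosure does not prescribe any particular format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingRule:
    """Hypothetical sampling rule: one condition on one column."""
    column: str
    operator: str  # "contains" or "equals" in this sketch
    value: str


@dataclass(frozen=True)
class ConfigurationInfo:
    """Hypothetical configuration information for one target object."""
    table_name: str     # object information: table name of the target object
    rule: SamplingRule  # sampling rule to apply to that table


# Illustrative values only; table and column names are invented.
by_city = ConfigurationInfo("customer", SamplingRule("city", "contains", "Beijing"))
by_date = ConfigurationInfo("orders", SamplingRule("order_date", "equals", "05-30"))
```

The two instances mirror the two examples in the text: one rule selects rows containing "Beijing", the other selects rows dated May 30.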
For example, in the system architecture shown in Fig. 2, the storage region of the source cluster 210 may include the sandbox area 211 and the non-sandbox area 212. The target object may be stored in the non-sandbox area 212.
In operation S320, a control instruction is generated based on the sampling rule and the object information.
According to an embodiment of the present disclosure, the control instruction may be, for example, a Structured Query Language (SQL) script.
According to an embodiment of the present disclosure, for example, the target object may be determined according to the object information, and the metadata included in the target object may be determined, so as to establish a data table whose metadata is identical to that of the target object, and to generate a script of SQL statements according to the established data table and the sampling rule.
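The generation step described above can be sketched as a small Python function that turns metadata and a sampling condition into a two-statement SQL script. This is an illustration under stated assumptions, not the disclosed implementation: the function name `build_sampling_script`, the metadata shape (a list of column name and type pairs), the table names and the SQL dialect are all hypothetical.

```python
from typing import List, Tuple


def build_sampling_script(
    source_table: str,
    sandbox_table: str,
    metadata: List[Tuple[str, str]],
    where_clause: str,
) -> str:
    """Return a two-statement SQL script: create the table, then fill it."""
    # The new table reuses the column names and types of the source table.
    column_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in metadata)
    column_names = ", ".join(name for name, _ in metadata)
    create_stmt = f"CREATE TABLE {sandbox_table} ({column_defs});"
    # The sampling condition becomes the WHERE clause of the extraction.
    insert_stmt = (
        f"INSERT INTO {sandbox_table} ({column_names}) "
        f"SELECT {column_names} FROM {source_table} WHERE {where_clause};"
    )
    return create_stmt + "\n" + insert_stmt


# Hypothetical table, columns and condition, for illustration only.
script = build_sampling_script(
    source_table="customer",
    sandbox_table="sandbox_customer",
    metadata=[("id", "INTEGER"), ("name", "TEXT"), ("city", "TEXT")],
    where_clause="city LIKE '%Beijing%'",
)
print(script)
```

The sketch interpolates identifiers and the condition directly into the SQL text for readability; a real generator would need to validate or quote them.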
In operation S330, the control instruction is sent to the source cluster device, so that the source cluster device samples the source data in the target object, stores the sampled data in the sandbox area of the source cluster device, and copies the sampled data from the sandbox area to the target cluster device.
According to an embodiment of the present disclosure, this data processing method establishes a sandbox area in the source cluster, has the controlling equipment control the source cluster to sample the data, and stores the sampled data in the sandbox area, so that the data in the sandbox area can be copied directly to the target cluster. On the one hand, the method therefore does not need to access the data in the source cluster multiple times, which saves substantial CPU and I/O resources and shortens the time needed to extract data from the source cluster to the target cluster. On the other hand, because the sandbox area established in the source cluster is independent of the non-sandbox area, the data in the sandbox area can be copied directly to the target cluster, without being transferred via disk, while the security of the data in the source cluster is preserved.
Fig. 4 schematically shows a flowchart of operation S320 in the case where the configuration information further includes a desensitization configuration, according to an embodiment of the present disclosure.
As shown in Fig. 4, operation S320 may further include operations S321 to S324.
In operation S321, the metadata of the target object is determined based on the object information. For example, the source data table stored in the source cluster may be determined according to the table name of the data table, and the metadata of that source data table determined accordingly.
In operation S322, a data table is established based on the metadata. For example, the controlling equipment may establish a data table whose metadata is at least partly identical to that of the source data table.
In operation S323, a desensitization function is determined according to the desensitization configuration; the desensitization function is used to desensitize the sampled data.
According to an embodiment of the present disclosure, for example, a unique desensitization function may be determined according to the desensitization configuration input by the user. The desensitization configuration may be, for example, identification information of a desensitization function input by the user. Those skilled in the art will understand that a "desensitization function" is a function for desensitizing data.
In operation S324, the control instruction is generated according to the sampling rule, the data table and the desensitization function. For example, the desensitization function may be added to the data table composed of the generated SQL statements, and the script of SQL statements generated according to the sampling rule.
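One way to picture operations S323 and S324 is a lookup from a desensitization identifier to a SQL expression that is then wrapped around the relevant column in the generated SELECT. The sketch below assumes a hypothetical registry `DESENSITIZERS`, hypothetical identifiers such as `keep_first_char`, and invented table and column names; none of these come from the disclosure.

```python
from typing import Callable, Dict, List

# Hypothetical registry: identifier -> function that wraps a column in a SQL expression.
DESENSITIZERS: Dict[str, Callable[[str], str]] = {
    "keep_first_char": lambda col: f"substr({col}, 1, 1) || '***'",
    "null_out": lambda col: "NULL",
}


def build_desensitized_select(
    source_table: str,
    columns: List[str],
    desensitize: Dict[str, str],
    where_clause: str,
) -> str:
    """Build one SELECT that samples and desensitizes in a single pass."""
    select_items = []
    for col in columns:
        func_id = desensitize.get(col)
        if func_id is None:
            # Column is not sensitive: pass it through unchanged.
            select_items.append(col)
        else:
            # Look up the function by its identifier and wrap the column.
            select_items.append(f"{DESENSITIZERS[func_id](col)} AS {col}")
    return (
        f"SELECT {', '.join(select_items)} "
        f"FROM {source_table} WHERE {where_clause}"
    )


query = build_desensitized_select(
    "customer",
    ["id", "name", "city"],
    {"name": "keep_first_char"},
    "city LIKE '%Beijing%'",
)
print(query)
```

Because the masking expression sits inside the same SELECT as the sampling condition, extraction and desensitization happen in one statement rather than in two passes over the data.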
According to an embodiment of the present disclosure, the control instruction generated by this data processing method includes the desensitization function, so that extracting data and desensitizing the sampled data can be completed in one step, further saving hardware resources such as CPU and I/O.
According to an embodiment of the present disclosure, the controlling equipment sends the generated control instruction to the source cluster, so that the source cluster executes it.
The operations that the source cluster performs according to the control instruction include: obtaining sampled data from the target object according to the sampling rule and the object information; desensitizing the sampled data to obtain desensitized data; saving the desensitized data to the sandbox area; and copying the desensitized data from the sandbox area to the target cluster.
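The four operations can be walked through end to end with Python's standard `sqlite3` module. In this sketch two in-memory SQLite databases stand in for the source and target clusters, and a separate table stands in for the sandbox area; this is only an analogy, since a table in the same database does not provide the isolation the disclosure requires of a sandbox area. All table names, column names and rows are invented.

```python
import sqlite3

# Two in-memory databases stand in for the source and target clusters.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

# Non-sandbox area: the target object (a source data table).
source.execute("CREATE TABLE customer (id INTEGER, name TEXT, city TEXT)")
source.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Alice", "Beijing"), (2, "Bob", "Shanghai"), (3, "Carol", "Beijing")],
)

# Stand-in for the sandbox area: sampling and desensitization in one statement.
source.execute("CREATE TABLE sandbox_customer (id INTEGER, name TEXT, city TEXT)")
source.execute(
    "INSERT INTO sandbox_customer "
    "SELECT id, substr(name, 1, 1) || '***', city "
    "FROM customer WHERE city = 'Beijing'"
)

# Copy the desensitized rows from the sandbox stand-in to the target database.
rows = source.execute(
    "SELECT id, name, city FROM sandbox_customer ORDER BY id"
).fetchall()
target.execute("CREATE TABLE customer (id INTEGER, name TEXT, city TEXT)")
target.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)

copied = target.execute(
    "SELECT id, name, city FROM customer ORDER BY id"
).fetchall()
print(copied)
```

Only the two Beijing rows reach the target, and their `name` values arrive already masked, so the unmasked data never leaves the source side.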
Fig. 5 schematically shows a data processing method according to another embodiment of the present disclosure.
As shown in Fig. 5, in addition to operations S310 to S330 shown in Fig. 3, the data processing method may also include operations S510 to S530. For example, these operations may be executed after operation 520.
In operation S510, when there are multiple control instructions, each used to execute a different task, a concurrency configuration parameter is obtained.
In operation S520, the number of tasks that the source cluster device executes simultaneously is determined based on the concurrency configuration parameter.
According to an embodiment of the present disclosure, for example, the configuration information may include object information of multiple target objects, and one control instruction is generated for each target object according to its sampling rule, desensitization configuration and metadata, so that each control instruction is used to execute a different task.
According to an embodiment of the present disclosure, in operations S510 and S520, the user can set the concurrency configuration parameter to control the number of tasks that the source cluster executes simultaneously. For example, if the concurrency configuration parameter is 3, the source cluster simultaneously executes the control instructions corresponding to 3 tasks.
In operation S530, the source cluster device is controlled to execute the multiple control instructions based on the number of tasks.
According to an embodiment of the present disclosure, by setting the concurrency configuration parameter, this method enables the user to manage the number of control instructions that the source cluster 210 executes concurrently.
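A concurrency cap of this kind can be sketched with a thread pool from Python's standard library, where the pool size plays the role of the concurrency configuration parameter. The function `run_control_instruction` is a hypothetical placeholder for sending one control instruction to the source cluster; the disclosure does not say how the cap is enforced.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_control_instruction(task_id: int) -> str:
    """Hypothetical placeholder for sending one control instruction."""
    return f"task-{task_id} done"


def run_all(task_ids: List[int], concurrency: int) -> List[str]:
    """Run every task, with at most `concurrency` tasks in flight at once."""
    # max_workers caps how many instructions execute at the same time.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # map() returns results in the order of task_ids.
        return list(pool.map(run_control_instruction, task_ids))


# With a concurrency configuration parameter of 3, five tasks run three at a time.
results = run_all([1, 2, 3, 4, 5], concurrency=3)
print(results)
```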
In accordance with an embodiment of the present disclosure, operation S530 may further include: obtain currently available in the cluster device of source Resource;And it is based on current available resource and the quantity, it determines and distributes to the current available resource of each task, to use point The current available resource matched runs the control instruction of the task.
Current available resource, such as may include currently available cpu resource, memory source etc..
In accordance with an embodiment of the present disclosure, such as the number of task that can be executed according to currently available cpu resource and concurrently Amount determines the cpu resource for distributing to each task.Specifically, for example, can be in the cluster of source currently can with CPU have 100, The quantity of concurrently executing for task is 100, then can distribute a cpu resource for each task.
According to an embodiment of the present disclosure, this method allows the currently available resources to be allocated reasonably while multiple tasks are executed simultaneously.
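The allocation described above can be sketched as follows. The text only gives the example of 100 CPUs shared by 100 tasks; the even split, the function name `allocate_per_task`, and the `memory_gb` resource are assumptions made for illustration.

```python
def allocate_per_task(available, task_count):
    """Split each currently available resource evenly across the tasks.

    `available` maps a resource name to the amount currently free.
    An even integer split is an assumed policy, not one fixed by the text.
    """
    if task_count <= 0:
        raise ValueError("task_count must be positive")
    return {name: amount // task_count for name, amount in available.items()}


# The example from the text: 100 available CPUs and 100 concurrent tasks.
allocation = allocate_per_task({"cpu": 100, "memory_gb": 400}, task_count=100)
```

Here `allocation` gives each task one CPU, matching the example in the text, and 4 GB of the hypothetical memory pool.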
According to an embodiment of the present disclosure, the data processing method may further include: generating an acquisition record each time the controlling equipment obtains the currently available resources in the source cluster device, so that the acquisition records can be queried for abnormal acquisition records.
For example, the record may capture the access source, the access time, and the access object of each access to the currently available resources of the source cluster.
For example, the IP address of the access source should normally be the IP address of the controlling equipment. If querying the acquisition records reveals an access source with any other IP address, that record is determined to be an abnormal acquisition record. The access object may be, for example, the table name of the accessed source data table.
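The check for abnormal acquisition records can be sketched in a few lines. The record fields follow the text (access source, access time, access object); the class name `AcquisitionRecord`, the function `find_abnormal`, and all IP addresses and table names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcquisitionRecord:
    source_ip: str      # access source
    access_time: str    # access time, kept as text for simplicity
    access_object: str  # e.g. the table name of the accessed source data table


def find_abnormal(records, scheduler_ip):
    """Return the records whose access source is not the controlling equipment."""
    return [record for record in records if record.source_ip != scheduler_ip]


SCHEDULER_IP = "10.0.0.5"  # hypothetical address of the controlling equipment
records = [
    AcquisitionRecord("10.0.0.5", "2019-07-26T09:00:00", "customer"),
    AcquisitionRecord("10.0.0.99", "2019-07-26T09:05:00", "customer"),
    AcquisitionRecord("10.0.0.5", "2019-07-26T09:10:00", "account"),
]
abnormal = find_abnormal(records, SCHEDULER_IP)
```

Only the second record, whose access source differs from the controlling equipment's address, is flagged.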
According to an embodiment of the present disclosure, the data processing method may further include: verifying whether the data volume copied to the target cluster is consistent with the original data volume stored in the sandbox area; and issuing alarm information when the data volume and the original data volume are inconsistent.
For example, the controlling equipment may read the data volume in the target cluster and compare it with the original data volume stored in the sandbox area.
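A minimal sketch of this verification step follows. It assumes that "data volume" is measured as a row count and that alarm information is delivered through a callback; both are illustrative choices, and `verify_copy` is a hypothetical name.

```python
def verify_copy(sandbox_rows, target_rows, alert):
    """Compare the two data volumes and raise an alert on a mismatch.

    Returns True when the volumes match. `alert` is any callable that
    accepts a message string (assumed delivery mechanism).
    """
    if sandbox_rows == target_rows:
        return True
    alert(f"data volume mismatch: sandbox={sandbox_rows}, target={target_rows}")
    return False


alerts = []
ok = verify_copy(1000, 1000, alerts.append)
bad = verify_copy(1000, 998, alerts.append)
```

The first call passes silently; the second returns `False` and leaves one message in `alerts`.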
According to an embodiment of the present disclosure, the data processing method may further include generating an archive log of this data preparation process for later review. The log may record, for example, the time of the data preparation, the source data, the target cluster, and so on.
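One possible shape for such an archive log entry is a single JSON line per data preparation run. The field names and the function `archive_entry` below are assumptions; the text only states that the time, the source data, and the target cluster may be recorded.

```python
import json
from datetime import datetime, timezone


def archive_entry(source_tables, target_cluster, started_at):
    """Serialize one data preparation run as a JSON line (assumed format)."""
    return json.dumps(
        {
            "started_at": started_at.isoformat(),
            "source_tables": list(source_tables),
            "target_cluster": target_cluster,
        },
        sort_keys=True,
    )


entry = archive_entry(
    ["customer", "account"],          # hypothetical source data tables
    "test-cluster",                   # hypothetical target cluster name
    datetime(2019, 7, 26, 9, 0, tzinfo=timezone.utc),
)
```

Appending one such line per run to a file would give a log that can be searched later.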
Fig. 6 schematically illustrates the working principle of the controlling equipment 230 according to an embodiment of the present disclosure.
As shown in Fig. 6, the controlling equipment 230 can obtain input information, for example by executing operation S310 described above with reference to Fig. 3. The input information may include, for example, a target object list, a sampling rule, and a desensitization configuration. The target object list may be, for example, the table names of the source data tables from which data is extracted. The target object list may include multiple table names, so that data is extracted from multiple data tables.
According to an embodiment of the present disclosure, the controlling equipment 230 creates a data table according to the metadata of the source data table, so that the new table has the same metadata as the source data table, and determines a desensitization function according to the desensitization configuration. The scheduling executive program can then generate a control instruction according to the sampling rule, the desensitization function, and the metadata. For example, operation S320 described above with reference to Fig. 3 can be executed.
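The following sketch shows one way these three inputs could be combined into a control instruction. Everything concrete in it is an assumption for illustration: the `ControlInstruction` structure, the `DESENSITIZERS` registry, the `keep_last4` masking rule, the `sandbox.` table prefix, and the SQL-like table definition are not specified by the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict


def mask_keep_last4(value: str) -> str:
    """Hypothetical desensitization function: hide all but the last 4 characters."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


# Assumed registry that maps a rule name in the desensitization
# configuration to the function that implements it.
DESENSITIZERS: Dict[str, Callable[[str], str]] = {"keep_last4": mask_keep_last4}


@dataclass
class ControlInstruction:
    table: str
    create_table_sql: str  # table with the same metadata as the source table
    sample_fraction: float
    column_maskers: Dict[str, Callable[[str], str]]


def build_instruction(table, metadata, sample_fraction, desensitization_config):
    """Combine metadata, sampling rule, and desensitization config."""
    columns = ", ".join(f"{name} {col_type}" for name, col_type in metadata.items())
    maskers = {col: DESENSITIZERS[rule] for col, rule in desensitization_config.items()}
    return ControlInstruction(
        table=table,
        create_table_sql=f"CREATE TABLE sandbox.{table} ({columns})",
        sample_fraction=sample_fraction,
        column_maskers=maskers,
    )


instruction = build_instruction(
    "customer",
    {"id": "BIGINT", "card_no": "STRING"},
    0.01,
    {"card_no": "keep_last4"},
)
```

The resulting instruction carries the table definition, the sampling fraction, and one masking function per sensitive column, which is the information the source cluster needs to execute the task.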
As shown in Fig. 6, the controlling equipment 230 can also obtain a concurrency configuration parameter and, based on it, determine the number of tasks that the source cluster device 210 executes in each batch. For example, operations S510 to S530 described above with reference to Fig. 5 can be executed.
According to an embodiment of the present disclosure, for example, the scheduling executive program may generate control instructions according to the sampling rule, the desensitization function, the metadata, and the concurrency configuration, so that the control instructions control the number of tasks that the source cluster device 210 executes in each batch.
As shown in Fig. 6, the controlling equipment 230 can also obtain the currently available resources in the source cluster device 210 and, based on the currently available resources and the task quantity, determine the currently available resources allocated to each task, so that the control instruction of each task runs using the resources allocated to it. According to an embodiment of the present disclosure, the controlling equipment 230 may, for example, send the obtained currently available resources to the scheduling executive program. The scheduling executive program allocates currently available resources to each task according to the currently available resources and the task quantity, so that the control instructions control the source cluster 210 to run the multiple tasks.
According to an embodiment of the present disclosure, after a complete scheduling executive program is generated in the controlling equipment 230, the scheduling executive program can be sent to the source cluster device 210. According to the scheduling executive program, the source cluster device 210 samples and desensitizes the source data of the target objects stored in the non-sandbox area 212, stores the desensitized data in the sandbox area 211, and then copies the desensitized data stored in the sandbox area 211 to the target cluster device 220.
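The sample, desensitize, store, and copy sequence can be traced with an in-memory sketch. Plain dictionaries stand in for the non-sandbox area, the sandbox area, and the target cluster; uniform random sampling by fraction and the function name `prepare_data` are assumptions, since the text does not fix a sampling method.

```python
import random


def identity(value):
    return value


def prepare_data(non_sandbox, sandbox, target, table, fraction, maskers, seed=0):
    """Sample, desensitize, store in the sandbox, then copy to the target.

    Returns the number of rows copied. The source rows are never modified.
    """
    rng = random.Random(seed)
    rows = non_sandbox[table]
    count = min(len(rows), max(1, round(len(rows) * fraction)))
    sampled = rng.sample(rows, count)
    desensitized = [
        {col: maskers.get(col, identity)(val) for col, val in row.items()}
        for row in sampled
    ]
    sandbox[table] = desensitized                         # only masked data enters the sandbox
    target[table] = [dict(row) for row in sandbox[table]] # copy from the sandbox, not the source
    return len(desensitized)


# Hypothetical source table with ten rows and one sensitive column.
non_sandbox = {"customer": [{"id": i, "name": f"user{i}"} for i in range(10)]}
sandbox, target = {}, {}
copied = prepare_data(
    non_sandbox, sandbox, target, "customer", 0.3, {"name": lambda value: "***"}
)
```

The target only ever receives rows that passed through the sandbox, so unmasked source data in the non-sandbox area is never exposed to the target cluster.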
As shown in Fig. 6, the controlling equipment 230 can also verify whether the target data copied to the target cluster 220 is consistent with the original data in the source data, and issue alarm information when the target data and the original data are inconsistent.
According to an embodiment of the present disclosure, the controlling equipment 230 may, for example, access the target cluster 220 to obtain the data volume copied to it, and then compare whether that data volume is consistent with the original data volume in the source data.
As shown in Fig. 6, the controlling equipment 230 can also generate an acquisition record each time it obtains the currently available resources in the source cluster device 210, so that the acquisition records can be queried for abnormal acquisition records. For example, the record may capture the access source, the access time, and the access object of each access to the currently available resources of the source cluster. The IP address of the access source should normally be the IP address of the controlling equipment; if querying the acquisition records reveals an access source with any other IP address, that record is determined to be an abnormal acquisition record.
As shown in Fig. 6, the controlling equipment 230 can also generate an archive log of this data preparation for later review. The log may record, for example, the time of the data preparation, the source data, the target cluster, and so on.
Another aspect of the present disclosure provides a data processing system.
The data processing system may include a source cluster device, controlling equipment, and a target cluster device.
The controlling equipment is used to obtain configuration information, wherein the configuration information includes object information of a target object to be sampled and a sampling rule, and the target object is stored in the source cluster device. The controlling equipment is further used to generate a control instruction based on the sampling rule and the object information, and to send the control instruction to the source cluster device. The controlling equipment may be, for example, the controlling equipment 230 shown in Fig. 6.
The source cluster device includes a sandbox area and a non-sandbox area that are independent of each other, and the target object is stored in the non-sandbox area. The source cluster device may be, for example, the source cluster device 210 shown in Fig. 6.
In response to the control instruction, the source cluster device samples the target object stored in the non-sandbox area to obtain sampled data, stores the sampled data in the sandbox area, and copies the sampled data from the sandbox area to the target cluster device. The target cluster device may be, for example, the target cluster device 220 shown in Fig. 6.
According to an embodiment of the present disclosure, the controlling equipment can execute any of the data processing methods described above.
Another aspect of the present disclosure provides data processing equipment.
Fig. 7 schematically illustrates the data processing equipment 700 according to an embodiment of the present disclosure.
As shown in Fig. 7, the data processing equipment 700 includes an obtaining module 710, a generation module 720, and a sending module 730.
The obtaining module 710 can execute, for example, operation S310 described above with reference to Fig. 3, and is used to obtain configuration information, wherein the configuration information includes object information of a target object to be sampled and a sampling rule, the target object is stored in a source cluster device, the source cluster device includes a sandbox area and a non-sandbox area that are independent of each other, and the target object is stored in the non-sandbox area.
The generation module 720 can execute, for example, operation S320 described above with reference to Fig. 3, and is used to generate a control instruction based on the sampling rule and the object information.
The sending module 730 can execute, for example, operation S330 described above with reference to Fig. 3, and is used to send the control instruction to the source cluster device, so that the source cluster device samples the source data in the target object, stores the sampled data obtained by sampling in the sandbox area of the source cluster device, and copies the sampled data from the sandbox area to the target cluster device.
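The division of work among the three modules can be sketched as one small class with a method per module. This is only an illustration of the structure: the class name, the dictionary-based configuration and instruction formats, and the injected `send` callable (standing in for the link to the source cluster device) are all assumptions.

```python
from typing import Callable, Dict, List


class DataProcessingApparatus:
    """One method per module: obtain (710), generate (720), send (730)."""

    def __init__(self, send: Callable[[Dict], None]):
        self._send = send  # assumed transport to the source cluster device

    def obtain(self, raw_config: Dict) -> Dict:
        # Obtaining module: check that the required fields are present.
        for key in ("object_info", "sampling_rule"):
            if key not in raw_config:
                raise KeyError(f"missing configuration field: {key}")
        return raw_config

    def generate(self, config: Dict) -> Dict:
        # Generation module: build a control instruction (assumed format).
        return {
            "action": "sample",
            "table": config["object_info"],
            "rule": config["sampling_rule"],
        }

    def send(self, instruction: Dict) -> None:
        # Sending module: hand the instruction to the source cluster device.
        self._send(instruction)

    def run(self, raw_config: Dict) -> Dict:
        instruction = self.generate(self.obtain(raw_config))
        self.send(instruction)
        return instruction


outbox: List[Dict] = []  # stands in for the source cluster device's inbox
apparatus = DataProcessingApparatus(outbox.append)
apparatus.run({"object_info": "customer", "sampling_rule": {"fraction": 0.01}})
```

Injecting the transport keeps the three steps testable without a real cluster, which also mirrors the later remark that the modules may be merged or split freely.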
According to an embodiment of the present disclosure, any number of the modules, submodules, units, and subunits, or at least part of the functions of any number of them, may be implemented in one module. Any one or more of the modules, submodules, units, and subunits according to the embodiments of the present disclosure may be split into multiple modules for implementation. Any one or more of the modules, submodules, units, and subunits according to the embodiments of the present disclosure may be implemented at least partially as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on a substrate, a system in a package, or an application-specific integrated circuit (ASIC); or may be implemented in hardware or firmware by any other reasonable means of integrating or packaging a circuit; or may be implemented in any one of software, hardware, and firmware, or in an appropriate combination of any of them. Alternatively, one or more of the modules, submodules, units, and subunits according to the embodiments of the present disclosure may be implemented at least partially as a computer program module that performs the corresponding function when run.
For example, any number of the obtaining module 710, the generation module 720, and the sending module 730 may be combined and implemented in one module, or any one of them may be split into multiple modules. Alternatively, at least part of the functions of one or more of these modules may be combined with at least part of the functions of other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the obtaining module 710, the generation module 720, and the sending module 730 may be implemented at least partially as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on a substrate, a system in a package, or an application-specific integrated circuit (ASIC); or may be implemented in hardware or firmware by any other reasonable means of integrating or packaging a circuit; or may be implemented in any one of software, hardware, and firmware, or in an appropriate combination of any of them. Alternatively, at least one of the obtaining module 710, the generation module 720, and the sending module 730 may be implemented at least partially as a computer program module that performs the corresponding function when run.
Fig. 8 schematically illustrates a block diagram of electronic equipment according to an embodiment of the present disclosure. The electronic equipment shown in Fig. 8 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present disclosure.
As shown in Fig. 8, the electronic equipment 800 according to an embodiment of the present disclosure includes a processor 801, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803. The processor 801 may include, for example, a general-purpose microprocessor (such as a CPU), an instruction set processor and/or a related chipset, and/or a special-purpose microprocessor (for example, an application-specific integrated circuit (ASIC)), and so on. The processor 801 may also include on-board memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing the different actions of the method flow according to the embodiments of the present disclosure.
The RAM 803 stores various programs and data required for the operation of the system 800. The processor 801, the ROM 802, and the RAM 803 are connected to one another through a bus 804. The processor 801 performs the various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 802 and/or the RAM 803. Note that the programs may also be stored in one or more memories other than the ROM 802 and the RAM 803. The processor 801 may also perform the various operations of the method flow according to the embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the system 800 may also include an input/output (I/O) interface 805, which is also connected to the bus 804. The system 800 may also include one or more of the following components connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read from it can be installed into the storage section 808 as needed.
According to an embodiment of the present disclosure, the method flow according to the embodiments of the present disclosure may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable storage medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 809, and/or installed from the removable medium 811. When the computer program is executed by the processor 801, the above functions defined in the system of the embodiments of the present disclosure are performed. According to an embodiment of the present disclosure, the systems, equipment, apparatuses, modules, units, and the like described above may be implemented by computer program modules.
The present disclosure also provides a computer-readable storage medium. The computer-readable storage medium may be included in the equipment/apparatus/system described in the above embodiments, or may exist alone without being assembled into that equipment/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to the embodiments of the present disclosure.
According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but without limitation: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in connection with an instruction execution system, apparatus, or device. For example, according to an embodiment of the present disclosure, the computer-readable storage medium may include the ROM 802 and/or the RAM 803 described above, and/or one or more memories other than the ROM 802 and the RAM 803.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to the various embodiments of the present disclosure. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a part of code, and that module, program segment, or part of code contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in a block diagram or flowchart, and any combination of boxes in a block diagram or flowchart, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
Those skilled in the art will understand that the features recorded in the various embodiments and/or claims of the present disclosure may be combined and/or integrated in many ways, even if such combinations or integrations are not expressly recorded in the present disclosure. In particular, the features recorded in the various embodiments and/or claims of the present disclosure may be combined and/or integrated in many ways without departing from the spirit and teaching of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure. The embodiments of the present disclosure have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the various embodiments cannot be advantageously used in combination. The scope of the present disclosure is defined by the appended claims and their equivalents. Those skilled in the art can make various substitutions and modifications without departing from the scope of the present disclosure, and all such substitutions and modifications fall within the scope of the present disclosure.

Claims (10)

1. A data processing method, applied to controlling equipment, the method comprising:
obtaining configuration information, wherein the configuration information includes object information of a target object to be sampled and a sampling rule, the target object is stored in a source cluster device, the source cluster device includes a sandbox area and a non-sandbox area that are independent of each other, and the target object is stored in the non-sandbox area;
generating a control instruction based on the sampling rule and the object information; and
sending the control instruction to the source cluster device, so that the source cluster device samples the source data in the target object, stores the sampled data obtained by sampling in the sandbox area of the source cluster device, and copies the sampled data from the sandbox area to a target cluster device.
2. The method according to claim 1, wherein the configuration information further includes a desensitization configuration, and the generating a control instruction based on the sampling rule and the object information comprises:
determining metadata of the target object based on the object information;
creating a data table based on the metadata;
determining a desensitization function according to the desensitization configuration, the desensitization function being used to desensitize the sampled data; and
generating the control instruction according to the sampling rule, the data table, and the desensitization function.
3. The method according to claim 2, wherein the control instruction is generated to perform the following operations:
obtaining sampled data from the target object according to the sampling rule and the object information;
desensitizing the sampled data to obtain desensitized data;
saving the desensitized data in the sandbox area; and
copying the desensitized data from the sandbox area to the target cluster device.
4. The method according to claim 1, further comprising:
obtaining a concurrency configuration parameter when there are multiple control instructions that are respectively used to execute different tasks;
determining, based on the concurrency configuration parameter, the task quantity of tasks that the source cluster device executes simultaneously; and
controlling, based on the task quantity, the source cluster device to execute the multiple control instructions.
5. The method according to claim 4, wherein the controlling, based on the task quantity, the source cluster device to execute the multiple control instructions comprises:
obtaining the currently available resources in the source cluster device; and
determining, based on the currently available resources and the task quantity, the currently available resources allocated to each task, so that the control instruction of the task runs using the allocated currently available resources.
6. The method according to claim 5, further comprising:
generating an acquisition record of the controlling equipment obtaining the currently available resources in the source cluster device, so as to query whether an abnormal acquisition record exists among the acquisition records.
7. The method according to claim 1, further comprising:
verifying whether the data volume copied to the target cluster is consistent with the original data volume stored in the sandbox area; and
issuing alarm information when the data volume and the original data volume are inconsistent.
8. A data processing system, comprising:
a source cluster device, the source cluster device including a sandbox area and a non-sandbox area that are independent of each other, a target object being stored in the non-sandbox area;
a target cluster device; and
controlling equipment, the controlling equipment being used to execute the method according to any one of claims 1 to 7,
wherein the source cluster device is used to, in response to the control instruction, sample the target object stored in the non-sandbox area to obtain sampled data, store the sampled data in the sandbox area, and copy the sampled data from the sandbox area to the target cluster device.
9. Electronic equipment, comprising:
one or more processors; and
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to execute the method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing executable instructions which, when executed by a processor, cause the processor to execute the method according to any one of claims 1 to 7.
CN201910688165.5A 2019-07-26 2019-07-26 Data processing method, system, electronic device and storage medium Active CN110399209B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910688165.5A CN110399209B (en) 2019-07-26 2019-07-26 Data processing method, system, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN110399209A true CN110399209A (en) 2019-11-01
CN110399209B CN110399209B (en) 2022-02-25

Family

ID=68326347

Country Status (1)

Country Link
CN (1) CN110399209B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant