CN110399209A - Data processing method, system, electronic equipment and storage medium - Google Patents
Data processing method, system, electronic equipment and storage medium
- Publication number: CN110399209A
- Application number: CN201910688165.5A
- Authority: CN (China)
- Prior art keywords: data, cluster device, sandbox area, source, source cluster
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/52—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow
- G06F21/53—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow by executing in a restricted environment, e.g. sandbox or secure virtual machine
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5066—Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45579—I/O management, e.g. providing access to device drivers or storage
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Bioethics (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The present disclosure provides a data processing method applied to controlling equipment. The method includes: obtaining configuration information, where the configuration information includes object information of a target object to be sampled and a sampling rule, the target object is stored in a source cluster device, the source cluster device includes a sandbox area and a non-sandbox area that are independent of each other, and the target object is stored in the non-sandbox area; generating a control instruction based on the sampling rule and the object information; and sending the control instruction to the source cluster device, so that the source cluster device samples the source data in the target object, stores the sampled data in the sandbox area of the source cluster device, and copies the sampled data from the sandbox area to a target cluster device. The present disclosure also provides a data processing system, an electronic device, and a computer-readable storage medium.
Description
Technical field
The present disclosure relates to the field of computer technology, and more particularly to a data processing method, a system, an electronic device, and a storage medium.
Background
Data preparation on a big data platform usually means copying data from a source cluster to a target cluster. In the prior art, this copying process is complicated: the processing flow is long and time-consuming, and it requires a large amount of hardware resources, network resources, and the like.
Summary of the invention
In view of this, the present disclosure provides a data processing method, a system, an electronic device, and a storage medium.
One aspect of the present disclosure provides a data processing method applied to controlling equipment. The method includes: obtaining configuration information, where the configuration information includes object information of a target object to be sampled and a sampling rule, the target object is stored in a source cluster device, the source cluster device includes a sandbox area and a non-sandbox area that are independent of each other, and the target object is stored in the non-sandbox area; generating a control instruction based on the sampling rule and the object information; and sending the control instruction to the source cluster device, so that the source cluster device samples the source data in the target object, stores the sampled data in the sandbox area of the source cluster device, and copies the sampled data from the sandbox area to a target cluster device.
According to an embodiment of the present disclosure, the configuration information further includes a desensitization configuration. Generating the control instruction based on the sampling rule and the object information includes: determining the metadata of the target object based on the object information; creating a data table based on the metadata; determining a desensitization function according to the desensitization configuration, where the desensitization function is used to desensitize the sampled data; and generating the control instruction according to the sampling rule, the data table, and the desensitization function.
According to an embodiment of the present disclosure, the control instruction is used to perform the following operations: obtaining sampled data from the target object according to the sampling rule and the object information; desensitizing the sampled data to obtain desensitized data; saving the desensitized data to the sandbox area; and copying the desensitized data from the sandbox area to the target cluster device.
According to an embodiment of the present disclosure, the method further includes: where there are multiple control instructions, each for executing a different task, obtaining a concurrency configuration parameter; determining, based on the concurrency configuration parameter, the number of tasks that the source cluster device executes simultaneously; and controlling, based on the task quantity, the source cluster device to execute the multiple control instructions.
According to an embodiment of the present disclosure, controlling the source cluster device to execute the multiple control instructions based on the task quantity includes: obtaining the currently available resources in the source cluster device; and determining, based on the currently available resources and the task quantity, the currently available resources allocated to each task, so that the control instruction of each task is run with the resources allocated to it.
According to an embodiment of the present disclosure, the method further includes generating acquisition records of the controlling equipment obtaining the currently available resources in the source cluster device, so that the acquisition records can be queried for abnormal acquisition records.
According to an embodiment of the present disclosure, the method further includes: verifying whether the data volume copied to the target cluster is consistent with the original data volume stored in the sandbox area; and issuing warning information where the data volume and the original data volume are inconsistent.
Another aspect of the present disclosure provides a data processing system, including: a source cluster device, which includes a sandbox area and a non-sandbox area that are independent of each other, with a target object stored in the non-sandbox area; a target cluster device; and controlling equipment for executing the method described above. The source cluster device is configured to, in response to the control instruction, sample the target object stored in the non-sandbox area to obtain sampled data, store the sampled data in the sandbox area, and copy the sampled data from the sandbox area to the target cluster device.
Another aspect of the present disclosure provides an electronic device, including: one or more processors; and a storage device for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to execute the method described above.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions that, when executed, implement the method described above.
Another aspect of the present disclosure provides a computer program including computer-executable instructions that, when executed, implement the method described above.
According to an embodiment of the present disclosure, the problems that copying data from a source cluster to a target cluster involves a long processing flow, takes a long time, and requires a large amount of hardware resources can be at least partially solved. The technical effect is that fewer processing steps are needed to copy data from the source cluster to the target cluster and resource consumption is reduced.
Brief description of the drawings
The above and other objects, features, and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
Fig. 1 schematically shows a data preparation process;
Fig. 2 diagrammatically illustrates the exemplary system architecture of the data processing method according to the embodiment of the present disclosure;
Fig. 3 diagrammatically illustrates the flow chart of data processing method according to an embodiment of the present disclosure;
Fig. 4 diagrammatically illustrates the method flow diagram according to an embodiment of the present disclosure for generating control instruction;
Fig. 5 diagrammatically illustrates the data processing method according to another embodiment of the disclosure;
Fig. 6 diagrammatically illustrates the operation principle schematic diagram of the controlling equipment 230 according to the embodiment of the present disclosure;
Fig. 7 diagrammatically illustrates the schematic diagram of the data processing equipment according to the embodiment of the present disclosure;And
Fig. 8 diagrammatically illustrates the block diagram of the electronic equipment according to the embodiment of the present disclosure.
Detailed description of the embodiments
Hereinafter, embodiments of the present disclosure are described with reference to the accompanying drawings. It should be understood, however, that these descriptions are only exemplary and are not intended to limit the scope of the present disclosure. In the following detailed description, many specific details are set forth for ease of explanation, to provide a comprehensive understanding of the embodiments of the present disclosure. It is apparent, however, that one or more embodiments can also be practiced without these specific details. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the present disclosure.
The terms used herein are only for describing specific embodiments and are not intended to limit the present disclosure. The terms "include", "comprise", and the like used herein indicate the presence of the stated features, steps, operations, and/or components, but do not exclude the presence or addition of one or more other features, steps, operations, or components.
Unless otherwise defined, all terms used herein (including technical and scientific terms) have the meanings commonly understood by those skilled in the art. It should be noted that the terms used herein should be interpreted as having meanings consistent with the context of this specification, and should not be interpreted in an idealized or overly rigid manner.
Where an expression such as "at least one of A, B and C, etc." is used, it should generally be interpreted as those skilled in the art would commonly understand it (for example, "a system having at least one of A, B and C" includes, but is not limited to, a system having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B and C). Where an expression such as "at least one of A, B or C, etc." is used, it should likewise be interpreted as those skilled in the art would commonly understand it (for example, "a system having at least one of A, B or C" includes, but is not limited to, a system having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B and C).
Fig. 1 schematically shows a data preparation process. As shown in Fig. 1, the data preparation process may include the following steps:
1) sampling the source data stored in the source cluster and storing the sampled data in a temporary data storage unit;
2) desensitizing the sampled data;
3) exporting the desensitized data from the source cluster device 110 to a general storage unit A corresponding to the source cluster, such as a SAN (Storage Area Network) or DAS (Direct-attached Storage);
4) importing the data in general storage unit A onto tape;
5) transferring the tape to the target cluster;
6) restoring the tape data to a general storage unit B corresponding to the target cluster;
7) importing the data from general storage unit B into the target cluster.
This data preparation process therefore has a long processing flow and requires a large amount of hardware resources, network resources, and the like.
Embodiments of the present disclosure provide a data processing method applied to controlling equipment. The method includes obtaining configuration information, generating a control instruction, and sending the control instruction to a source cluster device. In response to the control instruction, the source cluster device samples the source data in a target object, stores the sampled data in the sandbox area of the source cluster device, and copies the sampled data from the sandbox area to a target cluster device. The configuration information includes object information of the target object to be sampled and a sampling rule, and the control instruction is generated from the sampling rule and the object information. The target object is stored in the source cluster device, which includes a sandbox area and a non-sandbox area that are independent of each other; the target object is stored in the non-sandbox area.
Fig. 2 schematically shows an exemplary system architecture 200 for the data processing method according to an embodiment of the present disclosure. It should be noted that Fig. 2 is only an example of a system architecture to which embodiments of the present disclosure can be applied, intended to help those skilled in the art understand the technical content of the disclosure; it does not mean that the embodiments cannot be used with other devices, systems, environments, or scenarios.
As shown in Fig. 2, the system architecture 200 according to this embodiment may include a source cluster 210, a target cluster 220, and controlling equipment 230.
According to an embodiment of the present disclosure, the source cluster 210 may include, for example, multiple node devices that jointly maintain one or more databases.
According to an embodiment of the present disclosure, a sandbox area 211 is created in the storage region of the source cluster 210, and the running environment of the sandbox area 211 is isolated from the non-sandbox area 212 of the source cluster 210.
According to an embodiment of the present disclosure, the controlling equipment 230 generates a control instruction and sends it to the source cluster 210, so that the source cluster 210 executes the control instruction. The operations that the source cluster 210 executes according to the control instruction include sampling the data in the non-sandbox area 212 of the source cluster 210, storing the sampled data in the sandbox area 211, and copying the data in the sandbox area 211 to the target cluster 220, so that the data in the target cluster 220 can be used for work such as software testing, model training, and analysis and mining.
Fig. 3 schematically shows a flowchart of the data processing method according to an embodiment of the present disclosure. The data processing method can be executed, for example, by the controlling equipment 230 shown in Fig. 2.
As shown in Fig. 3, the method may include operations S310 to S330.
In operation S310, configuration information is obtained. The configuration information includes object information of the target object to be sampled and a sampling rule. The target object is stored in the source cluster device. The storage region of the source cluster device may include a sandbox area and a non-sandbox area that are independent of each other, and the target object is stored in the non-sandbox area.
According to an embodiment of the present disclosure, the configuration information may, for example, be entered by a user on a terminal device, and may include the object information of the target object and the sampling rule.
According to an embodiment of the present disclosure, the target object may be, for example, a data table stored in the non-sandbox area 212. The object information of the target object may be, for example, the table name of the data table. The data table may take various forms, which the embodiments of the present disclosure do not limit.
The sampling rule may be, for example, to extract from the data table the data containing the string "Beijing", or to extract from the data table the data whose date is May 30.
For example, in the system architecture shown in Fig. 2, the storage region of the source cluster 210 may include the sandbox area 211 and the non-sandbox area 212. The target object may be stored in the non-sandbox area 212.
In operation S320, a control instruction is generated based on the sampling rule and the object information.
According to an embodiment of the present disclosure, the control instruction may be, for example, a Structured Query Language (SQL) script.
According to an embodiment of the present disclosure, for example, the target object can be determined from the object information, and the metadata of the target object can then be determined, so that a data table with the same metadata as the target object is created; a script of SQL statements is then generated from the created data table and the sampling rule.
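The generation of an SQL script from the table metadata and the sampling rule can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the source cluster, its `main` schema plays the non-sandbox area, an attached database named `sandbox` plays the sandbox area, and the helper `build_control_instruction` and the `orders` table are hypothetical names not taken from the disclosure.

```python
import sqlite3


def build_control_instruction(table: str, columns: list[str], sampling_rule: str) -> str:
    """Return an SQL script that samples `table` into the sandbox schema."""
    col_list = ", ".join(columns)
    return (
        f"CREATE TABLE sandbox.{table} AS "
        f"SELECT {col_list} FROM main.{table} WHERE {sampling_rule};"
    )


# Stand-in for the source cluster: `main` is the non-sandbox area,
# the attached in-memory database is the sandbox area.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS sandbox")
conn.execute("CREATE TABLE orders (id INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "Beijing"), (2, "Shanghai"), (3, "Beijing")],
)

# Metadata lookup: read the column names of the target object by table name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(orders)")]

# Generate the control instruction and let the "source cluster" execute it.
script = build_control_instruction("orders", columns, "city = 'Beijing'")
conn.executescript(script)
sampled = conn.execute("SELECT id, city FROM sandbox.orders ORDER BY id").fetchall()
```

The sketch interpolates the table name and rule directly into the SQL text for brevity; a real scheduler would need to validate or whitelist these inputs before building the statement.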
In operation S330, the control instruction is sent to the source cluster device, so that the source cluster device samples the source data in the target object, stores the sampled data in the sandbox area of the source cluster device, and copies the sampled data from the sandbox area to the target cluster device.
According to an embodiment of the present disclosure, the data processing method creates a sandbox area in the source cluster and uses the controlling equipment to control the source cluster to sample the data and store the sampled data in the sandbox area, so that the data in the sandbox area can be copied directly to the target cluster. On the one hand, the method does not need to access the data in the source cluster multiple times, which saves a large amount of CPU and I/O resources and shortens the time needed to extract data from the source cluster to the target cluster. On the other hand, because the sandbox area created in the source cluster is independent of the non-sandbox area, the security of the data in the source cluster is preserved, and the data in the sandbox area does not need to be transferred via disk but can be copied directly to the target cluster.
Fig. 4 schematically shows a flowchart of operation S320 according to an embodiment of the present disclosure in which the configuration information further includes a desensitization configuration.
As shown in Fig. 4, operation S320 may further include operations S321 to S324.
In operation S321, the metadata of the target object is determined based on the object information. For example, the source data table stored in the source cluster can be determined from the table name of the data table, and the metadata of that source data table can then be determined.
In operation S322, a data table is created based on the metadata. For example, the controlling equipment may create a data table whose metadata is at least partly identical to that of the source data table.
In operation S323, a desensitization function is determined according to the desensitization configuration. The desensitization function is used to desensitize the sampled data.
According to an embodiment of the present disclosure, a unique desensitization function can be determined, for example, from the desensitization configuration entered by the user. The desensitization configuration may be, for example, identification information of a desensitization function entered by the user. Those skilled in the art will understand that a "desensitization function" is a function for desensitizing data.
In operation S324, the control instruction is generated according to the sampling rule, the data table, and the desensitization function. For example, the desensitization function may be added to the generated SQL statements that build the data table, and the script of SQL statements is then generated according to the sampling rule.
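Operations S323 and S324 can be sketched as follows: the desensitization configuration maps a column to the identifier of a desensitization function, and the generated SQL wraps that column in the function so that sampling and desensitization happen in one statement. Everything named here is hypothetical: the masking rule in `mask_phone`, the helper `build_select`, and the `users` table are assumptions for illustration, and SQLite with a user-defined function stands in for the source cluster.

```python
import sqlite3


def mask_phone(value: str) -> str:
    """Hypothetical desensitization function: keep only the last 4 characters."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


def build_select(table: str, columns: list[str], masked: dict[str, str], rule: str) -> str:
    """Wrap each configured column in its desensitization function inside the SELECT list."""
    parts = [f"{masked[c]}({c}) AS {c}" if c in masked else c for c in columns]
    return f"SELECT {', '.join(parts)} FROM {table} WHERE {rule}"


conn = sqlite3.connect(":memory:")
# Register the desensitization function under the identifier used in the configuration.
conn.create_function("mask_phone", 1, mask_phone)
conn.execute("CREATE TABLE users (name TEXT, phone TEXT, city TEXT)")
conn.execute("INSERT INTO users VALUES ('Li', '13800001234', 'Beijing')")

query = build_select(
    "users", ["name", "phone", "city"], {"phone": "mask_phone"}, "city = 'Beijing'"
)
rows = conn.execute(query).fetchall()
```

Because the function call is part of the query itself, the unmasked value is never written to the sandbox area in this sketch, which is the single-step behaviour the embodiment describes.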
According to an embodiment of the present disclosure, the data processing method includes the desensitization function in the generated control instruction, so that extracting the data and desensitizing the sampled data can be completed in a single step, which further saves hardware resources such as CPU and I/O.
According to an embodiment of the present disclosure, the controlling equipment sends the generated control instruction to the source cluster, so that the source cluster executes the control instruction.
The operations that the source cluster executes according to the control instruction include: obtaining sampled data from the target object according to the sampling rule and the object information; desensitizing the sampled data to obtain desensitized data; saving the desensitized data to the sandbox area; and copying the desensitized data from the sandbox area to the target cluster.
Fig. 5 diagrammatically illustrates the data processing method according to another embodiment of the disclosure.
As shown in Fig. 5, in addition to operations S310 to S330 shown in Fig. 3, the data processing method may further include operations S510 to S530. For example, these operations can be executed after operation S320.
In operation S510, where there are multiple control instructions, each for executing a different task, a concurrency configuration parameter is obtained.
In operation S520, the number of tasks that the source cluster device executes simultaneously is determined based on the concurrency configuration parameter.
According to an embodiment of the present disclosure, for example, the configuration information may include object information of multiple target objects. For each target object, one control instruction is generated from the sampling rule, the desensitization configuration, and the metadata, so that each control instruction executes a different task.
According to an embodiment of the present disclosure, in operations S510 and S520, the user can set the concurrency configuration parameter to control the number of tasks that the source cluster executes simultaneously. For example, if the concurrency configuration parameter is 3, the source cluster simultaneously executes the control instructions of 3 tasks.
In operation S530, the source cluster device is controlled to execute the multiple control instructions based on the task quantity.
According to an embodiment of the present disclosure, by setting the concurrency configuration parameter, the method enables the user to manage how many control instructions the source cluster 210 executes concurrently.
According to an embodiment of the present disclosure, operation S530 may further include: obtaining the currently available resources in the source cluster device; and determining, based on the currently available resources and the task quantity, the currently available resources allocated to each task, so that the control instruction of each task is run with the resources allocated to it.
The currently available resources may include, for example, currently available CPU resources, memory resources, and the like.
According to an embodiment of the present disclosure, for example, the CPU resources allocated to each task can be determined from the currently available CPU resources and the number of tasks executed concurrently. Specifically, if the source cluster currently has 100 CPUs available and 100 tasks are executed concurrently, one CPU can be allocated to each task.
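The allocation in the example above can be sketched as a simple division. The text only gives the case of 100 CPUs and 100 tasks; splitting evenly and handing any remainder to the first tasks is an assumption of this sketch, and `allocate_cpus` is a hypothetical name.

```python
def allocate_cpus(available_cpus: int, task_count: int) -> list[int]:
    """Split the currently available CPUs across tasks as evenly as possible."""
    if task_count <= 0:
        raise ValueError("task_count must be positive")
    base, extra = divmod(available_cpus, task_count)
    # The first `extra` tasks receive one additional CPU (assumed tie-breaking rule).
    return [base + (1 if i < extra else 0) for i in range(task_count)]


# The case from the text: 100 available CPUs, 100 concurrent tasks, one CPU each.
per_task = allocate_cpus(100, 100)
uneven = allocate_cpus(10, 3)
```

Memory or other resources could be divided the same way; the disclosure leaves the allocation policy open.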
According to an embodiment of the present disclosure, the method allows the currently available resources to be allocated reasonably while multiple tasks are executed simultaneously.
According to an embodiment of the present disclosure, the data processing method may further include: generating acquisition records of the controlling equipment obtaining the currently available resources in the source cluster device, so that the acquisition records can be queried for abnormal acquisition records.
For example, the access source, access time, and access object of each access to the currently available resources of the source cluster can be recorded. The IP address of the access source should normally be the IP address of the controlling equipment; if a query of the acquisition records finds a record whose access source is a different IP address, that record is determined to be an abnormal acquisition record. The access object may be, for example, the table name of the source data table that was accessed.
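The check for abnormal acquisition records described above can be sketched as a filter on the access source. The record layout `AcquisitionRecord`, the function `find_abnormal`, and the IP addresses are hypothetical; the text only states that access source, access time, and access object are recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcquisitionRecord:
    source_ip: str       # access source
    access_time: str     # access time
    access_object: str   # access object, e.g. the table name of the source data table


def find_abnormal(records, controlling_ip):
    """Return the records whose access source is not the controlling equipment."""
    return [r for r in records if r.source_ip != controlling_ip]


records = [
    AcquisitionRecord("10.0.0.5", "2019-07-29T10:00:00", "orders"),
    AcquisitionRecord("10.0.0.99", "2019-07-29T10:05:00", "orders"),
]
abnormal = find_abnormal(records, controlling_ip="10.0.0.5")
```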
According to an embodiment of the present disclosure, the data processing method may further include: verifying whether the data volume copied to the target cluster is consistent with the original data volume stored in the sandbox area; and issuing warning information if the data volume and the original data volume are inconsistent.
For example, the controlling equipment may access the data volume in the target cluster and compare it with the original data volume stored in the sandbox area.
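A minimal sketch of this verification step follows. It assumes that "data volume" is measured as a row count and that the warning is emitted through Python's standard `logging` module; neither detail is stated in the source, and `verify_copied_volume` is a hypothetical name.

```python
import logging

logger = logging.getLogger("data_preparation")


def verify_copied_volume(target_row_count: int, sandbox_row_count: int) -> bool:
    """Compare the volume copied to the target cluster with the sandbox volume.

    Returns True when the two volumes match; otherwise logs a warning and
    returns False.
    """
    if target_row_count == sandbox_row_count:
        return True
    logger.warning(
        "data volume mismatch: target has %d rows, sandbox has %d rows",
        target_row_count,
        sandbox_row_count,
    )
    return False
```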
According to an embodiment of the present disclosure, the data processing method may further include generating an archive log of the data preparation process for later review. The log may record, for example, the time of the data preparation, the source data, the target cluster, and the like.
Fig. 6 schematically illustrates the working principle of the controlling equipment 230 according to an embodiment of the present disclosure.
As shown in Fig. 6, the controlling equipment 230 may obtain input information, for example by performing operation S310 described above with reference to Fig. 3. The input information may include, for example, a target object inventory, a sampling prescription, and a desensitization configuration. The target object inventory may be, for example, the table names of the source data tables from which data is to be extracted. It may contain multiple table names, so that data is extracted from multiple data tables.
According to an embodiment of the present disclosure, the controlling equipment 230 establishes a data table according to the metadata of the source data table, with the same metadata as the source data table, and determines a desensitization function according to the desensitization configuration. The scheduling executive program can then generate the control instruction according to the sampling prescription, the desensitization function, and the metadata, for example by performing operation S320 described above with reference to Fig. 3.
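The generation of a control instruction from the metadata, the sampling prescription, and the desensitization configuration can be sketched as follows. Everything named here is an assumption made for illustration: the `ControlInstruction` type, the `mask` and `hash` desensitization methods, the `sandbox_` table prefix, and the representation of the sampling prescription as a sampling fraction.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List


def mask_tail(value: str) -> str:
    """Keep the last four characters and mask everything before them."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


def sha256_hex(value: str) -> str:
    """Replace the value with its SHA-256 digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical registry mapping configured method names to desensitization functions.
DESENSITIZERS: Dict[str, Callable[[str], str]] = {"mask": mask_tail, "hash": sha256_hex}


@dataclass
class ControlInstruction:
    source_table: str
    sandbox_table: str   # table created with the same columns as the source table
    columns: List[str]
    sample_fraction: float
    desensitize: Dict[str, Callable[[str], str]]


def generate_control_instruction(
    table_name: str,
    metadata_columns: List[str],
    sample_fraction: float,
    desensitization_config: Dict[str, str],
) -> ControlInstruction:
    """Combine metadata, sampling prescription, and desensitization configuration."""
    unknown = set(desensitization_config) - set(metadata_columns)
    if unknown:
        raise ValueError(f"columns not in metadata: {sorted(unknown)}")
    functions = {col: DESENSITIZERS[method] for col, method in desensitization_config.items()}
    return ControlInstruction(
        source_table=table_name,
        sandbox_table=f"sandbox_{table_name}",
        columns=list(metadata_columns),
        sample_fraction=sample_fraction,
        desensitize=functions,
    )


instruction = generate_control_instruction(
    "customer", ["id", "phone"], 0.1, {"phone": "mask"}
)
```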
As shown in Fig. 6, the controlling equipment 230 may also obtain a concurrent configuration parameter and, based on it, determine the number of tasks that the source cluster device 210 executes per batch, for example by performing operations S510 to S530 described above with reference to Fig. 5.
According to an embodiment of the present disclosure, for example, the scheduling executive program may generate the control instruction according to the sampling prescription, the desensitization function, the metadata, and the concurrent configuration, so that the control instruction controls the number of tasks that the source cluster device 210 executes per batch.
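Executing tasks in batches under a concurrency limit can be sketched as below. The function name `batches` and the interpretation of the concurrent configuration parameter as a maximum batch size are assumptions made for illustration.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(tasks: Sequence[T], max_concurrent: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most max_concurrent tasks."""
    if max_concurrent <= 0:
        raise ValueError("max_concurrent must be positive")
    for start in range(0, len(tasks), max_concurrent):
        yield list(tasks[start:start + max_concurrent])


# Five table-level tasks with a concurrency limit of two run as three batches.
plan = list(batches(["t1", "t2", "t3", "t4", "t5"], max_concurrent=2))
```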
As shown in Fig. 6, the controlling equipment 230 may also obtain the currently available resources in the source cluster device 210 and, based on the currently available resources and the number of tasks, determine the currently available resources allocated to each task, so that the control instruction of each task is run using the allocated resources. According to an embodiment of the present disclosure, the controlling equipment 230 may, for example, send the obtained currently available resources to the scheduling executive program, which allocates currently available resources to each task according to the currently available resources and the number of tasks, so that the control instruction controls the source cluster 210 to run the multiple tasks.
According to an embodiment of the present disclosure, after the complete scheduling executive program has been generated in the controlling equipment 230, it may be sent to the source cluster device 210. According to the scheduling executive program, the source cluster device 210 samples and desensitizes the source data of the target object stored in the non-sandbox area 212 and stores the desensitized data in the sandbox area 211, so that the desensitized data stored in the sandbox area 211 can be copied to the target cluster device 220.
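The flow on the source cluster device (sample from the non-sandbox area, desensitize, store in the sandbox area, copy to the target cluster) can be sketched with in-memory dictionaries standing in for the three storage areas. This is an illustration under stated assumptions, not the disclosed implementation: the `Store` type, the random sampling by fraction, and the function names are all hypothetical.

```python
import random
from typing import Callable, Dict, List

Row = Dict[str, str]
Store = Dict[str, List[Row]]  # table name -> rows; stands in for a storage area


def identity(value: str) -> str:
    """Leave a column value unchanged when no desensitization is configured."""
    return value


def run_control_instruction(
    non_sandbox: Store,
    sandbox: Store,
    target: Store,
    table: str,
    sample_fraction: float,
    desensitize: Dict[str, Callable[[str], str]],
    seed: int = 0,
) -> int:
    """Sample, desensitize, store in the sandbox, then copy to the target."""
    rows = non_sandbox[table]
    # Sample at least one row from a non-empty table, never more than it holds.
    sample_size = min(len(rows), max(1, round(len(rows) * sample_fraction))) if rows else 0
    sampled = random.Random(seed).sample(rows, sample_size)
    # Build new row dicts so the source data in the non-sandbox area is untouched.
    desensitized = [
        {column: desensitize.get(column, identity)(value) for column, value in row.items()}
        for row in sampled
    ]
    sandbox[table] = desensitized
    # Only data that is already in the sandbox area is copied to the target.
    target[table] = [dict(row) for row in sandbox[table]]
    return sample_size
```

Copying to the target only from the sandbox store mirrors the point of the design: the target cluster never reads the non-sandbox area directly.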
As shown in Fig. 6, the controlling equipment 230 may also verify whether the target data copied to the target cluster 220 is consistent with the original data in the source data, and issue warning information if the target data and the original data are inconsistent. According to an embodiment of the present disclosure, the controlling equipment 230 may, for example, access the target cluster 220 to obtain the data volume copied to it, and compare that data volume with the original data volume in the source data.
As shown in Fig. 6, the controlling equipment 230 may also generate acquisition records of the controlling equipment 230 obtaining the currently available resources in the source cluster device 210, so that the acquisition records can be queried for abnormal acquisition records. For example, the access source, access time, and access object of each access to the currently available resources of the source cluster may be recorded. The IP address of the access source should normally be the IP address of the controlling equipment; if querying the acquisition records reveals an access source with any other IP address, that record is determined to be an abnormal acquisition record.
As shown in Fig. 6, the controlling equipment 230 may also generate an archive log of the data preparation for later review. The log may record, for example, the time of the data preparation, the source data, the target cluster, and the like.
Another aspect of the present disclosure provides a data processing system.
The data processing system may include a source cluster device, controlling equipment, and a target cluster device.
The controlling equipment is used to obtain configuration information, wherein the configuration information includes the object information of a target object to be sampled and a sampling prescription, and the target object is stored in the source cluster device. The controlling equipment is further used to generate a control instruction based on the sampling prescription and the object information, and to send the control instruction to the source cluster device. The controlling equipment may be, for example, the controlling equipment 230 shown in Fig. 6.
The source cluster device includes a sandbox area and a non-sandbox area that are independent of each other, and the target object is stored in the non-sandbox area. The source cluster device may be, for example, the source cluster device 210 shown in Fig. 6.
The source cluster device is used to, in response to the control instruction, sample the target object stored in the non-sandbox area to obtain sampled data, store the sampled data in the sandbox area, and copy the sampled data from the sandbox area to the target cluster device. The target cluster device may be, for example, the target cluster device 220 shown in Fig. 6.
According to an embodiment of the present disclosure, the controlling equipment may perform any of the data processing methods described above.
Another aspect of the present disclosure provides a data processing equipment.
Fig. 7 schematically illustrates the data processing equipment 700 according to an embodiment of the present disclosure.
As shown in Fig. 7, the data processing equipment 700 includes an obtaining module 710, a generation module 720, and a sending module 730.
The obtaining module 710 may, for example, perform operation S310 described above with reference to Fig. 3, and is used to obtain configuration information, wherein the configuration information includes the object information of a target object to be sampled and a sampling prescription, the target object is stored in a source cluster device, the source cluster device includes a sandbox area and a non-sandbox area that are independent of each other, and the target object is stored in the non-sandbox area.
The generation module 720 may, for example, perform operation S320 described above with reference to Fig. 3, and is used to generate a control instruction based on the sampling prescription and the object information.
The sending module 730 may, for example, perform operation S330 described above with reference to Fig. 3, and is used to send the control instruction to the source cluster device, so that the source cluster device samples the source data in the target object, stores the sampled data in the sandbox area of the source cluster device, and copies the sampled data from the sandbox area to the target cluster device.
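The three modules can be sketched as three methods of one class, with the configuration source and the transport to the source cluster device injected as callables. The class name, the `Configuration` fields, and the dictionary form of the control instruction are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

Instruction = Dict[str, Union[str, float]]


@dataclass(frozen=True)
class Configuration:
    object_info: str              # e.g. the table name of the target object
    sampling_prescription: float  # modelled here as a sampling fraction


class DataProcessingEquipment:
    """Hypothetical counterpart of modules 710, 720, and 730."""

    def __init__(
        self,
        load_configuration: Callable[[], Configuration],
        send_to_source_cluster: Callable[[Instruction], None],
    ) -> None:
        self._load_configuration = load_configuration
        self._send_to_source_cluster = send_to_source_cluster

    def obtain(self) -> Configuration:
        """Obtaining module: obtain the configuration information."""
        return self._load_configuration()

    def generate(self, config: Configuration) -> Instruction:
        """Generation module: build a control instruction from the configuration."""
        return {"table": config.object_info, "sample_fraction": config.sampling_prescription}

    def send(self, instruction: Instruction) -> None:
        """Sending module: deliver the control instruction to the source cluster device."""
        self._send_to_source_cluster(instruction)

    def run(self) -> None:
        self.send(self.generate(self.obtain()))
```

Injecting the two callables keeps the sketch independent of any particular configuration store or network transport.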
Any number of the modules, submodules, units, and subunits according to embodiments of the present disclosure, or at least part of the functions of any number of them, may be implemented in one module. Any one or more of the modules, submodules, units, and subunits according to embodiments of the present disclosure may be split into multiple modules for implementation. Any one or more of the modules, submodules, units, and subunits according to embodiments of the present disclosure may be implemented at least partially as a hardware circuit, such as a field-programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on substrate, a system in package, or an application-specific integrated circuit (ASIC), or may be implemented in hardware or firmware by any other reasonable means of integrating or packaging a circuit, or in any one of the three implementation manners of software, hardware, and firmware, or in an appropriate combination of any of them. Alternatively, one or more of the modules, submodules, units, and subunits according to embodiments of the present disclosure may be implemented at least partially as a computer program module that, when run, performs the corresponding function.
For example, any number of the obtaining module 710, the generation module 720, and the sending module 730 may be combined and implemented in one module, or any one of them may be split into multiple modules. Alternatively, at least part of the functions of one or more of these modules may be combined with at least part of the functions of other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the obtaining module 710, the generation module 720, and the sending module 730 may be implemented at least partially as a hardware circuit, such as a field-programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on substrate, a system in package, or an application-specific integrated circuit (ASIC), or may be implemented in hardware or firmware by any other reasonable means of integrating or packaging a circuit, or in any one of the three implementation manners of software, hardware, and firmware, or in an appropriate combination of any of them. Alternatively, at least one of the obtaining module 710, the generation module 720, and the sending module 730 may be implemented at least partially as a computer program module that, when run, performs the corresponding function.
Fig. 8 schematically illustrates a block diagram of electronic equipment according to an embodiment of the present disclosure. The electronic equipment shown in Fig. 8 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in Fig. 8, the electronic equipment 800 according to an embodiment of the present disclosure includes a processor 801, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803. The processor 801 may include, for example, a general-purpose microprocessor (such as a CPU), an instruction set processor and/or a related chipset, and/or a special-purpose microprocessor (for example, an application-specific integrated circuit (ASIC)), and so on. The processor 801 may also include on-board memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing the different actions of the method flow according to embodiments of the present disclosure.
The RAM 803 stores the various programs and data required for the operation of the system 800. The processor 801, the ROM 802, and the RAM 803 are connected to one another by a bus 804. The processor 801 performs the various operations of the method flow according to embodiments of the present disclosure by executing the programs in the ROM 802 and/or the RAM 803. Note that the programs may also be stored in one or more memories other than the ROM 802 and the RAM 803. The processor 801 may also perform the various operations of the method flow according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the system 800 may also include an input/output (I/O) interface 805, which is also connected to the bus 804. The system 800 may also include one or more of the following components connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including, for example, a cathode-ray tube (CRT) or liquid crystal display (LCD) and a loudspeaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A driver 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the driver 810 as needed, so that a computer program read from it can be installed into the storage section 808 as needed.
According to an embodiment of the present disclosure, the method flow according to embodiments of the present disclosure may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable storage medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 809, and/or installed from the removable medium 811. When the computer program is executed by the processor 801, the above-described functions defined in the system of the embodiments of the present disclosure are performed. According to an embodiment of the present disclosure, the systems, equipment, devices, modules, units, and the like described above may be implemented by computer program modules.
The present disclosure also provides a computer-readable storage medium. The computer-readable storage medium may be included in the equipment/device/system described in the above embodiments, or it may exist separately without being assembled into that equipment/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, but is not limited to: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in connection with an instruction execution system, apparatus, or device. For example, according to an embodiment of the present disclosure, the computer-readable storage medium may include the ROM 802 and/or the RAM 803 described above and/or one or more memories other than the ROM 802 and the RAM 803.
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and any combination of blocks in a block diagram or flowchart, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
Those skilled in the art will understand that the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or incorporated in various ways, even if such combinations or incorporations are not explicitly recited in the present disclosure. In particular, the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or incorporated in various ways without departing from the spirit and teaching of the present disclosure. All such combinations and/or incorporations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the individual embodiments cannot be used in advantageous combination. The scope of the present disclosure is defined by the appended claims and their equivalents. Those skilled in the art may make various substitutions and modifications without departing from the scope of the present disclosure, and all such substitutions and modifications shall fall within the scope of the present disclosure.
Claims (10)
1. A data processing method, applied to controlling equipment, the method comprising:
obtaining configuration information, wherein the configuration information includes the object information of a target object to be sampled and a sampling prescription, the target object is stored in a source cluster device, the source cluster device includes a sandbox area and a non-sandbox area that are independent of each other, and the target object is stored in the non-sandbox area;
generating a control instruction based on the sampling prescription and the object information; and
sending the control instruction to the source cluster device, so that the source cluster device samples the source data in the target object, stores the sampled data in the sandbox area of the source cluster device, and copies the sampled data from the sandbox area to a target cluster device.
2. The method according to claim 1, wherein the configuration information further includes a desensitization configuration, and generating the control instruction based on the sampling prescription and the object information comprises:
determining the metadata of the target object based on the object information;
establishing a data table based on the metadata;
determining a desensitization function according to the desensitization configuration, the desensitization function being used to desensitize the sampled data; and
generating the control instruction according to the sampling prescription, the data table, and the desensitization function.
3. The method according to claim 2, wherein the control instruction is generated to perform the following operations:
obtaining sampled data from the target object according to the sampling prescription and the object information;
desensitizing the sampled data to obtain desensitized data;
saving the desensitized data in the sandbox area; and
copying the desensitized data from the sandbox area to the target cluster device.
4. The method according to claim 1, further comprising:
obtaining a concurrent configuration parameter in the case where there are multiple control instructions, each for executing a different task;
determining, based on the concurrent configuration parameter, the number of tasks that the source cluster device executes simultaneously; and
controlling the source cluster device to execute the multiple control instructions based on the number of tasks.
5. The method according to claim 4, wherein controlling the source cluster device to execute the multiple control instructions based on the number of tasks comprises:
obtaining the currently available resources in the source cluster device; and
determining, based on the currently available resources and the number of tasks, the currently available resources allocated to each task, so that the control instruction of the task is run using the allocated resources.
6. The method according to claim 5, further comprising:
generating acquisition records of the controlling equipment obtaining the currently available resources in the source cluster device, so that the acquisition records can be queried for abnormal acquisition records.
7. The method according to claim 1, further comprising:
verifying whether the data volume copied to the target cluster is consistent with the original data volume stored in the sandbox area; and
issuing warning information in the case where the data volume and the original data volume are inconsistent.
8. A data processing system, comprising:
a source cluster device, the source cluster device including a sandbox area and a non-sandbox area that are independent of each other, a target object being stored in the non-sandbox area;
a target cluster device; and
controlling equipment for executing the method according to any one of claims 1 to 7,
wherein the source cluster device is used to, in response to the control instruction, sample the target object stored in the non-sandbox area to obtain sampled data, store the sampled data in the sandbox area, and copy the sampled data from the sandbox area to the target cluster device.
9. Electronic equipment, comprising:
one or more processors; and
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to execute the method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing executable instructions which, when executed by a processor, cause the processor to execute the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910688165.5A CN110399209B (en) | 2019-07-26 | 2019-07-26 | Data processing method, system, electronic device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110399209A (en) | 2019-11-01 |
CN110399209B (en) | 2022-02-25 |
Family
ID=68326347
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201910688165.5A | Active | CN110399209B (en) | 2019-07-26 | 2019-07-26 | Data processing method, system, electronic device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110399209B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112800473A (en) * | 2021-03-17 | 2021-05-14 | 好人生(上海)健康科技有限公司 | Data processing method based on big data safety house |
CN112988604A (en) * | 2021-04-30 | 2021-06-18 | 中国工商银行股份有限公司 | Object testing method, testing system, electronic device and readable storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105868389A (en) * | 2016-04-15 | 2016-08-17 | 北京思特奇信息技术股份有限公司 | Method and system for implementing data sandbox based on mongoDB |
CN106650424A (en) * | 2016-11-28 | 2017-05-10 | 北京奇虎科技有限公司 | Method and device for detecting target sample file |
CN106776143A (en) * | 2016-12-27 | 2017-05-31 | 北京奇虎科技有限公司 | The method and terminal device of a kind of mirror back-up for end application |
CN107247741A (en) * | 2017-05-14 | 2017-10-13 | 四川盛世天成信息技术有限公司 | A kind of concentrating type textual magnanimity sensitive data processing method and system |
CN109491989A (en) * | 2018-11-12 | 2019-03-19 | 北京懿医云科技有限公司 | Data processing method and device, electronic equipment, storage medium |
CN109635024A (en) * | 2018-11-23 | 2019-04-16 | 华迪计算机集团有限公司 | A kind of data migration method and system |
Non-Patent Citations (2)
Title |
---|
KONRAD JAMROZIK: "Mining Sandboxes", 《2016 IEEE/ACM 38TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING》 * |
谢欣: "基于SequoiaDB的金融业历史数据存储与查询解决方案", 《金融电子化》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112800473A (en) * | 2021-03-17 | 2021-05-14 | 好人生(上海)健康科技有限公司 | Data processing method based on big data safety house |
CN112988604A (en) * | 2021-04-30 | 2021-06-18 | 中国工商银行股份有限公司 | Object testing method, testing system, electronic device and readable storage medium |
CN112988604B (en) * | 2021-04-30 | 2024-04-02 | 中国工商银行股份有限公司 | Object testing method, testing system, electronic device and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110399209B (en) | 2022-02-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106027330B (en) | A kind of front end system message test method and simulation baffle system | |
US8990157B2 (en) | Replication support for structured data | |
US10291704B2 (en) | Networked solutions integration using a cloud business object broker | |
CN109254992A (en) | Project generation method and system, computer system and computer readable storage medium storing program for executing | |
JP2019533854A (en) | Graph generation for distributed event processing systems. | |
US20150213066A1 (en) | System and method for creating data models from complex raw log files | |
JP5923691B2 (en) | Logistics cloud system and program | |
CN107506190A (en) | XML file amending method and device based on Spring frameworks | |
Silva et al. | Integrating big data into the computing curricula | |
CN110399209A (en) | Data processing method, system, electronic equipment and storage medium | |
CN111310232A (en) | Data desensitization method and device, electronic equipment and storage medium | |
EP2904520B1 (en) | Reference data segmentation from single to multiple tables | |
US11782888B2 (en) | Dynamic multi-platform model generation and deployment system | |
EP2835774A1 (en) | A method and device for executing an enterprise process | |
US10110610B2 (en) | Dynamic permission assessment and reporting engines | |
CN113962597A (en) | Data analysis method and device, electronic equipment and storage medium | |
CN108268512A (en) | A kind of tag queries method and device | |
WO2019111188A1 (en) | Job management in data processing system | |
US11700241B2 (en) | Isolated data processing modules | |
US20120317073A1 (en) | Replication Support for Procedures with Arguments of Unsupported Types | |
CN115328997B (en) | Data synchronization method, system, device and storage medium | |
US20180165337A1 (en) | System for Extracting Data from a Database in a User Selected Format and Related Methods and Computer Program Products | |
US20140280365A1 (en) | Method and system for data system management using cloud-based data migration | |
CN114978944A (en) | Pressure testing method, device and computer program product | |
CN111625465A (en) | Program generation method, device and system and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||