CN108038225B - Data processing method and system
- Publication number: CN108038225B (application CN201711418696.XA)
- Authority: CN (China)
- Prior art keywords: data, data set, keyword, acquisition system, target
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/21—Design, administration or maintenance of databases
- G06F16/217—Database tuning
- G06F16/23—Updating
- G06F16/235—Update request formulation
Abstract
The present invention provides a data processing method and system. The method comprises: receiving a first data set sent by an external system; generating, in the data processing system, a second data set associated with a target data set to be updated; emptying the data in the target data set; and updating the target data set using the data in the first data set and the second data set. In this way, when a first data set is received from an external system and a data update is required, the update can be carried out without scanning all of the data, which helps keep the data processing system stable, saves a large amount of time, and improves the efficiency of data updates.
Description
Technical field
The present invention relates to the field of information technology, and in particular to a data processing method and a data processing system.
Background
In recent years, big data processing and analysis have become a global concern. As the level of informatization and automation in the economy and society keeps rising, many fields such as public administration, public services, scientific research, and business applications face big data problems and need a variety of targeted, cost-effective solutions. A big data platform provides processing capability for industry big data and integrates functions such as data access, data processing, data storage, query and retrieval, analysis and mining, and application interfaces.
In the data processing field, increasing attention is paid to the accumulation of data. As data volumes grow, higher demands are placed on the ability to process data and on the basic architecture of the system: faster processing, larger storage capacity, and easier maintenance are required.
In some business scenarios, the change history of key fields must be recorded, and to meet user needs the data in the database must be updated periodically. In some big data platforms the file system is based on distributed file storage, that is, files are stored on different nodes. The traditional way of performing a historical update on such a platform is to scan the existing data row by row, starting in the storage area from the first row of the first file, until the required data is found and modified. Faced with growing data volumes and increasingly complex business, especially in the big data era, scanning all of the data in this way is inefficient and time-consuming. The larger the data volume, the longer the query and response times, so timeliness requirements cannot be met as data keeps growing. Because of the heavy computation and long running time, existing data processing systems are unstable and prone to lagging or even freezing.
Summary of the invention
Embodiments of the present invention provide a data processing method and a data processing system, to solve the problems that existing data processing systems process data inefficiently and, because processing is time-consuming, are unstable.
To solve the above technical problem, an embodiment of the present invention provides a data processing method, the method comprising:
Storing internal data of the data processing system and data obtained from outside;
Managing business logic;
Providing data services to systems external to the data processing system;
Processing data.
Further, the method also comprises:
Receiving operation instructions input by a user, and managing and configuring the data processing system accordingly.
Further, the step of storing the internal data of the data processing system and the data obtained from outside comprises:
Storing the data obtained from outside, which includes directly extracted data and file-form data.
Further, the step of managing business logic comprises:
Storing the business logic of the data processing system, the business logic including at least one of: scheduling rules, data lineage, model metadata, and script programs.
Further, the step of providing data services to systems external to the data processing system comprises:
Pushing message queues and data to the systems external to the data processing system;
Storing file-form data;
Connecting to a downstream system or business system of the data processing system, and providing data to the downstream system or business system through an interface unit.
Further, the method also comprises:
Receiving input parameters;
Generating an automation tool script based on preset rules and the parameters.
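The script-generation step above can be illustrated with a minimal Python sketch. The text does not say what the preset rules or the generated scripts look like, so everything here is an assumption made for illustration: the rule is modeled as a text template, and the command line, rule name, and parameter names (table, source_path, key_field) are hypothetical.

```python
from string import Template

# Hypothetical preset rule: a command-line template for a load job.
# The command name and its flags are invented for illustration only.
PRESET_RULES = {
    "load": Template(
        "load_job --source $source_path --table $table --key $key_field\n"
    ),
}


def generate_script(rule_name: str, params: dict) -> str:
    """Fill the named preset template with the received parameters."""
    template = PRESET_RULES[rule_name]
    # substitute() raises KeyError when a required parameter is missing,
    # so an incomplete parameter set never yields a half-filled script.
    return template.substitute(params)


script = generate_script(
    "load",
    {"table": "customer", "source_path": "/data/in/customer.csv", "key_field": "customer_id"},
)
```

Using a strict substitution keeps the sketch from silently emitting scripts with unfilled placeholders; a real system might validate parameters differently.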
Further, the step of processing data comprises:
Receiving a first data set sent by an external system;
Generating, in the data processing system, a second data set associated with a target data set to be updated;
Emptying the data in the target data set;
Updating the target data set using the data in the first data set and the second data set.
Further, before the step of generating, in the data processing system, the second data set associated with the target data set to be updated, the method comprises:
Determining a first keyword or key field from the first data set;
Querying the target data set using the first keyword or key field;
If the first keyword or key field, or data matching the first keyword or key field, is found in the target data set, performing the step of generating, in the data processing system, the second data set associated with the target data set to be updated.
Further, after the step of querying the target data set using the first keyword or key field, the method comprises:
If neither the first keyword or key field nor data matching the first keyword or key field is found in the target data set, updating the data of the first data set into the target data set.
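The key check described in the two preceding paragraphs can be sketched as follows. This is only an illustration under stated assumptions: data sets are modeled as Python dictionaries mapping a key field value to a row, and the function names are hypothetical rather than taken from the patent.

```python
def needs_rebuild(target: dict, incoming: dict) -> bool:
    """Return True if any key of the first (incoming) data set exists in the target."""
    return any(key in target for key in incoming)


def apply_if_disjoint(target: dict, incoming: dict) -> bool:
    """Write incoming rows straight into the target when no key overlaps.

    Returns True if the direct write was performed, and False if the caller
    should instead generate the second data set and rebuild the target.
    """
    if needs_rebuild(target, incoming):
        return False
    target.update(incoming)
    return True
```

When no key overlaps, nothing in the target has to change, so the backup and the emptying of the target can be skipped entirely.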
Further, the step of updating the target data set using the data in the first data set and the second data set comprises:
Determining a second keyword or key field from the second data set;
Querying the first data set using the second keyword or key field;
If neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set, updating the data in the second data set that matches the second keyword or key field into the target data set;
Updating the data in the first data set that matches the first keyword or key field into the target data set.
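The update steps above amount to rebuilding the target from a backup plus the incoming data, instead of locating and modifying rows in place. A minimal sketch, assuming each data set is a dictionary keyed by the key field (the function and variable names are illustrative, not the patent's):

```python
def rebuild_target(target: dict, incoming: dict) -> dict:
    """Rebuild `target` in place from a backup and the incoming data set.

    Returns the backup (the "second data set") so the caller can keep it.
    """
    backup = dict(target)  # second data set: copy of the current target
    target.clear()         # empty the target data set

    # Rows whose key does not appear in the incoming data are unchanged,
    # so they are written back from the backup.
    for key, row in backup.items():
        if key not in incoming:
            target[key] = row

    # Every incoming row (changed or new) is written into the target.
    for key, row in incoming.items():
        target[key] = row

    return backup
```

Each key is checked with a dictionary lookup, so no row-by-row scan of the stored target is needed to find the records to modify.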
Further, when the target data set is a zipper data set, the step of updating the target data set using the data in the first data set and the second data set comprises:
Determining a second keyword or key field from the second data set;
Querying the first data set using the second keyword or key field;
If neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set, updating the data in the second data set that matches the second keyword or key field into the target data set;
Determining first zipper data in the second data set that matches the first keyword or key field;
Modifying the chain-close time of a first sub-zipper record, which is in the open-chain state within the first zipper data, to the time at which the second data set was generated, and generating a second sub-zipper record of the first zipper data based on the data in the first data set that matches the first keyword or key field, wherein the chain-open time of the second sub-zipper record is the time at which the second data set was generated and its chain-close time is empty or a maximum value;
If no data matching the first keyword or key field is found in the second data set, generating second zipper data based on the data in the first data set that matches the first keyword or key field, wherein the chain-open time of the second zipper data is the time at which the second data set was generated and its chain-close time is empty or a maximum value;
Updating the modified first zipper data and the second zipper data into the target data set.
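The zipper update can be sketched in Python as below. The record layout is an assumption: each zipper record carries a key, a value, a chain-open time, and a chain-close time, with `None` standing for the "empty" close time of an open chain. The class and function names are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ZipperRecord:
    key: str
    value: Any
    open_time: datetime
    close_time: Optional[datetime] = None  # None: the chain is still open


def update_zipper(
    backup: List[ZipperRecord],
    incoming: Dict[str, Any],
    generated_at: datetime,
) -> List[ZipperRecord]:
    """Build the new target from the backup (second data set) and incoming data."""
    target: List[ZipperRecord] = []
    for rec in backup:
        if rec.key in incoming and rec.close_time is None:
            # Open record of a key that changed: close it at backup time.
            target.append(replace(rec, close_time=generated_at))
        else:
            # Unmatched keys and already-closed history are copied unchanged.
            target.append(rec)
    # Every incoming key gets a fresh open record, whether the key already
    # had history in the backup or is entirely new.
    for key, value in incoming.items():
        target.append(ZipperRecord(key, value, open_time=generated_at))
    return target
```

Closing the old record and opening the new one with the same timestamp keeps each key's history contiguous, with exactly one open record per key after the update.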
Further, after the step of emptying the data in the target data set, the method comprises:
If an update error is detected while updating the target data set, restoring the data in the target data set using the data in the generated second data set; or
If a data update error is detected while updating the target data set, obtaining a backup data set backed up in advance, and restoring the data in the target data set using the data in the backup data set.
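The first recovery option, restoring from the generated second data set, can be sketched as follows. Data sets are again assumed to be dictionaries keyed by the key field; the optional `validate` callback is a hypothetical stand-in for whatever check detects an update error in a real system.

```python
def safe_rebuild(target: dict, incoming: dict, validate=None) -> bool:
    """Rebuild the target; on any error, restore it from the backup.

    Returns True on success and False if the target was rolled back.
    """
    backup = dict(target)  # second data set, generated before emptying
    target.clear()
    try:
        for key, row in backup.items():
            if key not in incoming:
                target[key] = row
        for key, row in incoming.items():
            if validate is not None:
                validate(key, row)  # may raise to signal an update error
            target[key] = row
    except Exception:
        # Update error detected: discard partial results and restore.
        target.clear()
        target.update(backup)
        return False
    return True
```

Because the backup is taken before the target is emptied, a failed update leaves the target exactly as it was before the update began.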
Further, the step of generating, in the data processing system, the second data set associated with the target data set to be updated comprises:
Obtaining all data stored in the already-updated target data set within a preset time period before the first data set was received, or obtaining the data in the target data set to be updated this time after the first data set is received, and backing up that data to generate the second data set; or
Obtaining the second data set generated the last time a first data set was received, and inserting the data currently in the target data set into that second data set to generate the current second data set.
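The two ways of generating the second data set can be contrasted in a short sketch. Data sets are assumed to be dictionaries keyed by the key field. For the second option, the text does not say how a key present in both the previous backup and the current target is handled; letting the current target win is an assumption of this sketch.

```python
import copy


def snapshot_backup(target: dict) -> dict:
    """Option 1: back up the current target data as the second data set."""
    return copy.deepcopy(target)


def cumulative_backup(previous_backup: dict, target: dict) -> dict:
    """Option 2: insert the current target data into the previous backup.

    Assumption: for keys present in both, the current target row replaces
    the older backed-up row.
    """
    merged = copy.deepcopy(previous_backup)
    merged.update(copy.deepcopy(target))
    return merged
```

Deep copies keep later changes to the target from leaking into the backup, which matters if rows are mutable objects.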
An embodiment of the present invention also provides a data processing method, the method comprising:
Receiving a first data set sent by an external system;
Generating, in the data processing system, a second data set associated with a target data set to be updated;
Emptying the data in the target data set;
Updating the target data set using the data in the first data set and the second data set.
Further, before the step of generating, in the data processing system, the second data set associated with the target data set to be updated, the method comprises:
Determining a first keyword or key field from the first data set;
Querying the target data set using the first keyword or key field;
If the first keyword or key field, or data matching the first keyword or key field, is found in the target data set, performing the step of generating, in the data processing system, the second data set associated with the target data set to be updated.
Further, after the step of querying the target data set using the first keyword or key field, the method comprises:
If neither the first keyword or key field nor data matching the first keyword or key field is found in the target data set, updating the data of the first data set into the target data set.
Further, the step of updating the target data set using the data in the first data set and the second data set comprises:
Determining a second keyword or key field from the second data set;
Querying the first data set using the second keyword or key field;
If neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set, updating the data in the second data set that matches the second keyword or key field into the target data set;
Updating the data in the first data set that matches the first keyword or key field into the target data set.
Further, when the target data set is a zipper data set, the step of updating the target data set using the data in the first data set and the second data set comprises:
Determining a second keyword or key field from the second data set;
Querying the first data set using the second keyword or key field;
If neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set, updating the data in the second data set that matches the second keyword or key field into the target data set;
Determining first zipper data in the second data set that matches the first keyword or key field;
Modifying the chain-close time of a first sub-zipper record, which is in the open-chain state within the first zipper data, to the time at which the second data set was generated, and generating a second sub-zipper record of the first zipper data based on the data in the first data set that matches the first keyword or key field, wherein the chain-open time of the second sub-zipper record is the time at which the second data set was generated and its chain-close time is empty or a maximum value;
If no data matching the first keyword or key field is found in the second data set, generating second zipper data based on the data in the first data set that matches the first keyword or key field, wherein the chain-open time of the second zipper data is the time at which the second data set was generated and its chain-close time is empty or a maximum value;
Updating the modified first zipper data and the second zipper data into the target data set.
Further, after the step of emptying the data in the target data set, the method comprises:
If an update error is detected while updating the target data set, restoring the data in the target data set using the data in the generated second data set; or
If a data update error is detected while updating the target data set, obtaining a backup data set backed up in advance, and restoring the data in the target data set using the data in the backup data set.
Further, the step of generating, in the data processing system, the second data set associated with the target data set to be updated comprises:
Obtaining all data stored in the already-updated target data set within a preset time period before the first data set was received, or obtaining the data in the target data set to be updated this time after the first data set is received, and backing up that data to generate the second data set; or
Obtaining the second data set generated the last time a first data set was received, and inserting the data currently in the target data set into that second data set to generate the current second data set.
An embodiment of the present invention also provides a data processing system, the data processing system comprising:
A data storage module, for storing internal data of the data processing system and data obtained from outside;
A business logic module, for managing business logic;
A data service module, for providing data services to systems external to the data processing system;
A data processing engine module, for processing data.
Further, the data processing system comprises:
An information interaction module, for receiving operation instructions input by a user and managing and configuring the data processing system accordingly.
Further, the data storage module is a distributed file storage system, and the data obtained from outside that the data storage module stores includes directly extracted data and file-form data.
Further, the business logic module comprises:
A storage unit, for storing the business logic of the data processing system, the business logic including at least one of: scheduling rules, data lineage, model metadata, and script programs.
Further, the data service module comprises:
A push unit, for pushing message queues and data to the systems external to the data processing system;
An archive unit, for storing file-form data;
A data transmission interface unit, for connecting to a downstream system or business system of the data processing system and providing data to the downstream system or business system through the interface unit.
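As a rough illustration of how the three units of the data service module might fit together, the following Python sketch models the push unit as an in-memory queue, the archive unit as a name-to-bytes store, and the interface unit as a registry of reader callbacks. All class, attribute, and method names are hypothetical; a real system would use a message broker, a file store, and network interfaces.

```python
import queue
from typing import Any, Callable, Dict, List


class DataServiceModule:
    """Toy model of the push, archive, and data transmission interface units."""

    def __init__(self) -> None:
        self.outbox: "queue.Queue[Any]" = queue.Queue()          # push unit
        self.archive: Dict[str, bytes] = {}                      # archive unit
        self.readers: Dict[str, Callable[[], List[Any]]] = {}    # interface unit

    def push(self, message: Any) -> None:
        """Queue a message or data item for delivery to an external system."""
        self.outbox.put(message)

    def store_file(self, name: str, content: bytes) -> None:
        """Keep file-form data under its file name."""
        self.archive[name] = content

    def register_reader(self, name: str, reader: Callable[[], List[Any]]) -> None:
        """Expose a data source to downstream or business systems."""
        self.readers[name] = reader

    def fetch(self, name: str) -> List[Any]:
        """Serve data to a downstream system through the registered interface."""
        return self.readers[name]()
```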
Further, the data processing system also comprises an automation tool module, the automation tool module comprising:
A parameter receiving unit, for receiving input parameters;
A script generation unit, for generating an automation tool script based on preset rules and the parameters.
Further, the data processing engine module comprises:
A receiving unit, for receiving a first data set sent by an external system;
A generation unit, for generating, in the data processing system, a second data set associated with a target data set to be updated;
A clearing unit, for emptying the data in the target data set;
A first updating unit, for updating the target data set using the data in the first data set and the second data set.
Further, the data processing engine module also comprises:
A first determination unit, for determining a first keyword or key field from the first data set;
A query unit, for querying the target data set using the first keyword or key field;
An execution unit, for performing, if the first keyword or key field, or data matching the first keyword or key field, is found in the target data set, the step of generating, in the data processing system, the second data set associated with the target data set to be updated.
Further, the data processing engine module also comprises:
A second updating unit, for updating the data of the first data set into the target data set if neither the first keyword or key field nor data matching the first keyword or key field is found in the target data set.
Further, the first updating unit comprises:
A first determination subunit, for determining a second keyword or key field from the second data set;
A first query subunit, for querying the first data set using the second keyword or key field;
A first update subunit, for updating the data in the second data set that matches the second keyword or key field into the target data set if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set;
A second update subunit, for updating the data in the first data set that matches the first keyword or key field into the target data set.
Further, when the target data set is a zipper data set, the first updating unit comprises:
A second determination subunit, for determining a second keyword or key field from the second data set;
A second query subunit, for querying the first data set using the second keyword or key field;
A third update subunit, for updating the data in the second data set that matches the second keyword or key field into the target data set if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set;
A third determination subunit, for determining first zipper data in the second data set that matches the first keyword or key field;
A modification subunit, for modifying the chain-close time of a first sub-zipper record, which is in the open-chain state within the first zipper data, to the time at which the second data set was generated, and for generating a second sub-zipper record of the first zipper data based on the data in the first data set that matches the first keyword or key field, wherein the chain-open time of the second sub-zipper record is the time at which the second data set was generated and its chain-close time is empty or a maximum value;
A generation subunit, for generating, if no data matching the first keyword or key field is found in the second data set, second zipper data based on the data in the first data set that matches the first keyword or key field, wherein the chain-open time of the second zipper data is the time at which the second data set was generated and its chain-close time is empty or a maximum value;
A fourth update subunit, for updating the modified first zipper data and the second zipper data into the target data set.
Further, the data processing engine module comprises:
A first recovery unit, for restoring the data in the target data set using the data in the generated second data set if an update error is detected while updating the target data set; or
A second recovery unit, for obtaining a backup data set backed up in advance and restoring the data in the target data set using the data in the backup data set if a data update error is detected while updating the target data set.
Further, the generation unit is also used for obtaining all data stored in the already-updated target data set within a preset time period before the first data set was received, or obtaining the data in the target data set to be updated this time after the first data set is received, and backing up that data to generate the second data set;
Alternatively, the generation unit is also used for obtaining the second data set generated the last time a first data set was received, and inserting the data currently in the target data set into that second data set to generate the current second data set.
An embodiment of the present invention also provides a data processing system, the data processing system comprising:
A receiving module, for receiving a first data set sent by an external system;
A generation module, for generating, in the data processing system, a second data set associated with a target data set to be updated;
A clearing module, for emptying the data in the target data set;
A first update module, for updating the target data set using the data in the first data set and the second data set.
Further, the data processing system also comprises:
A first determination module, for determining a first keyword or key field from the first data set;
A query module, for querying the target data set using the first keyword or key field;
An execution module, for performing, if the first keyword or key field, or data matching the first keyword or key field, is found in the target data set, the step of generating, in the data processing system, the second data set associated with the target data set to be updated.
Further, the data processing system also comprises:
A second update module, for updating the data of the first data set into the target data set if neither the first keyword or key field nor data matching the first keyword or key field is found in the target data set.
Further, the first update module comprises:
A first determination submodule, for determining a second keyword or key field from the second data set;
A first query submodule, for querying the first data set using the second keyword or key field;
A first update submodule, for updating the data in the second data set that matches the second keyword or key field into the target data set if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set;
A second update submodule, for updating the data in the first data set that matches the first keyword or key field into the target data set.
Further, when the target data set is a zipper data set, the first update module comprises:
A second determination submodule, for determining a second keyword or key field from the second data set;
A second query submodule, for querying the first data set using the second keyword or key field;
A third update submodule, for updating the data in the second data set that matches the second keyword or key field into the target data set if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set;
A third determination submodule, for determining first zipper data in the second data set that matches the first keyword or key field;
A modification submodule, for modifying the chain-close time of a first sub-zipper record, which is in the open-chain state within the first zipper data, to the time at which the second data set was generated, and for generating a second sub-zipper record of the first zipper data based on the data in the first data set that matches the first keyword or key field, wherein the chain-open time of the second sub-zipper record is the time at which the second data set was generated and its chain-close time is empty or a maximum value;
A generation submodule, for generating, if no data matching the first keyword or key field is found in the second data set, second zipper data based on the data in the first data set that matches the first keyword or key field, wherein the chain-open time of the second zipper data is the time at which the second data set was generated and its chain-close time is empty or a maximum value;
A fourth update submodule, for updating the modified first zipper data and the second zipper data into the target data set.
Further, the data processing system also comprises:
A first recovery module, for restoring the data in the target data set using the data in the generated second data set if an update error is detected while updating the target data set; or
A second recovery module, for obtaining a backup data set backed up in advance and restoring the data in the target data set using the data in the backup data set if a data update error is detected while updating the target data set.
Further, the generation module is specifically also used for obtaining all data stored in the already-updated target data set within a preset time period before the first data set was received, or obtaining the data in the target data set to be updated this time after the first data set is received, and backing up that data to generate the second data set;
Alternatively, the generation module is specifically also used for obtaining the second data set generated the last time a first data set was received, and inserting the data currently in the target data set into that second data set to generate the current second data set.
With the data processing method and data processing system provided by the embodiments of the present invention, a first data set sent by an external system is received; a second data set associated with a target data set to be updated is generated in the data processing system; the data in the target data set is emptied; and the target data set is updated using the data in the first data set and the second data set. In this way, when a first data set is received from an external system and a data update is required, the update can be carried out without scanning all of the data, which helps keep the data processing system stable, saves a large amount of time, and improves the efficiency of data updates.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a data processing method provided by one embodiment of the present invention;
Fig. 2 is a flowchart of a data processing method provided by another embodiment of the present invention;
Fig. 3 is a business information table represented by the data in the target data set before the update;
Fig. 4 is an information table represented by the data in the first data set;
Fig. 5 is an information table represented by the data in the second data set;
Fig. 6 and Fig. 7 are schematic diagrams of the process of updating the information represented by the data in the target data set;
Fig. 8 is a business information table represented by the data in the updated target data set;
Fig. 9 is a business information table represented by the zipper data in the target data set before the update;
Fig. 10 is an information table represented by the zipper data in the second data set;
Fig. 11 is an information table represented by the data in the first data set;
Fig. 12 and Fig. 13 are schematic diagrams of the process of updating the information represented by the data in the target data set;
Fig. 14 is a business information table represented by the data in the updated target data set;
Fig. 15 is a structural diagram of a data processing system provided by one embodiment of the present invention;
Fig. 16 is a first structural diagram of the data processing engine module of the data processing system shown in Fig. 15;
Fig. 17 is a second structural diagram of the data processing engine module of the data processing system shown in Fig. 15;
Fig. 18 is a third structural diagram of the data processing engine module of the data processing system shown in Fig. 15;
Fig. 19 is a fourth structural diagram of the data processing engine module of the data processing system shown in Fig. 15;
Fig. 20 is a first structural diagram of the first updating unit shown in Fig. 16;
Fig. 21 is a second structural diagram of the first updating unit shown in Fig. 16.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The data processing method provided by the embodiments of the present invention is applied to a data processing system, which may be a data engineering platform (Data Engineering Platform, DEP). The method comprises: storing the internal data of the data processing system and the data obtained from outside; managing business logic; providing data services to systems external to the data processing system; and processing data.
The internal data of the data processing system, and the data obtained from outside, are stored mainly by the data storage module of the data processing system. The data storage module may be a Hadoop Distributed File System (HDFS). HDFS serves as the storage layer: it stores the internal data of the DEP as well as the data that the DEP obtains from external systems. The DEP may obtain data from an external system by extracting it directly, for example data in the relational database system DB2, data in the Oracle ExaData database cloud server, or data in Excel format. The data may also arrive in file form, that is, be sent to the DEP as files, such as text files. It may further include unstructured data, such as logs and audio/video multimedia files.
Business logic is managed mainly by the business logic module of the data processing system. The business logic module may also store the business logic of the data processing system, which includes at least one of the following: scheduling rules, data lineage, model metadata, and script tools (such as automation tools).
Data services are provided to systems external to the data processing system mainly by the data service module of the data processing system. The data service module may push information and data to queues of an external system, for example pushing messages to a message queue or pushing data to a database; it may store data in file form; and it may connect to downstream systems or business systems of the data processing system, such as reporting systems and analysis services, and provide data to them.
The method further comprises: receiving operation instructions entered by a user in order to manage and configure the data processing system. Users may include business personnel (staff on the business side), operations and maintenance personnel (staff on the technical side), and so on. The user interaction module may be provided with a corresponding user interface (UI).
The automation tool module of the data processing system may also be used to: receive input parameters; and generate an automation tool script based on preset rules and the parameters. The input parameters may be entered into the data processing system by a user, or may be written according to a received instruction. The parameters include at least one of the following: the name, fields, and data types of a data set.
For example, to learn how the balance of a customer's bank account has changed, one needs to know the customer's balance (the balance information table represented in the target data set) and the income and expenditure details (the balance detail table represented in the target data set). The automation tool of the automation tool module can automatically generate the code for the data update method of the embodiments (the join query, comparison, and insertion algorithm), enabling truly dynamic queries. Depending on business needs, the history of data changes is recorded on the Hadoop platform; specifically, the Hadoop platform may record the history of data changes in HDFS.
It should be noted that the automation tool module can build an automation tool (that is, a program) based on rules, such as the data update procedure of the data processing method. The user only needs to identify which data sets in the DEP must record their change history using the zipper method; the automation tool then automatically generates the program implementing the algorithm, for example by generating SQL statements in Hive.
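By way of illustration, an automation tool of this kind could be sketched in Python as a function that turns the input parameters (data set names and fields) into a SQL statement. The table names, the column names, and the exact shape of the generated statement are assumptions made for this sketch; the patent does not specify them.

```python
from typing import Sequence


def build_refresh_sql(target: str, incoming: str, snapshot: str,
                      key: str, fields: Sequence[str]) -> str:
    """Return one statement that rebuilds `target` from two other tables.

    All table and column names are hypothetical placeholders.
    """
    incoming_cols = ", ".join(fields)
    snapshot_cols = ", ".join(f"s.{name}" for name in fields)
    return (
        f"INSERT OVERWRITE TABLE {target}\n"
        # Snapshot rows whose key does not appear in the incoming batch.
        f"SELECT {snapshot_cols}\n"
        f"FROM {snapshot} s\n"
        f"LEFT JOIN {incoming} i ON s.{key} = i.{key}\n"
        f"WHERE i.{key} IS NULL\n"
        f"UNION ALL\n"
        # Every row of the incoming batch.
        f"SELECT {incoming_cols}\n"
        f"FROM {incoming}"
    )


sql = build_refresh_sql(
    target="balance_info",
    incoming="balance_delta",
    snapshot="balance_snapshot",
    key="customer_id",
    fields=["customer_id", "balance"],
)
print(sql)
```

Generating the statement from parameters means that adding another data set to the scheme only requires supplying its name, key, and fields, not writing new code by hand.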
Data is processed mainly by the data processing engine module of the data processing system. The data processing engine module may be a Structured Query Language (SQL) engine module, which may consist of engines such as Hive and/or Spark.
Referring to Fig. 1, Fig. 1 is a flowchart of a data processing method provided by one embodiment of the present invention. The method can be applied to a data processing system. As shown in Fig. 1, the method comprises the following steps:
Step 101: receive a first data set transmitted by an external system.
In some business scenarios, the change history of key fields must be recorded to meet customer needs. In finance, for example, the change history of a customer's bank account balance must be recorded so that the customer can query the balance. The data in the database therefore needs to be updated periodically.
Accordingly, in this step, the data processing system periodically receives the first data set transmitted from the external system.
The data processing system may receive the first data set at a fixed period, for example once a day or once every 12 hours. For more timely data, it may also receive the first data set in real time or near real time, for example once an hour, once every half hour, or even once every few minutes; no limitation is imposed here. The first data set may contain a batch of modified data.
The first data set may be a set of a single kind of data. For example, it may contain a single type of business data, such as only financial transaction data like deposit data or payment transaction data; or it may contain the data of a single customer or target, such as only the business data of Zhang San or only that of Li Si. The first data set may also be a set of combined data. For example, it may contain business data of different types, such as financial transaction data including both deposit data and payment transaction data, together with communication data; it may also contain the data of multiple customers or targets at once, such as the business data of both Zhang San and Li Si.
The first data set may be received directly from the external system; alternatively, a storage module of the data processing system may first store the first data set from the external system, and the first data set is then obtained from that storage module.
Step 102: generate, in the data processing system, a second data set associated with the target data set to be updated.
In this step, after receiving the first data set, the data processing system generates, within the data processing system, a second data set associated with the target data set to be updated.
The second data set being associated with the target data set may mean that the type of the data contained in the second data set, or the type that this data represents, is the same as the type of the data in the target data set, or the type that that data represents.
For example, if the data in the target data set is the bank deposit data or business transaction data of a customer Zhang San, then the data in the generated second data set is also Zhang San's bank deposit data or business transaction data. If the data in the target data set includes the bank deposit data or business transaction data of both Zhang San and Li Si, then the data in the generated second data set likewise includes the bank deposit data or business transaction data of both Zhang San and Li Si.
The data contained in the second data set may be the most complete data; that is, the second data set is the data set with the greatest time span. For example, the second data set may contain all data recorded in the target data set from the time the target data set was created up to the current time. In other words, the time span covered by the second data set runs from the moment the business data was first generated until the current time, which is the longest possible span.
Correspondingly, the data contained in the target data set may also be the most complete data, that is, the target data set is the data set with the greatest time span. Alternatively, the target data set may contain only part of the most complete data, for example only the data between the previous update and the current update, that is, the data of a single update period or of several update periods.
Preferably, both the target data set and the second data set contain the data covering the maximum time span.
The second data set is a point-in-time snapshot of the target data set.
The second data set may be generated by backup, that is, by backing up the historical data, namely the target data set as it stands before the current update.
Further, the second data set may be generated or updated at a set frequency, for example once a day. Preferably, the frequency at which the second data set is generated or updated is the same as the frequency at which batches of modified data are transmitted to the data processing system, that is, the frequency at which the data processing system periodically receives the first data set.
In this way, after receiving the first data set, the data processing system generates the second data set associated with the target data set, and does not need to scan all the data in the data processing system to locate the data in the target data set and the data nodes holding it. This saves time and effort, effectively reduces the workload of the data processing system, and improves efficiency.
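As a minimal sketch of the backup approach, the second data set can be modeled in Python as a deep copy of the target data set taken just before the update. The in-memory list of rows, the field names, and the `created_at` timestamp are assumptions of this sketch; the patent describes tables in a big data platform, not Python objects.

```python
import copy
from datetime import datetime, timezone


def make_snapshot(target_rows):
    """Back up the target data set as it stands before this update."""
    return {
        # Time at which the second data set was generated.
        "created_at": datetime.now(timezone.utc),
        # Deep copy, so later changes to the target do not touch the backup.
        "rows": copy.deepcopy(target_rows),
    }


target = [{"customer_id": "C001", "balance": 100}]  # hypothetical row
snapshot = make_snapshot(target)
target.clear()  # emptying the target leaves the snapshot intact
print(snapshot["rows"])
```

Because the snapshot is an independent copy, the target data set can be emptied in the next step without losing the historical data.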
Step 103: empty the data in the target data set.
In this step, after generating the second data set in the data processing system, the data processing system empties the data in the target data set so that the target data set can subsequently be updated.
Step 104: update the target data set using the data in the first data set and the second data set.
In this step, after emptying the data in the target data set, the data processing system extracts the data that needs updating from the first data set and the related data from the second data set, and inserts, appends, or writes it into the target data set, thereby updating the target data set.
Preferably, in this embodiment, the data in the first data set and the second data set is written into the target data set by query-and-insert, thereby updating the data in the target data set.
For example, suppose the data in the target data set is the bank deposit data or business transaction data of a customer Zhang San. Updating the target data set using the first and second data sets then means storing Zhang San's new bank deposit data or business transaction data from the first data set, together with Zhang San's past bank deposit data or business transaction data from the second data set, into the target data set.
As another example, suppose the data in the target data set is the bank deposit data or business transaction data of customer Zhang San and of customer Li Si, and this update concerns only Zhang San's data, that is, the first data set contains new bank deposit data or business transaction data for Zhang San. Then Zhang San's new data from the first data set, Zhang San's past data from the second data set, and Li Si's past bank deposit data or business transaction data are stored into the target data set, thereby updating it.
The target data set may also be updated periodically, for example once a day or once every 12 hours. Preferably, the update period of the target data set matches the frequency at which the data processing system receives the first data set.
In this way, after receiving the first data set, the data processing system generates the second data set associated with the target data set to be updated, empties the target data set, and then inserts the data from the first data set and the second data set into the target data set by query-and-insert. The target data set is thus updated without scanning all the data in the data processing system, which saves the time of a full scan, effectively reduces the workload of the data processing system, and improves efficiency.
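The order of steps 102 to 104 can be sketched compactly in Python. This sketch assumes that each data set is a dictionary mapping a record key to its value, and that the "related data" taken from the second data set is every record whose key is absent from the first data set; both are simplifying assumptions, and the keys and amounts are hypothetical.

```python
def refresh_target(target: dict, first_set: dict) -> dict:
    """Update `target` in place and return the second data set."""
    # Step 102: generate the second data set as a copy of the target.
    second_set = dict(target)
    # Step 103: empty the target data set.
    target.clear()
    # Step 104: query-and-insert from the second set, then the first set.
    for key, value in second_set.items():
        if key not in first_set:  # record not touched by this batch
            target[key] = value
    target.update(first_set)      # new and changed records
    return second_set


target = {"C001": 100, "C002": 250}   # hypothetical balances
first_set = {"C001": 180, "C003": 40}
second_set = refresh_target(target, first_set)
print(target)
```

The target is rebuilt purely from inserts; no existing record is located and modified in place, which is what allows the method to avoid scanning for individual records.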
In the embodiments of the present invention, the data processing system may be a back-end platform for developing and running data processing jobs, which performs distributed computation over massive data on a cluster of many computers. Preferably, the data processing system is a big data platform.
The data processing system can be applied to big data scenarios in financial, medical, and educational systems, such as bank data systems, hospital data systems, and school data systems.
In the data processing method provided by this embodiment of the present invention, a first data set transmitted by an external system is received; a second data set associated with the target data set to be updated is generated in the data processing system; the data in the target data set is emptied; and the target data set is updated using the data in the first data set and the second data set. In this way, when a first data set is received from the external system and a data update is required, the data related to the data in the target data set is extracted within the data processing system to generate the second data set associated with the target data set to be updated, and the data in the first data set and the second data set is then inserted into the target data set by query-and-insert. The data in the target data set is thus updated without scanning all the data nodes, which saves the considerable time of a full scan, effectively reduces the workload of the data processing system, and improves the efficiency of data updates.
Referring to Fig. 2, Fig. 2 is a flowchart of a data processing method provided by another embodiment of the present invention. The method is applied to a data processing system. As shown in Fig. 2, the method comprises the following steps:
Step 201: receive a first data set transmitted by an external system.
Step 202: determine a first keyword or key field from the first data set.
In this step, after receiving the first data set transmitted by the external system, the data processing system determines the corresponding first keyword or key field from the first data set, according to the data in the first data set that needs to be stored or updated.
The term "first keyword or key field" is only a generic reference. For example, when the first data set contains business data of multiple types, or data of multiple customers, the business data of each type or the data of each customer may be updated separately; each time the business data of a given type or the data of a given customer is updated, that business data or customer data has its own corresponding first keyword or key field.
The first keyword or key field can be set according to actual needs. If a single keyword is enough to identify the data to be updated, only a keyword needs to be determined; conversely, if a key field composed of multiple keywords is needed to identify the data to be updated, a key field must be determined.
Step 203: query the target data set using the first keyword or key field.
In this step, after determining the first keyword or key field, the data processing system uses it to query the target data set. Through this query, the data processing system learns whether the target data set already contains historical information or records of data matching the data that the first keyword or key field denotes.
Step 204: if the first keyword or key field, or data matching the first keyword or key field, is found in the target data set, perform the step of generating, in the data processing system, a second data set associated with the target data set to be updated.
In this step, the data processing system queries the target data set using the first keyword or key field. If the first keyword or key field is found in the target data set, or data matching it is found there, the data processing system concludes that the target data set holds past information for the data matching the first keyword or key field. The data processing system then performs the step of generating, in the data processing system, a second data set associated with the target data set to be updated, so that the subsequent steps can complete the update of the data in the target data set.
Querying for data that matches the first keyword or key field addresses the case where, because of how the data is ordered, the keyword or key field itself sits far back in the target data set, so that searching for it directly could take a long time. In that case, once data matching the first keyword or key field is found, the first keyword or key field can be regarded as found. This saves time and reduces the amount of data scanned.
Data matching the first keyword or key field may be data associated with the data that the first keyword or key field denotes. For example, if the first keyword or key field is "Zhang San" or Zhang San's ID, the matching data may be data representing Zhang San or Zhang San's ID; it may be data representing a deposit or transaction record of Zhang San, or of Zhang San's ID, on a particular date; or it may be data representing information such as the telephone number or identity card number associated with Zhang San or Zhang San's ID.
Step 205: generate, in the data processing system, a second data set associated with the target data set to be updated.
Step 206: empty the data in the target data set.
Step 207: update the target data set using the data in the first data set and the second data set.
For steps 201 and 205 to 207, refer to the description of steps 101 to 104 in the above embodiment; details are not repeated here.
Optionally, after step 203, the method comprises:
If neither the first keyword or key field nor data matching the first keyword or key field is found in the target data set, updating the data of the first data set into the target data set.
In this step, the data processing system queries the target data set using the first keyword or key field. If neither the first keyword or key field nor data matching it is found in the target data set, the data processing system concludes that the data denoted by the first keyword or key field in the first data set is entirely new to the target data set. The data processing system can then insert, append, or write the data in the first data set directly into the target data set, thereby updating the data in the target data set.
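The choice between the two paths (rebuilding the target via the second data set, or inserting directly) can be sketched in Python as a key lookup. The rows, the `customer_id` key field, and the rule that any single shared key triggers the rebuild path are assumptions of this sketch.

```python
def choose_update_path(target_rows, first_rows, key="customer_id"):
    """Return "rebuild" if the target already holds a first-set key."""
    target_keys = {row[key] for row in target_rows}
    first_keys = {row[key] for row in first_rows}
    return "rebuild" if target_keys & first_keys else "direct_insert"


target = [{"customer_id": "C001", "balance": 100}]   # hypothetical rows
new_customer = [{"customer_id": "C003", "balance": 40}]
existing_customer = [{"customer_id": "C001", "balance": 180}]

path_new = choose_update_path(target, new_customer)
path_existing = choose_update_path(target, existing_customer)

if path_new == "direct_insert":
    # Entirely new data: write it straight into the target data set.
    target.extend(new_customer)

print(path_new, path_existing)
```

Only a batch that touches an existing key pays the cost of generating the second data set and emptying the target; a batch of purely new records is appended as-is.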
Optionally, step 207 comprises:
Determining a second keyword or key field from the second data set, and querying the first data set using the second keyword or key field; if neither the second keyword or key field nor data matching it is found in the first data set, updating the data in the second data set that matches the second keyword or key field into the target data set; and updating the data in the first data set that matches the first keyword or key field into the target data set.
In this step, after emptying the data in the target data set, the data processing system determines a second keyword or key field according to the data contained in the second data set, and then queries the first data set using the second keyword or key field to check whether the first data set contains data matching it. If the query finds neither the second keyword or key field nor data matching it in the first data set, the data processing system concludes that the data in the second data set matching the second keyword or key field does not need updating, and writes that data into the emptied target data set. Then, according to the first keyword or key field, the data processing system extracts from the first data set the data matching the first keyword or key field and writes the extracted data into the emptied target data set, thereby completing the update of the target data set.
Updating the data in the second data set that matches the second keyword or key field into the target data set, and updating the data in the first data set that matches the first keyword or key field into the target data set, may each be done by querying the data and then inserting, appending, or writing it into the target data set.
For example, refer to Fig. 3 to Fig. 8. Fig. 3 shows the business information table represented by the data in the target data set before the update; Fig. 4 shows the information table represented by the data in the first data set; Fig. 5 shows the information table represented by the data in the second data set; Fig. 6 and Fig. 7 show the process of updating the information represented by the data in the target data set; and Fig. 8 shows the business information table represented by the data in the updated target data set. The data in the target data set before the update represents the deposit balance information of Zhang San and Li Si; the data in the first data set represents the deposit business handled over the preceding period by Zhang San, Wang Wu, Zhao Liu, and others; and the data in the second data set represents all the business information of the same persons as in the target data set, namely the deposit balance information of Zhang San and Li Si.
When the target data set is updated, the data in it is first emptied: clearing the information in the table of Fig. 3 yields the blank table of Fig. 6, which represents the emptied target data set.
The data processing system then determines a second keyword or key field from the second data set (such as Zhang San's ID or Li Si's ID) and queries the first data set with it, that is, checks whether the first data set contains data representing business information related to Zhang San or Li Si. If no matching data is found in the first data set, for example no data matching Li Si's keyword or key field, then Li Si's business data does not need updating in this round. The data in the second data set matching that second keyword or key field, namely Li Si's business data, is therefore written into the emptied target data set. This completes the first update pass and yields the table of Li Si's business information shown in Fig. 7. Conversely, if data matching the second keyword or key field is found in the first data set, for example data matching the keyword or key field representing Zhang San, then Zhang San's business data needs updating in this round, and Zhang San's business data in the second data set is not added to the emptied target data set.
Next, the data processing system takes the first keyword or key field of the first data set, such as that of the business data of Zhang San, Wang Wu, and Zhao Liu (for example their IDs), and adds the data in the first data set matching it directly to the emptied target data set. This completes the update of the target data set and yields the business information table shown in Fig. 8, represented by the data in the updated target data set.
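The example can be traced in Python with the four customers it names. The balance amounts below are invented for illustration, since the contents of Fig. 3 to Fig. 8 are not reproduced in the text, and the use of the customer name as the key field is likewise an assumption.

```python
def rebuild_target(second_rows, first_rows, key="name"):
    """Return the target after the first pass and after the full update."""
    first_keys = {row[key] for row in first_rows}
    target = []  # emptied target data set (the blank table of Fig. 6)
    # First pass: keep second-set rows that the first set does not touch.
    target.extend(row for row in second_rows if row[key] not in first_keys)
    after_first_pass = list(target)  # corresponds to Fig. 7
    # Second pass: add every row of the first data set.
    target.extend(first_rows)        # corresponds to Fig. 8
    return after_first_pass, target


second_rows = [  # hypothetical balances before the update
    {"name": "Zhang San", "balance": 100},
    {"name": "Li Si", "balance": 250},
]
first_rows = [   # hypothetical incoming batch
    {"name": "Zhang San", "balance": 180},
    {"name": "Wang Wu", "balance": 40},
    {"name": "Zhao Liu", "balance": 75},
]
after_first_pass, final_target = rebuild_target(second_rows, first_rows)
print(after_first_pass)
print(final_target)
```

After the first pass only Li Si's row is present; after the second pass the target holds Li Si's unchanged row plus the three rows from the incoming batch, with Zhang San's old balance replaced by the new one.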
Optionally, when the target data set is a zipper data set, step 207 comprises:
Querying the first data set using the second keyword or key field; if neither the second keyword or key field nor data matching it is found in the first data set, updating the data in the second data set that matches the second keyword or key field into the target data set; determining first zipper data in the second data set that matches the first keyword or key field, modifying the chain-close time of the first sub-zipper data that is in the open-chain state within the first zipper data to the time at which the second data set was generated, and generating second sub-zipper data of the first zipper data based on the data in the first data set that matches the first keyword or key field, wherein the chain-open time of the second sub-zipper data is the time at which the second data set was generated and its chain-close time is empty or the maximum value; if no data matching the first keyword or key field is found in the second data set, generating second zipper data based on the data in the first data set that matches the first keyword or key field, wherein the chain-open time of the second zipper data is the time at which the second data set was generated and its chain-close time is empty or the maximum value; and updating the modified first zipper data and the second zipper data into the target data set.
In this step, if the target data set is a zipper data set, i.e., the data in the target data set is zipper data, then after the data processing system has emptied the target data set it can determine the second keyword or key field from the data contained in the second data set and use it to query the first data set, to check whether the first data set contains data matching the second keyword or key field. If the query finds neither the second keyword or key field nor any data matching it in the first data set, the data processing system can conclude that the data in the second data set matching the second keyword or key field does not need to be updated, and it therefore updates that data into the emptied target data set.
The data processing system can then use the first keyword or key field to query the second data set. If the second data set contains data matching the first keyword or key field, the data processing system determines the first zipper data in the second data set that matches it and modifies that zipper data: the closed chain time of the first sub-zipper data in the open chain state in the first zipper data is set to the time at which the second data set was generated. The data processing system then resets the open chain time of the first zipper data, that is, it generates second sub-zipper data of the first zipper data in the open chain state. Specifically, it obtains the data in the first data set that matches the first keyword or key field and generates the second sub-zipper data from that data; the open chain time of the second sub-zipper data is the time at which the second data set was generated, and its closed chain time is empty or the maximum value, indicating that it is still in the open chain state.
If the data processing system finds no data in the second data set that matches the first keyword or key field, the data in the first data set that matches the first keyword or key field is all new data. The data processing system can then generate new zipper data, i.e., the second zipper data, from the data in the first data set that matches the first keyword or key field, wherein the open chain time of the second zipper data is the time at which the second data set was generated and its closed chain time is empty or the maximum value.
Finally, the data processing system updates the modified first zipper data and the newly generated second zipper data into the target data set, which completes the data update of the target data set.
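The zipper-data update walked through above can be sketched as follows, under stated assumptions: each zipper record is modeled as a flat row with hypothetical `open` and `close` fields, an empty closed chain time is represented by `None`, and `snapshot_time` stands for the time at which the second data set was generated. None of these names come from the original text.

```python
from datetime import date

OPEN = None  # an empty closed chain time means the record is still in the open chain state


def zipper_update(first, second, snapshot_time, key="id"):
    """Rebuild an emptied zipper target from the batch (first) and the snapshot (second)."""
    incoming = {row[key]: row for row in first}
    target = []
    for row in second:
        if row[key] in incoming and row["close"] is OPEN:
            # First sub-zipper data of a changed key: close it at the snapshot time.
            target.append({**row, "close": snapshot_time})
        else:
            # Unchanged keys and already closed history are carried over as they are.
            target.append(dict(row))
    for k, row in incoming.items():
        # New open record: second sub-zipper data when the key already has a chain,
        # second zipper data when the key is new.
        target.append({key: k, "value": row["value"], "open": snapshot_time, "close": OPEN})
    return target


# Hypothetical sample data.
second = [
    {"id": "zhang_san", "value": 100, "open": date(2017, 1, 1), "close": date(2017, 6, 1)},
    {"id": "zhang_san", "value": 150, "open": date(2017, 6, 1), "close": OPEN},
    {"id": "li_si", "value": 50, "open": date(2017, 3, 1), "close": OPEN},
]
first = [{"id": "zhang_san", "value": 120}, {"id": "wang_wu", "value": 30}]
snapshot_time = date(2017, 12, 25)
result = zipper_update(first, second, snapshot_time)
```

In this sketch both branches of the description end in the same append, because a new sub-record of an existing chain and the first record of a new chain have the same shape; a real implementation might store chains differently.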
For example, refer to Fig. 9 to Fig. 14. Fig. 9 shows the service information table represented by the zipper data in the target data set before the update; Fig. 10 shows the information table represented by the zipper data in the second data set; Fig. 11 shows the information table represented by the data in the first data set; Fig. 12 and Fig. 13 are schematic diagrams of the process of updating the information represented by the data in the target data set; and Fig. 14 shows the service information table represented by the data in the updated target data set.
Suppose the zipper data in the target data set before the update represents the deposit balance details of Zhang San and Li Si; the data in the first data set represents deposit business information for Zhang San, Wang Wu, Zhao Liu and other persons who conducted business in the preceding period; and the data in the second data set represents all business information of the same depositors as in the target data set, i.e., the deposit balance details of Zhang San and Li Si.
When the target data set is updated, the data in the target data set is first emptied: after the information in the data table in Fig. 9 is cleared, the blank information table shown in Fig. 12, represented by the empty target data, is obtained. The data processing system can then determine the second keyword or key field from the second data set (for example, the ID of Zhang San or the ID of Li Si) and use it to query the first data set, that is, to check whether the first data set contains data representing business information of Zhang San or Li Si. If no matching data is found in the first data set, for example no data matching the keyword or key field of Li Si, then Li Si's business data does not need to be updated in this round. The data in the second data set that matches the second keyword or key field, i.e., the business data of Li Si, is therefore updated into the emptied target data set. This completes the first update step and yields the detail table of Li Si's business information shown in Fig. 13.
Next, the first keyword or key field is used to query the second data set. If matching data is found in the second data set, for example data matching the keyword or key field representing Zhang San, then Zhang San's business data needs to be updated in this round. The data processing system determines, in the second data set and according to the first keyword or key field, the data representing Zhang San's business information, i.e., the first zipper data representing the details of Zhang San's deposit information. It then modifies the closed chain time of the first sub-zipper data in the open chain state in the first zipper data to be the time at which the second data set was generated, i.e., the data update time (the time at which the target data set is updated). Based on the data in the first data set that matches the first keyword or key field, i.e., the data representing Zhang San's new business information, it generates new zipper data representing the details of Zhang San's deposit information, i.e., the second sub-zipper data, whose open chain time is set to the time at which the second data set was generated (the data update time) and whose closed chain time is empty or the maximum value. The data in the second data set that matches the first keyword or key field, i.e., the business data of Zhang San, is then updated into the emptied target data set, which completes the second update step.
Conversely, if no data matching the first keyword or key field is found in the second data set, for example no data representing information about Wang Wu and Zhao Liu, the data processing system generates second zipper data from the data in the first data set that matches the first keyword or key field, i.e., the data representing Wang Wu and Zhao Liu. The second zipper data represents the detail table of the business information of Wang Wu and Zhao Liu; its open chain time is set to the data update time, and its closed chain time is empty or the maximum value. Finally, the first zipper data (the data representing Zhang San's business) and the generated second zipper data (the business data of Wang Wu and Zhao Liu) are updated into the emptied target data set. This completes the data update of the target data set and yields the service information table, shown in Fig. 14, represented by the data in the updated target data set.
Optionally, after step 201, the method includes:
If an update error is detected while the target data set is being updated, restoring the data in the target data set using the data in the generated second data set.
In this step, after the target data set has been updated, or while it is being updated, the data processing system can monitor the data update of the target data set in real time. If it detects an update error while the target data set is being updated, i.e., an update error occurs in step 206 and/or step 207, the data processing system can restore the target data set. Specifically, it obtains the second data set generated in step 205 and uses the data in it to restore the data in the target data set.
After restoring the data in the target data set, the data processing system can stop the data update.
Here, the second data set, i.e., the time-domain snapshot data set of the target data set, can be used directly for data recovery. This is simple and fast, and is relatively suited to update modes with a shorter data update cycle.
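The snapshot-based recovery can be sketched as below. This is an illustrative sketch only: the helper `update_with_rollback`, its boolean return value, and the `apply_update` callable are hypothetical, and update errors are modeled simply as Python exceptions.

```python
import copy


def update_with_rollback(target, first, second, apply_update):
    """Update `target` in place; on any error, restore it from the snapshot `second`.

    apply_update -- hypothetical callable that builds the new target rows
    Returns True if the update succeeded, False if the target was restored.
    """
    try:
        target.clear()  # empty the target data set
        target.extend(apply_update(first, second))
        return True
    except Exception:
        # Update error detected: restore from the time-domain snapshot,
        # then report failure so the caller can stop the data update.
        target.clear()
        target.extend(copy.deepcopy(second))
        return False


def failing_update(first, second):
    raise ValueError("simulated update error")


target = [{"id": "zhang_san", "balance": 100}]
second = copy.deepcopy(target)  # snapshot taken before the update
ok = update_with_rollback(target, [{"id": "zhang_san", "balance": 120}], second, failing_update)
```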
Alternatively, if a data update error is detected while the target data set is being updated, obtaining a backup data set that was backed up in advance, and restoring the data in the target data set using the data in the backup data set.
In this step, after the target data set has been updated, or while it is being updated, the data processing system can monitor the data update of the target data set in real time. If it detects an update error while the target data set is being updated, the data processing system can restore the target data set. Specifically, it obtains the backup data set that was backed up in advance and uses the data in the backup data set to restore the target data set.
The backup cycle of the backup data set can be set as needed, for example backing up one month of data.
The backup data set may be obtained by saving the data in the first data set and by making a full backup of the data in the second data set, i.e., the time-domain snapshot data set, at the preset backup cycle.
After restoring the data in the target data set, the data processing system can stop the data update.
Here, a backup mechanism is used: however long a period of data has been backed up, that much data is restored. For example, if one month of data has been backed up, one month of data is restored. This is simple and fast, and is relatively suited to update modes with a longer data update cycle.
In this embodiment, when a data update error is detected while the target data set is being updated, either of the two rollback modes above can be used to restore the data. The invention is not limited to this, however: in other embodiments, the update error alarm may be ignored and the data update continued, or the data update may be performed again after the rollback has restored the data.
Optionally, step 205 includes:
Obtaining all data stored in the updated target data set within a preset time period before the first data set was received, or obtaining, after the first data set is received, the data in the target data set that is to be updated this time; and backing up all data stored in the updated target data set, or the data in the target data set to be updated this time, to generate the second data set.
The second data set can be generated by backup, i.e., by backing up historical data.
In this step, therefore, after the data processing system receives the first data set, it can examine the historical data and generate the second data set by backing up the target data set that is to be updated this time. Alternatively, it obtains all data stored in the updated target data set within the preset time period before the first data set was received this time, and backs up all of that data into one set to generate the second data set.
Alternatively, obtaining the second data set generated the last time a first data set was received, and inserting the data in the current target data set into that second data set to generate the second data set for this round.
The second data set can also be generated by inserting into existing data, i.e., by combining the current data with the existing historical data.
In this step, therefore, after the data processing system receives the first data set this time, it can obtain the second data set that was generated the last time a first data set was received, then obtain the data in the target data set and insert it into that earlier second data set, thereby generating the second data set for this round.
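The two ways of generating the second data set described above can be sketched as follows. The function names are hypothetical, and the text does not say how rows already present in the previous second data set should be handled on insertion, so this sketch simply appends them.

```python
import copy


def snapshot_by_backup(target):
    """Generate the second data set as an independent copy of the current target."""
    return copy.deepcopy(target)


def snapshot_by_insert(previous_second, target):
    """Generate the second data set by inserting the current target rows
    into the second data set produced in the previous round."""
    merged = copy.deepcopy(previous_second)
    merged.extend(copy.deepcopy(target))
    return merged


target = [{"id": "li_si", "balance": 50}]
backup = snapshot_by_backup(target)
target[0]["balance"] = 0  # later changes to the target do not affect the snapshot
cumulative = snapshot_by_insert([{"id": "zhang_san", "balance": 100}], target)
```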
In the data processing method provided by the embodiment of the present invention, the first data set transmitted by an external system is received; the first keyword or key field is determined from the first data set; the target data set is queried using the first keyword or key field; if the first keyword or key field, or data matching it, is found in the target data set, the step of generating, in the data processing system, the second data set associated with the target data set to be updated is executed; the second data set associated with the target data set to be updated is generated in the data processing system; the data in the target data set is emptied; and the target data set is updated using the data in the first data set and the second data set. In this way, the data in the target data set is updated by query and insertion using the data in the first data set and the second data set. The update completes without scanning all data and nodes, which saves the considerable time a full scan would take, effectively reduces the workload of the data processing system, and improves the efficiency of data updates.
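The decision made before the snapshot is generated can be sketched as below. This is a sketch under assumptions: the function name `process` and the list-of-dictionaries model are hypothetical, and the no-match branch follows the behavior the description assigns to the second updating unit (the first data set is updated into the target directly).

```python
def process(first, target, key="id"):
    """Apply one incoming batch (first data set) to the target data set in place."""
    first_keys = {row[key] for row in first}  # first keyword or key field values
    if not any(row[key] in first_keys for row in target):
        # Nothing in the target matches: no snapshot or emptying is needed.
        target.extend(first)
        return target
    second = [dict(row) for row in target]  # generate the second data set
    target.clear()                          # empty the target data set
    target.extend(row for row in second if row[key] not in first_keys)
    target.extend(first)
    return target


overlapping = process(
    [{"id": "a", "v": 9}, {"id": "c", "v": 3}],
    [{"id": "a", "v": 1}, {"id": "b", "v": 2}],
)
disjoint = process([{"id": "y", "v": 1}], [{"id": "x", "v": 0}])
```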
The method embodiments of Fig. 1 to Fig. 2 can be applied to a data processing system, and the data processing system can implement each process in the method embodiments of Fig. 1 to Fig. 2.
Refer to Fig. 15 to Fig. 21. Fig. 15 is a structural diagram of the data processing system provided by an embodiment of the present invention; Fig. 16 is the first structural diagram of the data processing engine module of the data processing system shown in Fig. 15; Fig. 17 is the second structural diagram of that data processing engine module; Fig. 18 is the third structural diagram of that data processing engine module; Fig. 19 is the fourth structural diagram of that data processing engine module; Fig. 20 is the first structural diagram of the first updating unit shown in Fig. 16; and Fig. 21 is the second structural diagram of the first updating unit shown in Fig. 16. As shown in Fig. 15, the data processing system 1500 includes a data storage module 1510, a business logic module 1520, a data service module 1530 and a data processing engine module 1540.
The data processing system 1500 can be a data engineering platform (Data Engineering Platform, DEP).
The data storage module 1510 is used to store the internal data of the data processing system 1500 and the data obtained from outside.
The data storage module 1510 can be a distributed file storage system (Hadoop Distributed File System, HDFS). The HDFS system is the storage layer: it stores the internal data of the DEP and the data that the DEP obtains from external systems. The DEP may obtain data from an external system by extracting it directly, for example data in the relational database DB2, data in the database cloud server Oracle ExaData, or data in Excel format. The data may also be sent to the DEP in file form, for example as text files, and may include unstructured data such as logs and audio/video multimedia files.
The business logic module 1520 is used to manage business logic. The business logic module 1520 may include a storage unit that stores the business logic of the data processing system, the business logic including at least one of the following: scheduling rules, data lineage, model metadata, and script tools (such as automation tools).
The data service module 1530 is used to provide data services to systems external to the data processing system, and includes:
Push unit 1531, for pushing message queues and data to an external system, for example pushing to a message queue or pushing data to a database.
Archive unit 1532, for storing data in file form.
Data transmission interface (Representational State Transfer API, REST API) unit 1533, for connecting to a downstream system or business system of the data processing system and providing data to that downstream system or business system through the interface unit, for example a reporting system or an analysis service.
The data processing engine module 1540 is used to process data and can be a Structured Query Language engine module, abbreviated SQL engine module. The SQL engine module can be built from engines such as Hive and/or Spark.
Optionally, the data processing system 1500 further includes:
Information interaction module 1550, for receiving operation instructions entered by a user in order to manage and configure the data processing system. Users may include business personnel (personnel on the business side), operation and maintenance personnel (personnel on the technical side), and so on. The information interaction module can be provided with a corresponding user interface (UI).
Optionally, the data processing system 1500 further includes an automation tool module. An automation tool (i.e., a program) can be written based on rules, such as the data update procedure of the data processing method. It is only necessary to know which data sets in the DEP need their change history recorded by the zipper method; the automation tool can then generate the algorithm program automatically, for example by generating SQL statements in Hive.
The automation tool module may include:
Parameter receiving unit, for receiving input parameters.
Script generation unit, for generating an automation tool script based on preset rules and the parameters.
Specifically, the parameter receiving unit receives the parameters that a user enters into the data processing system; the parameters corresponding to a received instruction may be written according to that instruction. The parameters include at least one of the following: the name of a data set, a field, and a data type.
For example, to learn how the balance of a customer's bank account changes, one needs to know the customer's balance (the balance information table represented in the target data set) and the income and expenditure details (the balance detail information table represented in the target data set). The automation tool of the automation tool module can automatically generate the code for the data update method of the above embodiments (the joined-table query, comparison and insertion algorithm) and thereby implement dynamic queries. Based on operational needs, the data change history is recorded on the Hadoop platform; specifically, the Hadoop platform can record the data change history into HDFS.
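A script generation unit of this kind might be sketched as below. This is an assumption-laden illustration: the function `generate_refresh_sql`, the table names, and the statement templates are hypothetical, the output is generic SQL that has not been checked against the Hive dialect, and the data type parameter mentioned in the text is omitted for brevity.

```python
def generate_refresh_sql(target, first, second, fields, key):
    """Build the statements for one non-zipper refresh of `target`.

    target, first, second -- data set (table) names supplied as parameters
    fields                -- field names shared by the three tables
    key                   -- the key field used for matching
    """
    cols = ", ".join(fields)
    kept = ", ".join(f"s.{name}" for name in fields)
    return [
        # Empty the target data set.
        f"DELETE FROM {target}",
        # Carry over snapshot rows whose key is absent from the incoming batch.
        f"INSERT INTO {target} ({cols}) SELECT {kept} FROM {second} s "
        f"LEFT JOIN {first} f ON s.{key} = f.{key} WHERE f.{key} IS NULL",
        # Add every incoming row.
        f"INSERT INTO {target} ({cols}) SELECT {cols} FROM {first}",
    ]


statements = generate_refresh_sql(
    target="balance_detail",
    first="incoming_batch",
    second="balance_snapshot",
    fields=["customer_id", "balance"],
    key="customer_id",
)
```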
As shown in Fig. 16, the data processing engine module 1540 includes:
Receiving unit 1541, for receiving the first data set transmitted by the external system.
Generation unit 1542, for generating, in the data processing system, the second data set associated with the target data set to be updated.
Clearing unit 1543, for emptying the data in the target data set.
First updating unit 1544, for updating the target data set using the data in the first data set and the second data set.
The receiving unit 1541 may receive the first data set directly from the external system, or may obtain it from the data storage module 1510 after the first data set transmitted by the external system has been stored in the data storage module 1510.
Optionally, as shown in Fig. 17, the data processing engine module 1540 further includes:
First determination unit 1545, for determining the first keyword or key field from the first data set.
Query unit 1546, for querying the target data set using the first keyword or key field.
Execution unit 1547, for executing the step of generating, in the data processing system, the second data set associated with the target data set to be updated, if the first keyword or key field, or data matching the first keyword or key field, is found in the target data set.
Optionally, as shown in Fig. 17, the data processing engine module 1540 further includes:
Second updating unit 1548, for updating the data of the first data set into the target data set if the first keyword or key field is not found in the target data set and no data matching the first keyword or key field is found.
Optionally, as shown in Fig. 18, the data processing engine module 1540 further includes:
First recovery unit 1549, for restoring the data in the target data set using the data in the generated second data set if an update error is detected while the target data set is being updated.
Alternatively, as shown in Fig. 19, the data processing engine module 1540 includes:
Second recovery unit 15410, for obtaining a backup data set that was backed up in advance and restoring the data in the target data set using the data in the backup data set, if a data update error is detected while the target data set is being updated.
Optionally, as shown in Fig. 20, the first updating unit 1544 includes:
First determination subunit 15441, for determining the second keyword or key field from the second data set.
First query subunit 15442, for querying the first data set using the second keyword or key field.
First update subunit 15443, for updating the data in the second data set that matches the second keyword or key field into the target data set, if the second keyword or key field is not found in the first data set and no data matching the second keyword or key field is found in the first data set.
Second update subunit 15444, for updating the data in the first data set that matches the first keyword or key field into the target data set.
Optionally, as shown in Fig. 21, when the target data set is a zipper data set, the first updating unit 1544 includes:
Second determination subunit 15445, for determining the second keyword or key field from the second data set.
Second query subunit 15446, for querying the first data set using the second keyword or key field.
Third update subunit 15447, for updating the data in the second data set that matches the second keyword or key field into the target data set, if the second keyword or key field is not found in the first data set and no data matching the second keyword or key field is found in the first data set.
Third determination subunit 15448, for determining the zipper data in the second data set that matches the first keyword or key field.
Modification subunit 15449, for modifying the closed chain time of the first sub-zipper data in the open chain state in the first zipper data to be the time at which the second data set was generated, and for generating second sub-zipper data of the first zipper data based on the data in the first data set that matches the first keyword or key field, wherein the open chain time of the second sub-zipper data is the time at which the second data set was generated and its closed chain time is empty or the maximum value.
Generation subunit 154410, for generating second zipper data based on the data in the first data set that matches the first keyword or key field, if no data matching the first keyword or key field is found in the second data set, wherein the open chain time of the second zipper data is the time at which the second data set was generated and its closed chain time is empty or the maximum value.
Fourth update subunit 154411, for updating the modified zipper data and the data in the first data set that matches the first keyword or key field into the target data set.
Optionally, the generation unit 1542 is also used to obtain all data stored in the updated target data set within a preset time period before the first data set was received, or to obtain, after the first data set is received, the data in the target data set that is to be updated this time, and to back up all data stored in the updated target data set, or the data in the target data set to be updated this time, to generate the second data set.
Alternatively, the generation unit 1542 is also used to obtain the second data set generated the last time a first data set was received, and to insert the data in the current target data set into that second data set to generate the second data set for this round.
The data processing system 1500 provided by the embodiment of the present invention can implement each process implemented by the data processing system in the method embodiments of Fig. 1 to Fig. 2; to avoid repetition, the details are not described again here.
With the data processing system provided by the embodiment of the present invention, when a first data set transmitted by an external system is received and a data update is needed, the update can be carried out by query and insertion using joined-table queries. This ensures the stability of the data processing system and, because not all data has to be scanned, saves considerable time and improves the efficiency of data updates.
The embodiment of the present invention also provides a data processing system, the data processing system including a receiving module, a generation module, a clearing module and a first update module, wherein:
Receiving module, for receiving the first data set transmitted by an external system;
Generation module, for generating, in the data processing system, the second data set associated with the target data set to be updated;
Clearing module, for emptying the data in the target data set;
First update module, for updating the target data set using the data in the first data set and the second data set.
Optionally, the data processing system further includes:
First determining module, for determining the first keyword or key field from the first data set;
Query module, for querying the target data set using the first keyword or key field;
Execution module, for executing the step of generating, in the data processing system, the second data set associated with the target data set to be updated, if the first keyword or key field, or data matching the first keyword or key field, is found in the target data set.
Optionally, the data processing system further includes:
Second update module, for updating the data of the first data set into the target data set if the first keyword or key field is not found in the target data set and no data matching the first keyword or key field is found.
Optionally, the first update module includes:
First determination submodule, for determining the second keyword or key field from the second data set;
First query submodule, for querying the first data set using the second keyword or key field;
First update submodule, for updating the data in the second data set that matches the second keyword or key field into the target data set, if the second keyword or key field is not found in the first data set and no data matching the second keyword or key field is found in the first data set;
Second update submodule, for updating the data in the first data set that matches the first keyword or key field into the target data set.
Optionally, when the target data set is a zipper data set, the first update module includes:
A second determining submodule, configured to determine a second keyword or key field from the second data set;
A second query submodule, configured to query the first data set using the second keyword or key field;
A third update submodule, configured to update the data in the second data set that matches the second keyword or key field into the target data set, if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set;
A third determining submodule, configured to determine first zipper data in the second data set that matches the first keyword or key field;
A modifying submodule, configured to modify the closed-chain time of the first sub-zipper data that is in the open-chain state within the first zipper data to the time at which the second data set was generated, and to generate second sub-zipper data of the first zipper data based on the data in the first data set that matches the first keyword or key field, wherein the open-chain time of the second sub-zipper data is the time at which the second data set was generated, and its closed-chain time is empty or a maximum value;
A generating submodule, configured to generate second zipper data based on the data in the first data set that matches the first keyword or key field, if no data matching the first keyword or key field is found in the second data set, wherein the open-chain time of the second zipper data is the time at which the second data set was generated, and its closed-chain time is empty or a maximum value;
A fourth update submodule, configured to update the modified first zipper data and the second zipper data into the target data set.
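The zipper update described above can be sketched as follows. This is an illustrative reading only: each zipper record is assumed to be a dictionary with hypothetical `start` (open-chain time) and `end` (closed-chain time) fields, `date.max` stands in for the "maximum value", and `id` is an assumed key field.

```python
from datetime import date

MAX_DATE = date.max  # stands in for an "empty or maximum" closed-chain time


def zipper_update(first_set, second_set, snapshot_time, key="id"):
    """Return the zipper records to be written back into the target."""
    first_keys = {row[key] for row in first_set}
    result = []
    for rec in second_set:
        if rec[key] in first_keys and rec["end"] == MAX_DATE:
            # Open record with a matching key: close it at the time the
            # second data set was generated.
            result.append({**rec, "end": snapshot_time})
        else:
            # Unmatched records and already closed history are kept as is.
            result.append(dict(rec))
    for row in first_set:
        # Every incoming row opens a new record, whether or not the key
        # already existed in the second data set.
        result.append({**row, "start": snapshot_time, "end": MAX_DATE})
    return result


snap = date(2017, 12, 25)
second = [
    {"id": 1, "value": "a", "start": date(2017, 1, 1), "end": MAX_DATE},
    {"id": 2, "value": "b", "start": date(2017, 1, 1), "end": MAX_DATE},
]
first = [{"id": 2, "value": "B"}, {"id": 3, "value": "c"}]
for record in zipper_update(first, second, snap):
    print(record)
```

With this sample data, the record for `id` 1 stays open, the old record for `id` 2 is closed at the snapshot time, and new open records are created for `id` 2 and `id` 3.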
Optionally, the data processing system further includes:
A first recovery module, configured to restore the data in the target data set using the data in the generated second data set, if an update error is detected while the target data set is being updated; or
A second recovery module, configured to obtain a backup data set that was backed up in advance and restore the data in the target data set using the data in the backup data set, if a data update error is detected while the target data set is being updated.
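The first recovery option can be sketched as below. The sketch assumes in-memory lists for the data sets; `update_with_recovery`, `build_rows` and `failing_builder` are hypothetical names introduced only for this illustration.

```python
def update_with_recovery(target, first_set, second_set, build_rows):
    """Empty `target`, rebuild it, and fall back to `second_set` on failure.

    Returns True if the update succeeded, False if the target was restored.
    """
    target.clear()  # the target data set is emptied before the update
    try:
        target.extend(build_rows(first_set, second_set))
        return True
    except Exception:
        # Update error detected: discard any partial result and restore
        # the target from the second data set.
        target.clear()
        target.extend(second_set)
        return False


def failing_builder(first_set, second_set):
    """Hypothetical update step that always fails, to exercise recovery."""
    raise RuntimeError("simulated update error")


target = [{"id": 1, "value": "a"}]
second = list(target)  # second data set generated before emptying
ok = update_with_recovery(target, [{"id": 1, "value": "A"}], second, failing_builder)
print(ok, target)
```

The second option in the text differs only in the source of the restored rows: a backup data set prepared in advance would be passed in place of `second_set`.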
Optionally, the generation module is specifically further configured to obtain all data that was stored in the previously updated target data set within a preset time period before the first data set was received, or to obtain, after the first data set is received, the data in the target data set to be updated this time, and to back up all of that previously stored data, or the data in the target data set to be updated this time, so as to generate the second data set;
Alternatively, the generation module is specifically further configured to obtain the second data set that was generated the last time a first data set was received, and to insert the data currently in the target data set into that second data set, so as to generate the second data set for this update.
The data processing system provided in this embodiment of the present invention can implement each process implemented by the data processing system in the method embodiments of Fig. 1 to Fig. 2; to avoid repetition, details are not described here again.
With the data processing system provided in this embodiment of the present invention, when a first data set sent by an external system is received and a data update is required, the data update can be carried out by means of query-insertion, in a joined-table query manner. This ensures the stability of the data processing system, avoids scanning all of the data, saves a large amount of time, and improves the efficiency of the data update.
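As a rough end-to-end illustration, the steps can be sketched with SQLite. The sketch reads "query-insertion" as the SQL `INSERT ... SELECT` form, which is an assumption; the table names `target`, `first_set` and `second_set` and their columns are likewise hypothetical and are not taken from the original text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE first_set (id INTEGER PRIMARY KEY, value TEXT);
INSERT INTO target VALUES (1, 'a'), (2, 'b');
INSERT INTO first_set VALUES (2, 'B'), (3, 'c');
""")

# Generate the second data set as a snapshot of the target.
cur.execute("CREATE TABLE second_set AS SELECT id, value FROM target")

# Empty the target data set.
cur.execute("DELETE FROM target")

# Query-insert the rows of the second data set whose key is absent from
# the first data set (a join against first_set selects them).
cur.execute("""
INSERT INTO target (id, value)
SELECT s.id, s.value
FROM second_set AS s
LEFT JOIN first_set AS f ON s.id = f.id
WHERE f.id IS NULL
""")

# Query-insert all rows of the first data set.
cur.execute("INSERT INTO target (id, value) SELECT id, value FROM first_set")
conn.commit()

rows = cur.execute("SELECT id, value FROM target ORDER BY id").fetchall()
print(rows)  # [(1, 'a'), (2, 'B'), (3, 'c')]
```

Because the target is emptied and refilled by set-based inserts, no row-by-row update of existing records is needed in this sketch.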
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the specific embodiments described above, which are merely illustrative rather than restrictive. Inspired by the present invention, those of ordinary skill in the art can devise many further forms without departing from the purpose of the present invention and the scope protected by the claims, all of which fall within the protection of the present invention.
Claims (38)
1. A data processing method, characterized in that the method is applied to a data processing system and comprises:
storing internal data of the data processing system and data obtained from outside;
managing business logic;
providing a data service to systems external to the data processing system;
processing data;
wherein the step of processing data comprises:
receiving a first data set sent by an external system;
generating, in the data processing system, a second data set associated with a target data set to be updated;
emptying the data in the target data set;
updating the target data set using the data in the first data set and the second data set;
wherein updating the target data set using the data in the first data set and the second data set comprises:
inserting the data in the first data set and the second data set into the target data set by query-insertion, so as to update the data in the target data set.
2. The method of claim 1, characterized in that the method further comprises:
receiving an operation instruction input by a user, so as to manage and configure the data processing system.
3. The method of claim 1, characterized in that, in the step of storing internal data of the data processing system and data obtained from outside:
the stored data obtained from outside includes directly extracted data and file-form data.
4. The method of claim 1, characterized in that the step of managing business logic comprises:
storing the business logic of the data processing system, the business logic including at least one of the following: scheduling rules, data lineage, model metadata, and scripts.
5. The method of claim 1, characterized in that the step of providing a data service to systems external to the data processing system comprises:
pushing messages and data to a queue of a system external to the data processing system;
storing file-form data;
connecting to a downstream system or business system of the data processing system and providing data to the downstream system or business system.
6. The method of claim 1, characterized in that the method further comprises:
receiving input parameters;
generating an automation tool script based on preset rules and the parameters.
7. The method of any one of claims 1-6, characterized in that, before the step of generating, in the data processing system, a second data set associated with a target data set to be updated, the method comprises:
determining a first keyword or key field from the first data set;
querying the target data set using the first keyword or key field;
if the first keyword or key field, or data matching the first keyword or key field, is found in the target data set, executing the step of generating, in the data processing system, the second data set associated with the target data set to be updated.
8. The method of claim 7, characterized in that, after the step of querying the target data set using the first keyword or key field, the method comprises:
if neither the first keyword or key field nor data matching the first keyword or key field is found in the target data set, updating the data of the first data set into the target data set.
9. The method of claim 7, characterized in that the step of updating the target data set using the data in the first data set and the second data set comprises:
determining a second keyword or key field from the second data set;
querying the first data set using the second keyword or key field;
if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set, updating the data in the second data set that matches the second keyword or key field into the target data set;
updating the data in the first data set that matches the first keyword or key field into the target data set.
10. The method of claim 7, characterized in that, when the target data set is a zipper data set, the step of updating the target data set using the data in the first data set and the second data set comprises:
determining a second keyword or key field from the second data set;
querying the first data set using the second keyword or key field;
if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set, updating the data in the second data set that matches the second keyword or key field into the target data set;
determining first zipper data in the second data set that matches the first keyword or key field;
modifying the closed-chain time of the first sub-zipper data that is in the open-chain state within the first zipper data to the time at which the second data set was generated, and generating second sub-zipper data of the first zipper data based on the data in the first data set that matches the first keyword or key field, wherein the open-chain time of the second sub-zipper data is the time at which the second data set was generated, and its closed-chain time is empty or a maximum value;
if no data matching the first keyword or key field is found in the second data set, generating second zipper data based on the data in the first data set that matches the first keyword or key field, wherein the open-chain time of the second zipper data is the time at which the second data set was generated, and its closed-chain time is empty or a maximum value;
updating the modified first zipper data and the second zipper data into the target data set.
11. The method of any one of claims 1-6, characterized in that, after the step of emptying the data in the target data set, the method comprises:
if an update error is detected while the target data set is being updated, restoring the data in the target data set using the data in the generated second data set; or
if a data update error is detected while the target data set is being updated, obtaining a backup data set that was backed up in advance, and restoring the data in the target data set using the data in the backup data set.
12. The method of any one of claims 1-6, characterized in that the step of generating, in the data processing system, a second data set associated with a target data set to be updated comprises:
obtaining all data that was stored in the previously updated target data set within a preset time period before the first data set was received, or obtaining, after the first data set is received, the data in the target data set to be updated this time, and backing up all of that previously stored data, or the data in the target data set to be updated this time, so as to generate the second data set; or
obtaining the second data set that was generated the last time a first data set was received, and inserting the data currently in the target data set into that second data set, so as to generate the second data set for this update.
13. A data processing method, characterized in that the method comprises:
receiving a first data set sent by an external system;
generating, in a data processing system, a second data set associated with a target data set to be updated;
emptying the data in the target data set;
updating the target data set using the data in the first data set and the second data set;
wherein updating the target data set using the data in the first data set and the second data set comprises:
inserting the data in the first data set and the second data set into the target data set by query-insertion, so as to update the data in the target data set.
14. The method of claim 13, characterized in that, before the step of generating, in the data processing system, a second data set associated with a target data set to be updated, the method comprises:
determining a first keyword or key field from the first data set;
querying the target data set using the first keyword or key field;
if the first keyword or key field, or data matching the first keyword or key field, is found in the target data set, executing the step of generating, in the data processing system, the second data set associated with the target data set to be updated.
15. The method of claim 14, characterized in that, after the step of querying the target data set using the first keyword or key field, the method comprises:
if neither the first keyword or key field nor data matching the first keyword or key field is found in the target data set, updating the data of the first data set into the target data set.
16. The method of claim 14, characterized in that the step of updating the target data set using the data in the first data set and the second data set comprises:
determining a second keyword or key field from the second data set;
querying the first data set using the second keyword or key field;
if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set, updating the data in the second data set that matches the second keyword or key field into the target data set;
updating the data in the first data set that matches the first keyword or key field into the target data set.
17. The method of claim 14, characterized in that, when the target data set is a zipper data set, the step of updating the target data set using the data in the first data set and the second data set comprises:
determining a second keyword or key field from the second data set;
querying the first data set using the second keyword or key field;
if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set, updating the data in the second data set that matches the second keyword or key field into the target data set;
determining first zipper data in the second data set that matches the first keyword or key field;
modifying the closed-chain time of the first sub-zipper data that is in the open-chain state within the first zipper data to the time at which the second data set was generated, and generating second sub-zipper data of the first zipper data based on the data in the first data set that matches the first keyword or key field, wherein the open-chain time of the second sub-zipper data is the time at which the second data set was generated, and its closed-chain time is empty or a maximum value;
if no data matching the first keyword or key field is found in the second data set, generating second zipper data based on the data in the first data set that matches the first keyword or key field, wherein the open-chain time of the second zipper data is the time at which the second data set was generated, and its closed-chain time is empty or a maximum value;
updating the modified first zipper data and the second zipper data into the target data set.
18. The method of claim 13, characterized in that, after the step of emptying the data in the target data set, the method comprises:
if an update error is detected while the target data set is being updated, restoring the data in the target data set using the data in the generated second data set; or
if a data update error is detected while the target data set is being updated, obtaining a backup data set that was backed up in advance, and restoring the data in the target data set using the data in the backup data set.
19. The method of claim 13, characterized in that the step of generating, in the data processing system, a second data set associated with a target data set to be updated comprises:
obtaining all data that was stored in the previously updated target data set within a preset time period before the first data set was received, or obtaining, after the first data set is received, the data in the target data set to be updated this time, and backing up all of that previously stored data, or the data in the target data set to be updated this time, so as to generate the second data set; or
obtaining the second data set that was generated the last time a first data set was received, and inserting the data currently in the target data set into that second data set, so as to generate the second data set for this update.
20. A data processing system, characterized in that the data processing system comprises:
a data storage module, configured to store internal data of the data processing system and data obtained from outside;
a business logic module, configured to manage business logic;
a data service module, configured to provide a data service to systems external to the data processing system;
a data processing engine module, configured to process data;
wherein the data processing engine module comprises:
a receiving unit, configured to receive a first data set sent by an external system;
a generating unit, configured to generate, in the data processing system, a second data set associated with a target data set to be updated;
a clearing unit, configured to empty the data in the target data set;
a first updating unit, configured to update the target data set using the data in the first data set and the second data set;
wherein the first updating unit is further configured to insert the data in the first data set and the second data set into the target data set by query-insertion, so as to update the data in the target data set.
21. The data processing system of claim 20, characterized in that the data processing system comprises:
an information interaction module, configured to receive an operation instruction input by a user, so as to manage and configure the data processing system.
22. The data processing system of claim 20, characterized in that the data storage module is a distributed file storage system, and the data obtained from outside that is stored by the data storage module includes directly extracted data and file-form data.
23. The data processing system of claim 20, characterized in that the business logic module comprises:
a storage unit, configured to store the business logic of the data processing system, the business logic including at least one of the following: scheduling rules, data lineage, model metadata, and scripts.
24. The data processing system of claim 20, characterized in that the data service module comprises:
a push unit, configured to push messages and data to a queue of a system external to the data processing system;
an archiving unit, configured to store file-form data;
a data transmission interface unit, configured to connect to a downstream system or business system of the data processing system and to provide data to the downstream system or business system through the interface unit.
25. The data processing system of claim 20, characterized in that the data processing system further comprises an automation tool module, the automation tool module comprising:
a parameter receiving unit, configured to receive input parameters;
a script generating unit, configured to generate an automation tool script based on preset rules and the parameters.
26. The data processing system of any one of claims 20-25, characterized in that the data processing engine module further comprises:
a first determining unit, configured to determine a first keyword or key field from the first data set;
a query unit, configured to query the target data set using the first keyword or key field;
an execution unit, configured to execute the step of generating, in the data processing system, the second data set associated with the target data set to be updated, if the first keyword or key field, or data matching the first keyword or key field, is found in the target data set.
27. The data processing system of claim 26, characterized in that the data processing engine module further comprises:
a second updating unit, configured to update the data of the first data set into the target data set if neither the first keyword or key field nor data matching the first keyword or key field is found in the target data set.
28. The data processing system of claim 26, characterized in that the first updating unit comprises:
a first determining subunit, configured to determine a second keyword or key field from the second data set;
a first query subunit, configured to query the first data set using the second keyword or key field;
a first updating subunit, configured to update the data in the second data set that matches the second keyword or key field into the target data set, if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set;
a second updating subunit, configured to update the data in the first data set that matches the first keyword or key field into the target data set.
29. The data processing system of claim 26, characterized in that, when the target data set is a zipper data set, the first updating unit comprises:
a second determining subunit, configured to determine a second keyword or key field from the second data set;
a second query subunit, configured to query the first data set using the second keyword or key field;
a third updating subunit, configured to update the data in the second data set that matches the second keyword or key field into the target data set, if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set;
a third determining subunit, configured to determine first zipper data in the second data set that matches the first keyword or key field;
a modifying subunit, configured to modify the closed-chain time of the first sub-zipper data that is in the open-chain state within the first zipper data to the time at which the second data set was generated, and to generate second sub-zipper data of the first zipper data based on the data in the first data set that matches the first keyword or key field, wherein the open-chain time of the second sub-zipper data is the time at which the second data set was generated, and its closed-chain time is empty or a maximum value;
a generating subunit, configured to generate second zipper data based on the data in the first data set that matches the first keyword or key field, if no data matching the first keyword or key field is found in the second data set, wherein the open-chain time of the second zipper data is the time at which the second data set was generated, and its closed-chain time is empty or a maximum value;
a fourth updating subunit, configured to update the modified first zipper data and the second zipper data into the target data set.
30. The data processing system of any one of claims 20-25, characterized in that the data processing engine module comprises:
a first recovery unit, configured to restore the data in the target data set using the data in the generated second data set, if an update error is detected while the target data set is being updated; or
a second recovery unit, configured to obtain a backup data set that was backed up in advance and restore the data in the target data set using the data in the backup data set, if a data update error is detected while the target data set is being updated.
31. The data processing system of any one of claims 20-25, characterized in that
the generating unit is further configured to obtain all data that was stored in the previously updated target data set within a preset time period before the first data set was received, or to obtain, after the first data set is received, the data in the target data set to be updated this time, and to back up all of that previously stored data, or the data in the target data set to be updated this time, so as to generate the second data set;
alternatively, the generating unit is further configured to obtain the second data set that was generated the last time a first data set was received, and to insert the data currently in the target data set into that second data set, so as to generate the second data set for this update.
32. A data processing system, characterized in that the data processing system comprises:
a receiving module, configured to receive a first data set sent by an external system;
a generation module, configured to generate, in the data processing system, a second data set associated with a target data set to be updated;
a clearing module, configured to empty the data in the target data set;
a first update module, configured to update the target data set using the data in the first data set and the second data set;
wherein the first update module is further configured to insert the data in the first data set and the second data set into the target data set by query-insertion, so as to update the data in the target data set.
33. The data processing system of claim 32, characterized in that the data processing system further comprises:
a first determining module, configured to determine a first keyword or key field from the first data set;
a query module, configured to query the target data set using the first keyword or key field;
an execution module, configured to execute the step of generating, in the data processing system, the second data set associated with the target data set to be updated, if the first keyword or key field, or data matching the first keyword or key field, is found in the target data set.
34. The data processing system of claim 33, characterized in that the data processing system further comprises:
a second update module, configured to update the data of the first data set into the target data set if neither the first keyword or key field nor data matching the first keyword or key field is found in the target data set.
35. The data processing system of claim 33, characterized in that the first update module comprises:
a first determining submodule, configured to determine a second keyword or key field from the second data set;
a first query submodule, configured to query the first data set using the second keyword or key field;
a first update submodule, configured to update the data in the second data set that matches the second keyword or key field into the target data set, if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set;
a second update submodule, configured to update the data in the first data set that matches the first keyword or key field into the target data set.
36. The data processing system of claim 33, characterized in that, when the target data set is a zipper data set, the first update module comprises:
a second determining submodule, configured to determine a second keyword or key field from the second data set;
a second query submodule, configured to query the first data set using the second keyword or key field;
a third update submodule, configured to update the data in the second data set that matches the second keyword or key field into the target data set, if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set;
a third determining submodule, configured to determine first zipper data in the second data set that matches the first keyword or key field;
a modifying submodule, configured to modify the closed-chain time of the first sub-zipper data that is in the open-chain state within the first zipper data to the time at which the second data set was generated, and to generate second sub-zipper data of the first zipper data based on the data in the first data set that matches the first keyword or key field, wherein the open-chain time of the second sub-zipper data is the time at which the second data set was generated, and its closed-chain time is empty or a maximum value;
a generating submodule, configured to generate second zipper data based on the data in the first data set that matches the first keyword or key field, if no data matching the first keyword or key field is found in the second data set, wherein the open-chain time of the second zipper data is the time at which the second data set was generated, and its closed-chain time is empty or a maximum value;
a fourth update submodule, configured to update the modified first zipper data and the second zipper data into the target data set.
37. The data processing system of claim 32, characterized in that the data processing system further comprises:
a first recovery module, configured to restore the data in the target data set using the data in the generated second data set, if an update error is detected while the target data set is being updated; or
a second recovery module, configured to obtain a backup data set that was backed up in advance and restore the data in the target data set using the data in the backup data set, if a data update error is detected while the target data set is being updated.
38. The data processing system as claimed in claim 32, characterized in that
the generation module is further specifically configured to obtain all data stored in the target data set to be updated within a preset time period before the first data set is received, or to obtain, after the first data set is received, the data in the target data set to be updated this time, and to back up all the data stored in the target data set to be updated, or the data in the target data set to be updated this time, so as to generate the second data set;
or, the generation module is further specifically configured to obtain the second data set generated when a first data set was last received, and to insert the data currently in the target data set into that previously generated second data set, so as to generate the current second data set.
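The two ways of generating the second data set recited above could be illustrated, under simplifying assumptions, as follows. Both data sets are modelled as lists of row dictionaries, and the function names `snapshot_second_set` and `extend_second_set` are hypothetical.

```python
import copy


def snapshot_second_set(target):
    """First alternative: back up the target data set as the second data set."""
    return copy.deepcopy(target)


def extend_second_set(previous_second_set, target):
    """Second alternative: insert the current target data into the second
    data set generated when a first data set was last received."""
    current = copy.deepcopy(previous_second_set)
    current.extend(copy.deepcopy(target))
    return current
```

The first alternative yields an independent copy of the current target data, whereas the second accumulates the target data across successive updates.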
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711418696.XA CN108038225B (en) | 2017-12-25 | 2017-12-25 | A kind of data processing method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108038225A (en) | 2018-05-15 |
CN108038225B (en) | 2019-02-12 |
Family
ID=62100949
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711418696.XA Active CN108038225B (en) | 2017-12-25 | 2017-12-25 | A kind of data processing method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108038225B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10754895B2 (en) | 2018-10-17 | 2020-08-25 | International Business Machines Corporation | Efficient metadata destage during safe data commit operation |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104394155A (en) * | 2014-11-27 | 2015-03-04 | 暨南大学 | Multi-user cloud encryption keyboard searching method capable of verifying integrity and completeness |
CN105574404A (en) * | 2015-12-14 | 2016-05-11 | 国家电网公司 | Method and device for prompting to change password |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7707219B1 (en) * | 2005-05-31 | 2010-04-27 | Unisys Corporation | System and method for transforming a database state |
US20140025702A1 (en) * | 2012-07-23 | 2014-01-23 | Michael Curtiss | Filtering Structured Search Queries Based on Privacy Settings |
CN102802056B (en) * | 2012-09-12 | 2015-06-10 | 播思通讯技术(北京)有限公司 | Method used for inserting advertisement in digital broadcasting television program |
CN103455338A (en) * | 2013-09-22 | 2013-12-18 | 广州中国科学院软件应用技术研究所 | Method and device for acquiring data |
US9697235B2 (en) * | 2014-07-16 | 2017-07-04 | Verizon Patent And Licensing Inc. | On device image keyword identification and content overlay |
CN105677307B (en) * | 2014-11-19 | 2019-03-01 | 上海烟草集团有限责任公司 | A kind of mobile terminal big data processing method and system |
Also Published As
Publication number | Publication date |
---|---|
CN108038225A (en) | 2018-05-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||