CN108038225B - Data processing method and system - Google Patents

Info

Publication number
CN108038225B
Authority
CN
China
Prior art keywords
data
data set
keyword
acquisition system
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711418696.XA
Other languages
Chinese (zh)
Other versions
CN108038225A (en
Inventor
王清臣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nine Chapter Yunji Technology Co Ltd Beijing
Original Assignee
Nine Chapter Yunji Technology Co Ltd Beijing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nine Chapter Yunji Technology Co Ltd Beijing
Priority to CN201711418696.XA
Publication of CN108038225A
Application granted
Publication of CN108038225B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/217 Database tuning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/235 Update request formulation

Abstract

The present invention provides a data processing method and system. The method comprises: receiving a first data set transmitted by an external system; generating, in the data processing system, a second data set associated with a target data set to be updated; emptying the data in the target data set; and updating the target data set using the data in the first data set and the second data set. In this way, when a first data set transmitted by an external system is received and a data update is required, the update can be performed without scanning all of the data, which helps keep the data processing system stable, saves a large amount of time, and improves the efficiency of data updates.

Description

Data processing method and system
Technical field
The present invention relates to the field of information technology, and in particular to a data processing method and a data processing system.
Background art
In recent years, big data processing and analysis has become a global concern. As the level of informatization and automation in the economy and society continues to rise, many fields such as public administration, public services, scientific research, and commercial applications face big data problems and need a variety of targeted, cost-effective solutions. A big data platform provides processing capability for industry big data, integrating functions such as data access, data processing, data storage, query and retrieval, analysis and mining, and application interfaces.
In the data processing field, increasing importance is attached to the accumulation of data. As data volumes grow, higher demands are placed on the ability to process data and on the underlying architecture of the system: faster processing speed, larger data storage capacity, and easier maintenance.
In some business scenarios, the change history of key fields must be recorded, and, to meet user needs, the data in the database must be updated periodically. In some big data platforms, the file system is based on distributed file storage, that is, files are stored on different nodes. The traditional way of performing history updates on such a data platform is to scan the existing data row by row, that is, to scan the storage area starting from the first row of the first file until the required data is found and modified. However, in the face of growing data volumes and increasingly complex business, especially in the big data era of rapidly increasing data volumes, scanning all of the data in this way is inefficient and time-consuming. The larger the data volume, the longer the query time and feedback time, so the timeliness required under ever-growing data volumes cannot be met. Because of the heavy computation and long running time, existing data processing systems have poor stability and are prone to stuttering or even hanging.
Summary of the invention
Embodiments of the present invention provide a data processing method and a data processing system, to solve the problems that existing data processing systems process data inefficiently and, because processing is time-consuming among other reasons, have poor stability.
To solve the above technical problem, an embodiment of the present invention provides a data processing method, the method comprising:
Storing internal data of the data processing system and data obtained from outside;
Managing business logic;
Providing data services to an external system of the data processing system;
Processing data.
Further, the method also comprises:
Receiving an operation instruction input by a user, and managing and configuring the data processing system.
Further, the step of storing the internal data of the data processing system and the data obtained from outside comprises:
Storing the data obtained from outside, which includes directly extracted data and file-form data.
Further, the step of managing business logic comprises:
Storing the business logic of the data processing system, the business logic including at least one of the following: scheduling rules, data lineage, model metadata, and execution scripts.
Further, the step of providing data services to an external system of the data processing system comprises:
Pushing message queues and data to the external system of the data processing system;
Storing file-form data;
Connecting to a downstream system or business system of the data processing system, and providing data to the downstream system or business system through an interface unit.
Further, the method also comprises:
Receiving input parameters;
Generating an automation tool script based on preset rules and the parameters.
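As an illustration of generating a script from preset rules and input parameters, the following minimal Python sketch treats each preset rule as a text template. The rule name `export`, the command `export_table`, and its flags are hypothetical placeholders invented for this example; the patent does not specify the form of the rules or of the generated scripts.

```python
from string import Template
from typing import Dict

# Hypothetical preset rules: each rule is a script template with named
# placeholders. "export_table" is a made-up command, not a real tool.
PRESET_RULES: Dict[str, Template] = {
    "export": Template("export_table --table $table --out $path"),
}


def generate_script(rule_name: str, params: Dict[str, str]) -> str:
    """Fill the selected preset rule with the received parameters."""
    template = PRESET_RULES[rule_name]
    # substitute() raises KeyError if a required parameter is missing,
    # so an incomplete parameter set never yields a partial script.
    return template.substitute(params)


script = generate_script("export", {"table": "orders", "path": "/tmp/orders.csv"})
```

Using `substitute` rather than `safe_substitute` is a deliberate choice in this sketch: a missing parameter fails loudly instead of leaving a placeholder in the generated script.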
Further, the step of processing data comprises:
Receiving a first data set transmitted by an external system;
Generating, in the data processing system, a second data set associated with a target data set to be updated;
Emptying the data in the target data set;
Updating the target data set using the data in the first data set and the second data set.
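The four steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: data sets are modeled as in-memory lists of dictionaries, the second data set is taken to be a snapshot of the target, and the `merge` callable that combines the two sets is a hypothetical parameter, since the combining rule is defined by the further steps of the method.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Merge = Callable[[List[Row], List[Row]], List[Row]]


def process_batch(target: List[Row], first_set: List[Row], merge: Merge) -> List[Row]:
    """Update `target` in place from a received batch; return the second set."""
    # Step 1 is the call itself: `first_set` has been received.
    # Step 2: generate the second data set (assumed here to be a copy
    # of the target's current rows).
    second_set = [dict(row) for row in target]
    # Step 3: empty the target data set.
    target.clear()
    # Step 4: rebuild the target from the first and second data sets.
    target.extend(merge(first_set, second_set))
    return second_set


target = [{"id": 1, "city": "Beijing"}]
snapshot = process_batch(
    target,
    [{"id": 2, "city": "Shanghai"}],
    merge=lambda first, second: second + first,  # simplest possible rule
)
```

Because the target is rebuilt from two already-materialized sets, no row-by-row search of the target is needed to locate the records to modify.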
Further, before the step of generating, in the data processing system, a second data set associated with the target data set to be updated, the method comprises:
Determining a first keyword or key field from the first data set;
Querying the target data set using the first keyword or key field;
If the first keyword or key field is found in the target data set, or data matching the first keyword or key field is found, performing the step of generating, in the data processing system, a second data set associated with the target data set to be updated.
Further, after the step of querying the target data set using the first keyword or key field, the method comprises:
If the first keyword or key field is not found in the target data set, and no data matching the first keyword or key field is found, updating the data of the first data set into the target data set.
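The branching described above can be sketched as follows. The function name `choose_update_path`, the return values `"rebuild"` and `"append"`, and the use of a single key column are assumptions made for illustration only.

```python
from typing import Dict, List

Row = Dict[str, object]


def choose_update_path(target: List[Row], first_set: List[Row], key: str) -> str:
    """Decide how an incoming batch should be applied to the target."""
    target_keys = {row[key] for row in target}
    # If any incoming key already exists in the target, the caller should
    # go on to generate the second data set and rebuild the target.
    if any(row[key] in target_keys for row in first_set):
        return "rebuild"
    # Otherwise nothing in the target is affected, so the incoming rows
    # are written to the target directly.
    target.extend(dict(row) for row in first_set)
    return "append"


target = [{"id": 1, "city": "Beijing"}]
path_new = choose_update_path(target, [{"id": 2, "city": "Shanghai"}], "id")
path_existing = choose_update_path(target, [{"id": 2, "city": "Shenzhen"}], "id")
```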
Further, the step of updating the target data set using the data in the first data set and the second data set comprises:
Determining a second keyword or key field from the second data set;
Querying the first data set using the second keyword or key field;
If the second keyword or key field is not found in the first data set, and no data matching the second keyword or key field is found in the first data set, updating the data in the second data set that matches the second keyword or key field into the target data set;
Updating the data in the first data set that matches the first keyword or key field into the target data set.
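The update rule above amounts to a keyed merge: rows of the second data set whose key is absent from the first data set are carried over, and the rows of the first data set are then written. A minimal Python sketch, assuming list-of-dictionary data sets and a single key column named by the hypothetical parameter `key`:

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_by_key(first_set: List[Row], second_set: List[Row], key: str) -> List[Row]:
    """Return the rows that should populate the emptied target data set."""
    incoming_keys = {row[key] for row in first_set}
    # Rows of the second set whose key does not appear in the first set
    # are unaffected by this batch and are written back unchanged.
    kept = [dict(row) for row in second_set if row[key] not in incoming_keys]
    # Rows of the first set replace any older row with the same key.
    return kept + [dict(row) for row in first_set]


second_set = [{"id": 1, "city": "Beijing"}, {"id": 2, "city": "Shanghai"}]
first_set = [{"id": 2, "city": "Shenzhen"}, {"id": 3, "city": "Wuhan"}]
new_target = merge_by_key(first_set, second_set, "id")
```

Building a set of incoming keys once keeps the merge linear in the combined size of the two data sets.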
Further, when the target data set is a zipper data set, the step of updating the target data set using the data in the first data set and the second data set comprises:
Determining a second keyword or key field from the second data set;
Querying the first data set using the second keyword or key field;
If the second keyword or key field is not found in the first data set, and no data matching the second keyword or key field is found in the first data set, updating the data in the second data set that matches the second keyword or key field into the target data set;
Determining first zipper data in the second data set that matches the first keyword or key field;
Modifying the chain-closing time of the first sub-zipper data that is in the open-chain state in the first zipper data to the time at which the second data set was generated, and generating second sub-zipper data of the first zipper data based on the data in the first data set that matches the first keyword or key field, wherein the chain-opening time of the second sub-zipper data is the time at which the second data set was generated, and its chain-closing time is empty or a maximum value;
If no data matching the first keyword or key field is found in the second data set, generating second zipper data based on the data in the first data set that matches the first keyword or key field, wherein the chain-opening time of the second zipper data is the time at which the second data set was generated, and its chain-closing time is empty or a maximum value;
Updating the modified first zipper data and the second zipper data into the target data set.
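The zipper update can be sketched as follows. The sketch assumes each zipper record carries a key, a value, a chain-opening time, and a chain-closing time, with `None` standing for the "empty" chain-closing time of an open record; the `ZipperRow` type, the field names, and the string timestamps are all hypothetical. Following the text literally, every key in the first data set opens a new record, whether or not its value has changed.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ZipperRow:
    key: str
    value: str
    open_time: str                    # chain-opening time
    close_time: Optional[str] = None  # None means the chain is still open


def update_zipper(second_set: List[ZipperRow], first_set: Dict[str, str],
                  snapshot_time: str) -> List[ZipperRow]:
    """Return the rows to write into the emptied zipper target."""
    result: List[ZipperRow] = []
    for row in second_set:
        if row.key in first_set and row.close_time is None:
            # Open record of a key present in the batch: close it at the
            # time the second data set was generated.
            result.append(replace(row, close_time=snapshot_time))
        else:
            # Closed history, or keys absent from the batch: keep as-is.
            result.append(row)
    for key, value in first_set.items():
        # One new open record per incoming key, for existing and new keys.
        result.append(ZipperRow(key, value, open_time=snapshot_time))
    return result


history = [
    ZipperRow("a", "1", "2017-01-01", "2017-06-01"),
    ZipperRow("a", "2", "2017-06-01"),
    ZipperRow("b", "x", "2017-01-01"),
]
updated = update_zipper(history, {"a": "3", "c": "y"}, "2017-12-25")
```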
Further, after the step of emptying the data in the target data set, the method comprises:
If an update error is detected while updating the target data set, restoring the data in the target data set using the data in the generated second data set; or
If a data update error is detected while updating the target data set, obtaining a backup data set backed up in advance, and restoring the data in the target data set using the data in the backup data set.
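A minimal sketch of the first recovery option, restoring from the second data set, is shown below. It assumes that an update error surfaces as a Python exception and that the rows to write are produced by a caller-supplied function `build_rows`; both are assumptions of this example, not details given in the text.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def update_or_restore(target: List[Row], second_set: List[Row],
                      build_rows: Callable[[], Iterable[Row]]) -> bool:
    """Empty and rebuild `target`; restore it from `second_set` on error."""
    target.clear()
    try:
        target.extend(build_rows())
    except Exception:
        # Discard any partially written rows, then restore the contents
        # saved in the second data set before the update began.
        target.clear()
        target.extend(dict(row) for row in second_set)
        return False
    return True


def failing_update() -> List[Row]:
    raise RuntimeError("simulated update error")


saved = [{"id": 1, "city": "Beijing"}]
target = [dict(row) for row in saved]
ok = update_or_restore(target, saved, failing_update)
```

The second option in the text differs only in the source of the restored rows: a backup data set prepared in advance is passed in place of the second data set.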
Further, the step of generating, in the data processing system, a second data set associated with the target data set to be updated comprises:
Obtaining all data stored in the already-updated target data set within a preset time period before the first data set is received, or obtaining the data in the target data set to be updated this time after the first data set is received, and backing up all the data stored in the already-updated target data set, or the data in the target data set to be updated this time, to generate the second data set; or
Obtaining the second data set generated the last time a first data set was received, and inserting the data in the current target data set into that previously generated second data set, to generate the current second data set.
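The two ways of generating the second data set can be sketched as follows, again assuming in-memory lists of dictionaries. The function names are hypothetical, and the sketch does not model the preset time period or any de-duplication, which the text leaves unspecified.

```python
import copy
from typing import Dict, List

Row = Dict[str, object]


def snapshot_target(target: List[Row]) -> List[Row]:
    """Option 1: back up the target's current rows as the second data set."""
    return copy.deepcopy(target)


def extend_previous_second_set(previous: List[Row], target: List[Row]) -> List[Row]:
    """Option 2: insert the current target rows into the previous second set."""
    # Copies are returned so that later changes to the target or to the
    # previous second set do not alter the newly generated one.
    return copy.deepcopy(previous) + copy.deepcopy(target)


target = [{"id": 2, "city": "Shenzhen"}]
previous = [{"id": 1, "city": "Beijing"}]
option_one = snapshot_target(target)
option_two = extend_previous_second_set(previous, target)
```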
An embodiment of the present invention also provides a data processing method, the method comprising:
Receiving a first data set transmitted by an external system;
Generating, in the data processing system, a second data set associated with a target data set to be updated;
Emptying the data in the target data set;
Updating the target data set using the data in the first data set and the second data set.
Further, before the step of generating, in the data processing system, a second data set associated with the target data set to be updated, the method comprises:
Determining a first keyword or key field from the first data set;
Querying the target data set using the first keyword or key field;
If the first keyword or key field is found in the target data set, or data matching the first keyword or key field is found, performing the step of generating, in the data processing system, a second data set associated with the target data set to be updated.
Further, after the step of querying the target data set using the first keyword or key field, the method comprises:
If the first keyword or key field is not found in the target data set, and no data matching the first keyword or key field is found, updating the data of the first data set into the target data set.
Further, the step of updating the target data set using the data in the first data set and the second data set comprises:
Determining a second keyword or key field from the second data set;
Querying the first data set using the second keyword or key field;
If the second keyword or key field is not found in the first data set, and no data matching the second keyword or key field is found in the first data set, updating the data in the second data set that matches the second keyword or key field into the target data set;
Updating the data in the first data set that matches the first keyword or key field into the target data set.
Further, when the target data set is a zipper data set, the step of updating the target data set using the data in the first data set and the second data set comprises:
Determining a second keyword or key field from the second data set;
Querying the first data set using the second keyword or key field;
If the second keyword or key field is not found in the first data set, and no data matching the second keyword or key field is found in the first data set, updating the data in the second data set that matches the second keyword or key field into the target data set;
Determining first zipper data in the second data set that matches the first keyword or key field;
Modifying the chain-closing time of the first sub-zipper data that is in the open-chain state in the first zipper data to the time at which the second data set was generated, and generating second sub-zipper data of the first zipper data based on the data in the first data set that matches the first keyword or key field, wherein the chain-opening time of the second sub-zipper data is the time at which the second data set was generated, and its chain-closing time is empty or a maximum value;
If no data matching the first keyword or key field is found in the second data set, generating second zipper data based on the data in the first data set that matches the first keyword or key field, wherein the chain-opening time of the second zipper data is the time at which the second data set was generated, and its chain-closing time is empty or a maximum value;
Updating the modified first zipper data and the second zipper data into the target data set.
Further, after the step of emptying the data in the target data set, the method comprises:
If an update error is detected while updating the target data set, restoring the data in the target data set using the data in the generated second data set; or
If a data update error is detected while updating the target data set, obtaining a backup data set backed up in advance, and restoring the data in the target data set using the data in the backup data set.
Further, the step of generating, in the data processing system, a second data set associated with the target data set to be updated comprises:
Obtaining all data stored in the already-updated target data set within a preset time period before the first data set is received, or obtaining the data in the target data set to be updated this time after the first data set is received, and backing up all the data stored in the already-updated target data set, or the data in the target data set to be updated this time, to generate the second data set; or
Obtaining the second data set generated the last time a first data set was received, and inserting the data in the current target data set into that previously generated second data set, to generate the current second data set.
An embodiment of the present invention also provides a data processing system, the data processing system comprising:
A data storage module, configured to store internal data of the data processing system and data obtained from outside;
A business logic module, configured to manage business logic;
A data service module, configured to provide data services to an external system of the data processing system;
A data processing engine module, configured to process data.
Further, the data processing system comprises:
An information interaction module, configured to receive an operation instruction input by a user, and to manage and configure the data processing system.
Further, the data storage module is a distributed file storage system, and the data obtained from outside that the data storage module stores includes directly extracted data and file-form data.
Further, the business logic module comprises:
A storage unit, configured to store the business logic of the data processing system, the business logic including at least one of the following: scheduling rules, data lineage, model metadata, and execution scripts.
Further, the data service module comprises:
A push unit, configured to push message queues and data to the external system of the data processing system;
An archive unit, configured to store file-form data;
A data transmission interface unit, configured to connect to a downstream system or business system of the data processing system, and to provide data to the downstream system or business system through the interface unit.
Further, the data processing system also comprises an automation tool module, the automation tool module comprising:
A parameter receiving unit, configured to receive input parameters;
A script generation unit, configured to generate an automation tool script based on preset rules and the parameters.
Further, the data processing engine module comprises:
A receiving unit, configured to receive a first data set transmitted by an external system;
A generation unit, configured to generate, in the data processing system, a second data set associated with a target data set to be updated;
A clearing unit, configured to empty the data in the target data set;
A first update unit, configured to update the target data set using the data in the first data set and the second data set.
Further, the data processing engine module also comprises:
A first determination unit, configured to determine a first keyword or key field from the first data set;
A query unit, configured to query the target data set using the first keyword or key field;
An execution unit, configured to, if the first keyword or key field is found in the target data set, or data matching the first keyword or key field is found, perform the step of generating, in the data processing system, a second data set associated with the target data set to be updated.
Further, the data processing engine module also comprises:
A second update unit, configured to, if the first keyword or key field is not found in the target data set, and no data matching the first keyword or key field is found, update the data of the first data set into the target data set.
Further, the first update unit comprises:
A first determination subunit, configured to determine a second keyword or key field from the second data set;
A first query subunit, configured to query the first data set using the second keyword or key field;
A first update subunit, configured to, if the second keyword or key field is not found in the first data set, and no data matching the second keyword or key field is found in the first data set, update the data in the second data set that matches the second keyword or key field into the target data set;
A second update subunit, configured to update the data in the first data set that matches the first keyword or key field into the target data set.
Further, when the target data set is a zipper data set, the first update unit comprises:
A second determination subunit, configured to determine a second keyword or key field from the second data set;
A second query subunit, configured to query the first data set using the second keyword or key field;
A third update subunit, configured to, if the second keyword or key field is not found in the first data set, and no data matching the second keyword or key field is found in the first data set, update the data in the second data set that matches the second keyword or key field into the target data set;
A third determination subunit, configured to determine first zipper data in the second data set that matches the first keyword or key field;
A modification subunit, configured to modify the chain-closing time of the first sub-zipper data that is in the open-chain state in the first zipper data to the time at which the second data set was generated, and to generate second sub-zipper data of the first zipper data based on the data in the first data set that matches the first keyword or key field, wherein the chain-opening time of the second sub-zipper data is the time at which the second data set was generated, and its chain-closing time is empty or a maximum value;
A generation subunit, configured to, if no data matching the first keyword or key field is found in the second data set, generate second zipper data based on the data in the first data set that matches the first keyword or key field, wherein the chain-opening time of the second zipper data is the time at which the second data set was generated, and its chain-closing time is empty or a maximum value;
A fourth update subunit, configured to update the modified first zipper data and the second zipper data into the target data set.
Further, the data processing engine module comprises:
A first recovery unit, configured to, if an update error is detected while updating the target data set, restore the data in the target data set using the data in the generated second data set; or
A second recovery unit, configured to, if a data update error is detected while updating the target data set, obtain a backup data set backed up in advance, and restore the data in the target data set using the data in the backup data set.
Further, the generation unit is also configured to obtain all data stored in the already-updated target data set within a preset time period before the first data set is received, or to obtain the data in the target data set to be updated this time after the first data set is received, and to back up all the data stored in the already-updated target data set, or the data in the target data set to be updated this time, to generate the second data set;
Alternatively, the generation unit is also configured to obtain the second data set generated the last time a first data set was received, and to insert the data in the current target data set into that previously generated second data set, to generate the current second data set.
An embodiment of the present invention also provides a data processing system, the data processing system comprising:
A receiving module, configured to receive a first data set transmitted by an external system;
A generation module, configured to generate, in the data processing system, a second data set associated with a target data set to be updated;
A clearing module, configured to empty the data in the target data set;
A first update module, configured to update the target data set using the data in the first data set and the second data set.
Further, the data processing system also comprises:
A first determination module, configured to determine a first keyword or key field from the first data set;
A query module, configured to query the target data set using the first keyword or key field;
An execution module, configured to, if the first keyword or key field is found in the target data set, or data matching the first keyword or key field is found, perform the step of generating, in the data processing system, a second data set associated with the target data set to be updated.
Further, the data processing system also comprises:
A second update module, configured to, if the first keyword or key field is not found in the target data set, and no data matching the first keyword or key field is found, update the data of the first data set into the target data set.
Further, the first update module comprises:
A first determination submodule, configured to determine a second keyword or key field from the second data set;
A first query submodule, configured to query the first data set using the second keyword or key field;
A first update submodule, configured to, if the second keyword or key field is not found in the first data set, and no data matching the second keyword or key field is found in the first data set, update the data in the second data set that matches the second keyword or key field into the target data set;
A second update submodule, configured to update the data in the first data set that matches the first keyword or key field into the target data set.
Further, when the target data set is a zipper data set, the first update module comprises:
A second determination submodule, configured to determine a second keyword or key field from the second data set;
A second query submodule, configured to query the first data set using the second keyword or key field;
A third update submodule, configured to, if the second keyword or key field is not found in the first data set, and no data matching the second keyword or key field is found in the first data set, update the data in the second data set that matches the second keyword or key field into the target data set;
A third determination submodule, configured to determine first zipper data in the second data set that matches the first keyword or key field;
A modification submodule, configured to modify the chain-closing time of the first sub-zipper data that is in the open-chain state in the first zipper data to the time at which the second data set was generated, and to generate second sub-zipper data of the first zipper data based on the data in the first data set that matches the first keyword or key field, wherein the chain-opening time of the second sub-zipper data is the time at which the second data set was generated, and its chain-closing time is empty or a maximum value;
A generation submodule, configured to, if no data matching the first keyword or key field is found in the second data set, generate second zipper data based on the data in the first data set that matches the first keyword or key field, wherein the chain-opening time of the second zipper data is the time at which the second data set was generated, and its chain-closing time is empty or a maximum value;
A fourth update submodule, configured to update the modified first zipper data and the second zipper data into the target data set.
Further, the data processing system also comprises:
A first recovery module, configured to, if an update error is detected while updating the target data set, restore the data in the target data set using the data in the generated second data set; or
A second recovery module, configured to, if a data update error is detected while updating the target data set, obtain a backup data set backed up in advance, and restore the data in the target data set using the data in the backup data set.
Further,
The generation module is specifically also configured to obtain all data stored in the already-updated target data set within a preset time period before the first data set is received, or to obtain the data in the target data set to be updated this time after the first data set is received, and to back up all the data stored in the already-updated target data set, or the data in the target data set to be updated this time, to generate the second data set;
Alternatively, the generation module is specifically also configured to obtain the second data set generated the last time a first data set was received, and to insert the data in the current target data set into that previously generated second data set, to generate the current second data set.
With the data processing method and data processing system provided by the embodiments of the present invention, a first data set transmitted by an external system is received; a second data set associated with a target data set to be updated is generated in the data processing system; the data in the target data set is emptied; and the target data set is updated using the data in the first data set and the second data set. In this way, when a first data set transmitted by an external system is received and a data update is required, the update can be performed without scanning all of the data, which helps keep the data processing system stable, saves a large amount of time, and improves the efficiency of data updates.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the data processing method provided by one embodiment of the present invention;
Fig. 2 is a flowchart of the data processing method provided by another embodiment of the present invention;
Fig. 3 is a business information table represented by the data in the target data set before the update;
Fig. 4 is an information table represented by the data in the first data set;
Fig. 5 is an information table represented by the data in the second data set;
Fig. 6 and Fig. 7 are schematic diagrams of the process of updating the information represented by the data in the target data set;
Fig. 8 is a business information table represented by the data in the updated target data set;
Fig. 9 is a business information table represented by the zipper data in the target data set before the update;
Fig. 10 is an information table represented by the zipper data in the second data set;
Fig. 11 is an information table represented by the data in the first data set;
Fig. 12 and Fig. 13 are schematic diagrams of the process of updating the information represented by the data in the target data set;
Fig. 14 is a business information table represented by the data in the updated target data set;
Fig. 15 is a structural diagram of the data processing system provided by one embodiment of the present invention;
Fig. 16 is a first structural diagram of the data processing engine module of the data processing system shown in Fig. 15;
Fig. 17 is a second structural diagram of the data processing engine module of the data processing system shown in Fig. 15;
Fig. 18 is a third structural diagram of the data processing engine module of the data processing system shown in Fig. 15;
Fig. 19 is a fourth structural diagram of the data processing engine module of the data processing system shown in Fig. 15;
Fig. 20 is a first structural diagram of the first updating unit shown in Fig. 16;
Fig. 21 is a second structural diagram of the first updating unit shown in Fig. 16.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
The data processing method provided by the embodiments of the present invention is applied to a data processing system, which may be a data engineering platform (Data Engineering Platform, DEP). The method comprises: storing the internal data of the data processing system and the data obtained from outside; managing business logic; providing data services to systems external to the data processing system; and processing data.
Storing the internal data of the data processing system and the data obtained from outside is handled mainly by the data storage module of the data processing system. The data storage module may be a distributed file storage system (Hadoop Distributed File System, HDFS). The HDFS system is the storage layer: it stores the internal data of the DEP and the data that the DEP obtains from external systems. The DEP may obtain data from an external system by extracting it directly, for example data in the relational database system DB2, data in the database cloud server Oracle ExaData, or data in Excel format. The data may also arrive in file form, that is, be sent to the DEP as files, such as text data. It may further include unstructured data, such as logs and audio and video multimedia files.
Managing business logic is handled mainly by the business logic module of the data processing system. The business logic module may also store the business logic of the data processing system, which includes at least one of the following: scheduling rules, data lineage, model metadata, and script programs (such as automation tools).
Providing data services to systems external to the data processing system is handled mainly by the data service module of the data processing system. The data service module may push information and data to external systems, for example by pushing messages to a queue or pushing data to a database; store data in file form; and connect to downstream systems or service systems of the data processing system, such as reporting systems and analysis services, to supply them with data.
The method further comprises: receiving operation instructions entered by a user to manage and configure the data processing system. Users may include business staff (on the business side) and operations and maintenance staff (on the technical side), and the user interaction module may provide a corresponding user interface (UI).
The automation tool module of the data processing system may also: receive input parameters; and generate an automation tool script based on preset rules and the parameters. The input parameters may be entered into the data processing system by a user, or written according to a received instruction. The parameters include at least one of the following: the name, the fields, and the data types of a data set.
For example, to learn how a customer's bank account balance has changed, one needs the customer's balance (the balance information table represented in the target data set) and the income and expenditure details (the balance detail table represented in the target data set). The automation tool of the automation module can automatically generate the code that implements the data update method of the above embodiment (join query, comparison, and insertion), so that the query can be made dynamically. Depending on business needs, the data change history is recorded on the Hadoop platform; specifically, it can be recorded in HDFS.
It should be noted that the automation tool module can write an automation tool (that is, a program) based on rules, such as the data update method of the data processing method. One only needs to know which data sets in the DEP must have their change history recorded by the zipper method, and the automation tool can generate the corresponding algorithm program automatically, for example by generating SQL statements in Hive.
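The rule-driven script generation described above can be sketched as follows. This is a minimal illustration, not the patented tool: the function name `build_refresh_statements`, the table names, and the column names are all hypothetical, and the emitted statements are generic SQL-style text that has not been validated against any particular Hive version. The sketch only shows how a data set name, its fields, and a key field could be turned into statements for one snapshot, clear, and re-insert cycle.

```python
from typing import List, Sequence


def build_refresh_statements(target: str, incoming: str, snapshot: str,
                             key: str, columns: Sequence[str]) -> List[str]:
    """Return SQL-style statements for one snapshot / clear / re-insert cycle."""
    cols = ", ".join(columns)
    snap_cols = ", ".join(f"s.{c}" for c in columns)
    return [
        # Keep a copy of the target data set (the "second data set").
        f"CREATE TABLE {snapshot} AS SELECT {cols} FROM {target}",
        # Clear the target data set.
        f"TRUNCATE TABLE {target}",
        # Carry over snapshot rows whose key is absent from the incoming batch.
        (f"INSERT INTO TABLE {target} SELECT {snap_cols} FROM {snapshot} s "
         f"LEFT JOIN {incoming} i ON s.{key} = i.{key} WHERE i.{key} IS NULL"),
        # Add every row of the incoming batch (the "first data set").
        f"INSERT INTO TABLE {target} SELECT {cols} FROM {incoming}",
    ]


# Hypothetical parameters: data set names, key field, and fields.
statements = build_refresh_statements(
    target="balance", incoming="balance_incoming", snapshot="balance_snapshot",
    key="customer_id", columns=["customer_id", "name", "balance"])
for statement in statements:
    print(statement + ";")
```

Generating text from parameters in this way means that supporting another data set only requires new parameter values, which is the point of the automation tool as the description presents it.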
Processing data is handled mainly by the data processing engine module of the data processing system. The data processing engine module may be a Structured Query Language (SQL) engine module, which may consist of engines such as Hive and/or Spark.
Referring to Fig. 1, Fig. 1 is a flowchart of the data processing method provided by one embodiment of the present invention. The method can be applied to a data processing system and, as shown in Fig. 1, comprises the following steps:
Step 101: receive a first data set transmitted by an external system.
In some business scenarios, the change history of key fields must be recorded to meet customer needs. In finance, for example, the history of changes to a customer's bank account balance must be recorded so that the customer can query the balance. The data in the database therefore has to be updated periodically.
Accordingly, in this step the data processing system periodically receives the first data set transmitted from the external system.
The data processing system may receive the first data set at a fixed period, for example once a day or once every 12 hours. For timeliness, it may also receive the first data set in real time or near real time, for example once an hour, once every half hour, or even once every few minutes; no limitation is imposed here. The first data set may contain modification data in batches.
The first data set may be a set of data of a single kind, such as data of a single business type (for example, only financial transaction data such as deposit data or payment transaction records) or data of a single customer or target (for example, only the business data of Zhang San, or only that of Li Si). It may also be a set of mixed data, such as data of different business types (for example, financial transaction data such as deposit data and payment transaction records together with communication data), and it may contain the data of several customers or targets at once, such as the business data of both Zhang San and Li Si.
Receiving the first data set transmitted by the external system may mean receiving it directly from the external system, or obtaining it from a storage module of the data processing system after that module has stored the external system's first data set.
Step 102: generate, in the data processing system, a second data set associated with the target data set to be updated.
In this step, after receiving the first data set, the data processing system generates within itself a second data set associated with the target data set to be updated.
The second data set being associated with the target data set may mean that the data types it contains, or the types represented by its data, are the same as those of the target data set.
For example, if the data in the target data set is the bank deposit data or business transaction records of a customer Zhang San, the data in the generated second data set is likewise Zhang San's bank deposit data or business transaction records. If the target data set contains the bank deposit data or business transaction records of both Zhang San and Li Si, the generated second data set likewise contains that data for both Zhang San and Li Si.
The data in the second data set may be the most complete data; that is, the second data set is the data set with the greatest time span. For example, it may contain all data recorded in the target data set from the time the target data set was created up to the current time, so that the time span it records runs from when the business data was first generated until now, the longest possible period.
Correspondingly, the data in the target data set may be the most complete data, that is, the target data set is the data set with the greatest time span. Alternatively, it may contain only part of the most complete data, such as only the data from the last update up to this update, that is, the data of one update cycle or of several update cycles.
Preferably, the target data set and the second data set both contain the data of the maximum time span.
The second data set is a point-in-time snapshot of the target data set.
The second data set may be generated by backup, that is, by backing up the historical data, namely the target data set as it stood before this update.
Further, the second data set may be generated or updated at a set frequency, for example once a day. Preferably, this frequency is the same as the frequency at which batch modification data is transmitted to the data processing system, that is, the frequency at which the data processing system periodically receives the first data set.
In this way, after receiving the first data set, the data processing system generates the second data set associated with the target data set and does not need to scan all the data in the system to locate the target data set's data and data nodes. This saves time and effort, reduces the workload of the data processing system, and improves efficiency.
Step 103: clear the data in the target data set.
In this step, after generating the second data set, the data processing system clears the data in the target data set so that the target data set can subsequently be updated.
Step 104: update the target data set using the data in the first data set and the second data set.
In this step, after clearing the target data set, the data processing system extracts the data to be updated from the first data set and the related data from the second data set, and inserts, appends, or writes them into the target data set, thereby updating it.
Preferably, in this embodiment the data in the first data set and the second data set is written into the target data set by query-and-insert.
For example, if the target data set holds the bank deposit data or business transaction records of customer Zhang San, the update stores Zhang San's new data from the first data set together with his past data from the second data set into the target data set. Or, if the target data set holds the data of both Zhang San and Li Si and only Zhang San's data needs updating this time (that is, the first data set contains new bank deposit data or business transaction records for Zhang San), the update stores Zhang San's new data from the first data set together with the past data of Zhang San and Li Si from the second data set into the target data set.
The target data set may also be updated periodically, for example once a day or once every 12 hours. Preferably, its update cycle matches the frequency at which the data processing system receives the first data set.
In this way, after receiving the first data set, the data processing system generates the second data set associated with the target data set to be updated, clears the target data set, and inserts the data of the first data set and the second data set into the target data set by query-and-insert. The target data set is thus updated without scanning all the data in the data processing system, which saves the time of a full scan, reduces the workload of the data processing system, and improves efficiency.
In the embodiments of the present invention, the data processing system may be a back-end platform for developing and running data processing, which performs distributed computation over massive data in a cluster of many computers. Preferably, the data processing system is a big data platform.
The data processing system can be applied to big data scenarios in finance, medicine, education, and similar fields, such as bank data systems, hospital data systems, and school data systems.
The data processing method provided by the embodiments of the present invention receives a first data set transmitted by an external system; generates, in the data processing system, a second data set associated with the target data set to be updated; clears the data in the target data set; and updates the target data set using the data in the first data set and the second data set. In this way, when a first data set transmitted by an external system is received and a data update is required, the data processing system extracts the data related to the target data set to generate the associated second data set, and then inserts the data of the first data set and the second data set into the target data set by query-and-insert. The data in the target data set is thus updated without scanning all data nodes, which saves the large amount of time a full scan would take, reduces the workload of the data processing system, and improves the efficiency of data updates.
Referring to Fig. 2, Fig. 2 is a flowchart of the data processing method provided by another embodiment of the present invention. The method is applied to a data processing system and, as shown in Fig. 2, comprises the following steps:
Step 201: receive a first data set transmitted by an external system.
Step 202: determine a first keyword or key field from the first data set.
In this step, after receiving the first data set transmitted by the external system, the data processing system determines the corresponding first keyword or key field from the first data set according to the data in it that needs to be stored or updated.
The first keyword or key field is relative to the data being updated. For example, when the first data set contains business data of several types or the data of several customers, each type of business data or each customer's data may be updated separately, and in each such update the business data of that type or the data of that customer has its own first keyword or key field.
The first keyword or key field can be set as needed. If a single keyword is enough to identify the data to be updated, only a keyword is determined; if several keywords must be combined into a key field to identify that data, a key field is determined.
Step 203: query the target data set using the first keyword or key field.
In this step, after determining the first keyword or key field, the data processing system uses it to query the target data set, and thereby learns whether the target data set already contains historical information or records that match the data represented by the first keyword or key field.
Step 204: if the first keyword or key field, or data matching it, is found in the target data set, execute the step of generating, in the data processing system, a second data set associated with the target data set to be updated.
In this step, if the query finds the first keyword or key field in the target data set, or finds data in the target data set that matches it, the data processing system concludes that the target data set holds past information for the data matching the first keyword or key field. It then executes the step of generating the second data set associated with the target data set to be updated, so that the subsequent steps can update the data in the target data set.
Finding data that matches the first keyword or key field covers the following case. During a query of the target data set, some data may sit far back because of ordering, so searching for the first keyword or key field itself could take a long time. If data matching the first keyword or key field is found first, the first keyword or key field can be regarded as found, which saves time and reduces the amount of data scanned.
Data matching the first keyword or key field may be data associated with the data that the keyword or key field represents. For example, if the first keyword or key field is Zhang San or Zhang San's ID, the matching data may be data representing Zhang San or his ID, data representing a deposit or transaction record of Zhang San on a given date, or data representing information such as Zhang San's telephone number or identity card number.
Step 205: generate, in the data processing system, a second data set associated with the target data set to be updated.
Step 206: clear the data in the target data set.
Step 207: update the target data set using the data in the first data set and the second data set.
For step 201 and steps 205 to 207, refer to the description of steps 101 to 104 in the above embodiment; details are not repeated here.
Optionally, after step 203, the method comprises:
If neither the first keyword or key field nor data matching it is found in the target data set, writing the data of the first data set into the target data set.
In this step, if the query of the target data set finds neither the first keyword or key field nor data matching it, the data processing system concludes that the data represented by the first keyword or key field in the first data set is entirely new to the target data set. It can then insert, append, or write the data of the first data set directly into the target data set, thereby updating the target data set.
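The branch between steps 203, 204, and the optional direct write can be sketched as follows. This is a simplified illustration under stated assumptions: rows are plain dictionaries, the key field is a hypothetical `id` column, the decision is made once for the whole batch rather than per keyword, and the snapshot-based path of steps 205 to 207 is passed in as a callback named `rebuild`, which is not a name from the description.

```python
def route_update(target, first_set, rebuild, key="id"):
    """Choose between a direct write and the snapshot-based rebuild."""
    # Step 203: look up the incoming keys in the target data set.
    target_keys = {row[key] for row in target}
    if any(row[key] in target_keys for row in first_set):
        # Step 204: a key already exists, so run steps 205 to 207.
        rebuild(target, first_set)
        return "rebuild"
    # Optional branch: every incoming key is new, so write the rows directly.
    target.extend(dict(row) for row in first_set)
    return "direct"


# Hypothetical data: the target data set holds one customer.
target = [{"id": "zhang_san", "balance": 1000}]
rebuild_calls = []


def record_rebuild(target_rows, incoming_rows):
    """Stand-in for steps 205 to 207; it only records that it was called."""
    rebuild_calls.append(len(incoming_rows))


first_result = route_update(target, [{"id": "wang_wu", "balance": 300}], record_rebuild)
second_result = route_update(target, [{"id": "zhang_san", "balance": 1500}], record_rebuild)
print(first_result, second_result)
```

Here the first batch contains only a new customer and is written directly, while the second batch touches an existing customer and is routed to the rebuild path.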
Optionally, step 207 comprises:
Determining a second keyword or key field from the second data set and using it to query the first data set; if neither the second keyword or key field nor data matching it is found in the first data set, writing the data in the second data set that matches the second keyword or key field into the target data set; and writing the data in the first data set that matches the first keyword or key field into the target data set.
In this step, after clearing the target data set, the data processing system determines a second keyword or key field from the data in the second data set and uses it to query the first data set, to check whether the first data set contains data matching the second keyword or key field. If the query finds neither the second keyword or key field nor matching data in the first data set, the data processing system concludes that the data in the second data set matching the second keyword or key field does not need updating, and writes that data into the cleared target data set. Then, using the first keyword or key field, it extracts the matching data from the first data set and writes the extracted data into the cleared target data set, which completes the update of the target data set.
Writing the matching data of the second data set and the matching data of the first data set into the target data set may be done by querying the data and then inserting, appending, or writing it.
For example, refer to Figs. 3 to 8. Fig. 3 shows the business information table represented by the data in the target data set before the update, Fig. 4 the information table represented by the data in the first data set, and Fig. 5 the information table represented by the data in the second data set. Figs. 6 and 7 illustrate the process of updating the information represented by the data in the target data set, and Fig. 8 shows the business information table represented by the data in the updated target data set. Suppose that before the update the target data set holds the deposit balance information of Zhang San and Li Si, the first data set holds the deposit business handled over the past period by Zhang San, Wang Wu, Zhao Liu, and others, and the second data set holds all business information of the same people as the target data set, that is, the deposit balance information of Zhang San and Li Si.
To update the target data set, its data is first cleared: clearing the table in Fig. 3 yields the blank information table shown in Fig. 6. The data processing system then determines a second keyword or key field from the second data set (such as Zhang San's ID or Li Si's ID) and uses it to query the first data set, that is, it checks whether the first data set contains business data of Zhang San or Li Si. If no matching data is found in the first data set, as is the case for Li Si, Li Si's business data does not need updating in this round, so the matching data in the second data set, namely Li Si's business data, is written into the cleared target data set. This completes the first step of the update and yields the table of Li Si's business information shown in Fig. 7. Conversely, if matching data is found in the first data set, as is the case for Zhang San, Zhang San's business data does need updating in this round, so his data in the second data set is not added to the cleared target data set. Next, using the first keyword or key field in the first data set (such as the IDs of Zhang San, Wang Wu, and Zhao Liu), the data processing system adds the matching data in the first data set directly to the cleared target data set. This completes the update of the target data set and yields the business information table shown in Fig. 8.
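The walk-through of Figs. 3 to 8 can be sketched in a few lines. This is an illustration under assumptions, not the patented implementation: rows are dictionaries, `id` is a hypothetical key field, the balances are invented because the figures' values are not reproduced in the text, and the function name `rebuild_target` is hypothetical.

```python
import copy


def rebuild_target(target, first_set, key="id"):
    """Snapshot the target, clear it, then refill it from both data sets."""
    second_set = copy.deepcopy(target)   # step 205: snapshot (second data set)
    target.clear()                       # step 206: clear the target data set
    first_keys = {row[key] for row in first_set}
    # Step 207, first pass: carry over snapshot rows that the batch does not touch.
    target.extend(row for row in second_set if row[key] not in first_keys)
    # Step 207, second pass: write every row of the incoming batch.
    target.extend(copy.deepcopy(first_set))
    return second_set


# Invented values standing in for the tables of Figs. 3 and 4.
target = [{"id": "zhang_san", "balance": 1000},
          {"id": "li_si", "balance": 2000}]
first_set = [{"id": "zhang_san", "balance": 1500},
             {"id": "wang_wu", "balance": 300},
             {"id": "zhao_liu", "balance": 700}]

second_set = rebuild_target(target, first_set)
for row in target:
    print(row["id"], row["balance"])
```

After the call, Li Si's row comes from the snapshot, Zhang San's old row has been superseded by the incoming one, and Wang Wu and Zhao Liu appear as new rows. Every lookup is a key comparison between the two data sets, so nothing outside them has to be scanned.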
Optionally, when the target data set is a zipper data set, step 207 includes:
Querying the first data set using the second keyword or critical field. If the second keyword or critical field is not found in the first data set, and no data matching the second keyword or critical field is found in the first data set, the data in the second data set that matches the second keyword or critical field is updated into the target data set. First zipper data matching the first keyword or critical field is determined in the second data set; the closed chain time of the first sub-zipper data that is in the open chain state within the first zipper data is modified to the time at which the second data set was generated; and, based on the data in the first data set that matches the first keyword or critical field, second sub-zipper data of the first zipper data is generated, where the open chain time of the second sub-zipper data is the time at which the second data set was generated and its closed chain time is empty or a maximum value. If no data matching the first keyword or critical field is found in the second data set, second zipper data is generated based on the data in the first data set that matches the first keyword or critical field, where the open chain time of the second zipper data is the time at which the second data set was generated and its closed chain time is empty or a maximum value. The modified first zipper data and the second zipper data are updated into the target data set.
In this step, if the target data set is a zipper data set, that is, the data in the target data set is zipper data, then after emptying the data in the target data set the data processing system can determine the second keyword or critical field from the data contained in the second data set, and use it to query the first data set for matching data. If the query finds neither the second keyword or critical field nor any data matching it in the first data set, the data processing system can conclude that the data in the second data set matching the second keyword or critical field does not need to be updated, and can therefore update that data into the emptied target data set.
The data processing system can then query the second data set using the first keyword or critical field. If the second data set contains data matching the first keyword or critical field, the data processing system determines, in the second data set, the first zipper data matching the first keyword or critical field and modifies it: the closed chain time of the first sub-zipper data that is in the open chain state within the first zipper data is set to the time at which the second data set was generated. The data processing system then resets the open chain time of the first zipper data, that is, it generates second sub-zipper data of the first zipper data in the open chain state. Specifically, the data processing system obtains the data in the first data set that matches the first keyword or critical field and generates the second sub-zipper data of the first zipper data from it. The open chain time of the second sub-zipper data is the time at which the second data set was generated, and its closed chain time is empty or a maximum value, indicating that the record is still in the open chain state.
If the data processing system finds no data matching the first keyword or critical field in the second data set, the data in the first data set that matches the first keyword or critical field is entirely new data. The data processing system can generate new zipper data, i.e. the second zipper data, from that data, where the open chain time of the second zipper data is the time at which the second data set was generated and its closed chain time is empty or a maximum value.
Finally, the data processing system updates the modified first zipper data and the newly generated second zipper data into the target data set, completing the data update of the target data set.
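The zipper update described above can be sketched as follows. This is a simplified illustration under stated assumptions: each zipper row is a dictionary whose `open` and `close` fields stand for the open chain and closed chain times, `close = None` stands for "empty", and the function name, field names and sample values are hypothetical rather than taken from the patent.

```python
from datetime import date

def zipper_update(target, first, update_time, key="id"):
    """Rebuild a zipper-style target table; returns the snapshot taken first."""
    snapshot = [dict(row) for row in target]   # second data set
    target.clear()                             # empty the target data set
    incoming = {row[key]: row for row in first}
    for old in snapshot:
        row = dict(old)                        # leave the snapshot itself untouched
        if row[key] in incoming and row["close"] is None:
            row["close"] = update_time         # close the open sub-record
        target.append(row)                     # unmatched rows are copied as they are
    for row in incoming.values():
        # New open record: a second sub-record for a known key,
        # or a brand-new zipper record for an unseen key.
        target.append({**row, "open": update_time, "close": None})
    return snapshot

# Hypothetical sample data (values are illustrative only).
update_time = date(2017, 12, 25)
target = [
    {"id": "zhang_san", "balance": 100, "open": date(2017, 1, 1), "close": None},
    {"id": "li_si", "balance": 200, "open": date(2017, 1, 1), "close": None},
]
first = [{"id": "zhang_san", "balance": 150}, {"id": "wang_wu", "balance": 50}]
snapshot = zipper_update(target, first, update_time)
```

Here Zhang San ends up with one closed record and one open record, Li Si's record is carried over unchanged, and Wang Wu receives a new open record.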
For example, referring to Fig. 9 to Fig. 11: Fig. 9 shows the business information table represented by the zipper data in the target data set before the update, Fig. 10 shows the information table represented by the zipper data in the second data set, and Fig. 11 shows the information table represented by the data in the first data set. Fig. 12 and Fig. 13 are schematic diagrams of the process of updating the information represented by the data in the target data set, and Fig. 14 shows the business information table represented by the data in the updated target data set. Suppose the zipper data in the target data set before the update represents the deposit balance details of Zhang San and Li Si; the data in the first data set represents the deposit business information of the persons who handled business during the most recent period, such as Zhang San, Wang Wu and Zhao Liu; and the data in the second data set represents all business information of the same depositors as in the target data set, that is, the deposit balance details of Zhang San and Li Si.
Then, when the target data set is updated, the data in the target data set is first emptied; that is, after the information in the data table of Fig. 9 is cleared, the blank information table shown in Fig. 12 is obtained. The data processing system can then determine the second keyword or critical field from the second data set (for example, the ID of Zhang San or the ID of Li Si) and use it to query the first data set, that is, to check whether the first data set contains data representing business information about Zhang San or Li Si. If no data matching the second keyword or critical field is found in the first data set, for example no data matching the keyword or critical field of Li Si, then Li Si's business data does not need to be updated in this round, and the data in the second data set matching the second keyword or critical field, i.e. the business data related to Li Si, can be updated into the emptied target data set. This completes the first update step and yields the detail table of Li Si's business information shown in Fig. 13.
Next, the second data set is queried using the first keyword or critical field. If data matching the first keyword or critical field is found in the second data set, for example data matching the keyword or critical field of Zhang San, then Zhang San's business data needs to be updated in this round. The data processing system determines, in the second data set, the data representing Zhang San's business information according to the first keyword or critical field, i.e. the first zipper data representing Zhang San's deposit details. It then modifies the closed chain time of the first sub-zipper data that is in the open chain state within the first zipper data to the time at which the second data set was generated, i.e. the data update time (the time at which the target data set is updated). Based on the data in the first data set that matches the first keyword or critical field, i.e. the data representing Zhang San's new business information, it generates a new piece of zipper data representing Zhang San's deposit details, i.e. the second sub-zipper data, whose open chain time is the time at which the second data set was generated (the data update time) and whose closed chain time is empty or a maximum value. The data in the second data set matching the first keyword or critical field, i.e. the business data related to Zhang San, is then updated into the emptied target data set, completing the second update step.
Conversely, if no data matching the first keyword or critical field is found in the second data set, for example no data representing information about Wang Wu and Zhao Liu, the data processing system can generate second zipper data from the data in the first data set that matches the first keyword or critical field, i.e. the data relating to Wang Wu and Zhao Liu, to represent the business information details of Wang Wu and Zhao Liu. The open chain time of the second zipper data can be set to the data update time, and its closed chain time is empty or a maximum value. Finally, the first zipper data, i.e. the data representing Zhang San's business, and the generated second zipper data, i.e. the business data relating to Wang Wu and Zhao Liu, are updated into the emptied target data set. This completes the data update of the target data set and yields the business information table shown in Fig. 14, which represents the data in the updated target data set.
Optionally, after step 201, the method includes:
If an update error is detected while the target data set is being updated, restoring the data in the target data set using the data in the generated second data set.
In this step, once the update of the target data set has completed, or while it is in progress, the data processing system can monitor the update in real time. If it detects an update error during the update of the target data set, that is, an error in step 206 and/or step 207, the data processing system can restore the target data set. Specifically, the data processing system obtains the second data set generated in step 205 and uses its data to restore the data in the target data set.
After the data in the target data set has been restored, the data processing system can stop the data update.
Here, the second data set, i.e. the time-domain snapshot data set of the target data set, can be used directly for data recovery. This is simple and fast, and is relatively well suited to refresh modes with a short update cycle.
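A minimal sketch of snapshot-based recovery is shown below. It assumes the update is an ordinary Python callable that raises an exception on failure; the names `update_with_restore` and `failing_update` are hypothetical and only serve the illustration.

```python
def update_with_restore(target, first, apply_update):
    """Run an update; on error, restore the target from the snapshot and stop."""
    snapshot = [dict(row) for row in target]     # second data set (snapshot)
    try:
        apply_update(target, first)
        return True
    except Exception:
        # Restore the target in place from the snapshot, then stop updating.
        target[:] = [dict(row) for row in snapshot]
        return False

def failing_update(target, first):
    """Simulated update that empties the target and then fails."""
    target.clear()
    raise RuntimeError("simulated update error")

target = [{"id": "zhang_san", "balance": 100}]
ok = update_with_restore(target, [{"id": "wang_wu", "balance": 50}], failing_update)
```

After the simulated failure, `target` holds its original rows again and `ok` is `False`, which the caller can use to stop further updates.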
Alternatively, if a data update error is detected while the target data set is being updated, a previously backed-up backup data set is obtained, and the data in the target data set is restored using the data in the backup data set.
In this step, once the update of the target data set has completed, or while it is in progress, the data processing system can monitor the update in real time. If it detects an update error during the update of the target data set, the data processing system can restore the target data set. Specifically, the data processing system obtains the previously backed-up backup data set and uses its data to restore the target data set.
The backup cycle of the backup data set can be set as needed, for example backing up one month of data.
The backup data set may be obtained by saving the data in the first data set and by making a full backup of the data in the second data set, i.e. the time-domain snapshot data set, according to a preset backup cycle.
After the data in the target data set has been restored, the data processing system can stop the data update.
Here, a rollback mechanism is used: however much data has been backed up is how much data is restored. For example, if one month of data has been backed up, one month of data is restored. This is simple and fast, and is relatively well suited to refresh modes with a long update cycle.
In this embodiment, when a data update error is detected while the target data set is being updated, either of the two rollback approaches above can be used to restore the data. The invention is not limited to this, however: in other embodiments the data update error alarm can be ignored and the update continued, or the data update can be re-run after the rollback has restored the data.
Optionally, step 205 includes:
Obtaining all data stored in the already-updated target data set within a preset time period before the first data set was received, or obtaining the data in the target data set to be updated this time after the first data set is received; and backing up all the data stored in the already-updated target data set, or the data in the target data set to be updated this time, to generate the second data set.
That is, the second data set can be generated by backup, using a backup of historical data.
Therefore, in this step, after the data processing system receives the first data set, it can inspect the historical data and generate the second data set by backing up the target data set that is to be updated this time. Alternatively, it can obtain all data stored in the already-updated target data set within the preset time period before the first data set was received this time, and back up all of that data into one set to generate the second data set.
Alternatively, obtaining the second data set generated the last time a first data set was received, and inserting the data currently in the target data set into that second data set to generate the second data set for this update.
That is, the second data set can also be generated by inserting into and updating existing data, in combination with the existing historical data.
Therefore, in this step, after the data processing system receives the first data set this time, it can obtain the second data set that was generated the last time a first data set was received, then obtain the data in the target data set and insert it into that earlier second data set, thereby generating the second data set for this update.
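The two ways of producing the second data set can be sketched as below. The function names are hypothetical, and the second variant rests on an assumption the text does not spell out: a row inserted from the current target replaces an older snapshot row with the same key (which suits plain rows with one row per key, not multi-row zipper data).

```python
import copy

def snapshot_by_backup(target):
    """Option 1: back up the current target rows as the second data set."""
    return copy.deepcopy(target)

def snapshot_by_insert(previous_snapshot, target, key="id"):
    """Option 2: insert the current target rows into the previous snapshot.

    Assumption: a row replaces an older snapshot row with the same key.
    """
    merged = {row[key]: dict(row) for row in previous_snapshot}
    for row in target:
        merged[row[key]] = dict(row)
    return list(merged.values())

# Hypothetical sample data (values are illustrative only).
target = [{"id": "zhang_san", "balance": 150}, {"id": "wang_wu", "balance": 50}]
previous = [{"id": "zhang_san", "balance": 100}, {"id": "li_si", "balance": 200}]
backup = snapshot_by_backup(target)
merged = snapshot_by_insert(previous, target)
```

The backup variant is independent of earlier runs, while the insertion variant accumulates history across updates.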
The data processing method provided by this embodiment of the present invention receives a first data set transmitted by an external system; determines a first keyword or critical field from the first data set; queries the target data set using the first keyword or critical field; if the first keyword or critical field, or data matching it, is found in the target data set, executes the step of generating, in the data processing system, a second data set associated with the target data set to be updated; empties the data in the target data set; and updates the target data set using the data in the first data set and the second data set. In this way, the data in the target data set is updated from the first data set and the second data set by query and insertion, without scanning all data and nodes. This saves the considerable time a full scan would take, effectively reduces the workload of the data processing system, and improves the efficiency of data updates.
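The overall control flow can be summarised in a short sketch. It assumes in-memory lists of dictionaries keyed by a hypothetical `id` field; the function name `process` and its return labels are illustrative only. The direct-insert branch follows the case, described for the second updating unit, in which no matching key is found in the target.

```python
def process(target, first, key="id"):
    """Choose between direct insertion and snapshot-based rebuild."""
    first_keys = {row[key] for row in first}           # first keywords
    if any(row[key] in first_keys for row in target):
        # A key matched: snapshot, empty, then rebuild the target.
        snapshot = [dict(row) for row in target]        # second data set
        target.clear()
        target.extend(row for row in snapshot if row[key] not in first_keys)
        target.extend(dict(row) for row in first)
        return "refreshed"
    # No key matched: the incoming rows are simply added.
    target.extend(dict(row) for row in first)
    return "inserted"
```

For example, calling `process` with an incoming row whose key already exists in the target takes the rebuild branch, while an entirely new key takes the direct-insert branch.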
Wherein, the embodiment of the method for Fig. 1 to Fig. 2 can be used for data processing system, and data processing system can be realized Each process in the embodiment of the method for Fig. 1 to Fig. 2.
It is the structure chart for the data processing system that one embodiment of the invention provides, Tu16Wei referring to Figure 15 to Figure 21, Figure 15 One of the structure chart of the data processing engine module of data processing system shown in Figure 15, Figure 17 are data processing shown in Figure 15 Two, Figure 18 of the structure chart of the data processing engine module of system is the data processing engine of data processing system shown in Figure 15 The four of the structure chart of the data processing engine module of data processing system shown in three, Figure 19 Figure 15 of the structure chart of module, figure 20 be one of the structure chart of the first updating unit shown in Figure 16, and Figure 21 is the structure of the first updating unit shown in Figure 16 The two of figure.As shown in figure 15, data processing system 1500 includes data memory module 1510, business logic modules 1520, data Service module 1530 and data processing engine modules 1540.
The data processing system 1500 may be a data engineering platform (Data Engineering Platform, DEP).
The data storage module 1510 is used to store the internal data of the data processing system 1500 and the data obtained from outside.
The data storage module 1510 may be a distributed file storage system (Hadoop Distributed File System, HDFS). The HDFS system is the storage layer: it stores the internal data of the DEP and the data that the DEP obtains from external systems. The DEP can obtain data from an external system by extracting it directly, for example data in the relational database system DB2, data in the Oracle Exadata database cloud server, or data in Excel format. The data can also arrive in file form, that is, be sent to the DEP as files, for example data in text form. It further includes unstructured data such as logs and audio/video multimedia files.
The business logic module 1520 is used to manage business logic. The business logic module 1520 may include a storage unit that stores the business logic of the data processing system, where the business logic includes at least one of the following: scheduling rules, data lineage, model metadata, script tools (such as automation tools), and so on.
The data service module 1530 is used to provide data services to systems external to the data processing system, and includes:
Push unit 1531, for pushing message queues and data to external systems, for example pushing a message queue or pushing data to a database.
Archive unit 1532, for storing data in file form.
Data transmission interface (Representational State Transfer API, REST API) unit 1533, for connecting to a downstream system or business system of the data processing system and providing data to that downstream system or business system through the interface unit, for example a reporting system or an analysis service.
The data processing engine module 1540 is used to process data and may be a Structured Query Language (SQL) engine module, which may be built from engines such as Hive and/or Spark.
Optionally, the data processing system 1500 further includes:
Information exchange module 1550, for receiving operation instructions input by a user in order to manage and configure the data processing system. Users may include business personnel (on the business side), operation and maintenance personnel (on the technical side) and so on, and the interaction module may be provided with a corresponding user interface (UI).
Optionally, the data processing system 1500 further includes an automation tools module. An automation tool (i.e. a program) can be written based on rules, for example the data update method of the data processing method described above. It is only necessary to know which data sets in the DEP need to record their change history by the zipper method; the automation tool can then generate the algorithm program automatically, for example by generating SQL statements in Hive.
The automation tools module may include:
Parameter receiving unit, for receiving input parameters.
Script generation unit, for generating an automation tool script based on preset rules and the parameters.
Specifically, the parameter receiving unit receives the parameters that a user inputs to the data processing system; the parameters corresponding to a received instruction can be written according to that instruction. The parameters include at least one of the following: the name of a data set, a field, and a data type.
For example, to find out how the balance of a bank customer's account has changed, one needs to know the customer's balance (the balance information table represented in the target data set) and the income and expenditure details (the balance detail information table represented in the target data set). The automation tool of the automation tools module can automatically generate the code for the data update method of the above embodiments (the join query, comparison and insertion algorithm), enabling dynamic queries. When the business requires it, the data change history is recorded on the Hadoop platform; specifically, the Hadoop platform can record the data change history into HDFS.
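As a rough illustration of a script generation unit, the sketch below turns the three parameters named above (data set name, fields, data types) into a table definition statement. Everything here is hypothetical: the function name `generate_zipper_ddl`, the added `open_time` and `close_time` columns, and the sample table are assumptions for the example, and the patent does not specify the statements its tool emits.

```python
def generate_zipper_ddl(table, fields):
    """Build a CREATE TABLE statement for a zipper-style table.

    `fields` is a list of (column_name, data_type) pairs. The open_time and
    close_time columns are hypothetical names for the open chain and closed
    chain times.
    """
    columns = [f"{name} {dtype}" for name, dtype in fields]
    columns += ["open_time TIMESTAMP", "close_time TIMESTAMP"]
    body = ",\n  ".join(columns)
    return f"CREATE TABLE IF NOT EXISTS {table} (\n  {body}\n)"

ddl = generate_zipper_ddl(
    "deposit_balance",
    [("id", "STRING"), ("balance", "DECIMAL(18,2)")],
)
print(ddl)
```

A real tool would presumably also emit the query, comparison and insertion statements; only the table definition is sketched here to keep the example small.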
As shown in Fig. 16, the data processing engine module 1540 includes:
Receiving unit 1541, for receiving the first data set transmitted by an external system.
Generation unit 1542, for generating, in the data processing system, a second data set associated with the target data set to be updated.
Clearing unit 1543, for emptying the data in the target data set.
First updating unit 1544, for updating the target data set using the data in the first data set and the second data set.
The receiving unit 1541 may receive the first data set directly from the external system, or the first data set transmitted by the external system may first be stored in the data storage module 1510 and then obtained from the data storage module 1510.
Optionally, as shown in Fig. 17, the data processing engine module 1540 further includes:
First determination unit 1545, for determining the first keyword or critical field from the first data set.
Query unit 1546, for querying the target data set using the first keyword or critical field.
Execution unit 1547, for executing, if the first keyword or critical field, or data matching the first keyword or critical field, is found in the target data set, the step of generating in the data processing system the second data set associated with the target data set to be updated.
Optionally, as shown in Fig. 17, the data processing engine module 1540 further includes:
Second updating unit 1548, for updating the data of the first data set into the target data set if neither the first keyword or critical field nor any data matching the first keyword or critical field is found in the target data set.
Optionally, as shown in Fig. 18, the data processing engine module 1540 further includes:
First recovery unit 1549, for restoring the data in the target data set using the data in the generated second data set if an update error is detected while the target data set is being updated.
Alternatively, as shown in Fig. 19, the data processing engine module 1540 includes:
Second recovery unit 15410, for obtaining a previously backed-up backup data set if a data update error is detected while the target data set is being updated, and restoring the data in the target data set using the data in the backup data set.
Optionally, as shown in Fig. 20, the first updating unit 1544 includes:
First determining subunit 15441, for determining the second keyword or critical field from the second data set.
First query subunit 15442, for querying the first data set using the second keyword or critical field.
First updating subunit 15443, for updating the data in the second data set that matches the second keyword or critical field into the target data set if the second keyword or critical field is not found in the first data set and no data matching the second keyword or critical field is found in the first data set.
Second updating subunit 15444, for updating the data in the first data set that matches the first keyword or critical field into the target data set.
Optionally, as shown in Fig. 21, when the target data set is a zipper data set, the first updating unit 1544 includes:
Second determining subunit 15445, for determining the second keyword or critical field from the second data set.
Second query subunit 15446, for querying the first data set using the second keyword or critical field.
Third updating subunit 15447, for updating the data in the second data set that matches the second keyword or critical field into the target data set if the second keyword or critical field is not found in the first data set and no data matching the second keyword or critical field is found in the first data set.
Third determining subunit 15448, for determining the first zipper data in the second data set that matches the first keyword or critical field.
Modifying subunit 15449, for modifying the closed chain time of the first sub-zipper data that is in the open chain state within the first zipper data to the time at which the second data set was generated, and for generating second sub-zipper data of the first zipper data based on the data in the first data set that matches the first keyword or critical field, where the open chain time of the second sub-zipper data is the time at which the second data set was generated and its closed chain time is empty or a maximum value.
Generating subunit 154410, for generating second zipper data based on the data in the first data set that matches the first keyword or critical field if no data matching the first keyword or critical field is found in the second data set, where the open chain time of the second zipper data is the time at which the second data set was generated and its closed chain time is empty or a maximum value.
Fourth updating subunit 154411, for updating the modified zipper data and the data in the first data set that matches the first keyword or critical field into the target data set.
Optionally, the generation unit 1542 is further used to obtain all data stored in the already-updated target data set within a preset time period before the first data set was received, or to obtain the data in the target data set to be updated this time after the first data set is received, and to back up all the data stored in the already-updated target data set, or the data in the target data set to be updated this time, to generate the second data set.
Alternatively, the generation unit 1542 is further used to obtain the second data set generated the last time a first data set was received, and to insert the data currently in the target data set into that second data set to generate the second data set for this update.
The data processing system 1500 provided by this embodiment of the present invention can implement each process implemented by the data processing system in the method embodiments of Fig. 1 to Fig. 2; to avoid repetition, the details are not described again here.
When the data processing system provided by this embodiment of the present invention receives a first data set transmitted by an external system and a data update is required, it can perform the update by query and insertion using join queries. This preserves the stability of the data processing system, avoids scanning all data, saves considerable time, and improves the efficiency of data updates.
An embodiment of the present invention also provides a data processing system, which includes a receiving module, a generation module, a clearing module and a first update module, where:
Receiving module, for receiving the first data set transmitted by an external system;
Generation module, for generating, in the data processing system, a second data set associated with the target data set to be updated;
Clearing module, for emptying the data in the target data set;
First update module, for updating the target data set using the data in the first data set and the second data set.
Optionally, the data processing system further includes:
First determining module, for determining the first keyword or critical field from the first data set;
Query module, for querying the target data set using the first keyword or critical field;
Execution module, for executing, if the first keyword or critical field, or data matching the first keyword or critical field, is found in the target data set, the step of generating in the data processing system the second data set associated with the target data set to be updated.
Optionally, the data processing system further includes:
Second update module, for updating the data of the first data set into the target data set if neither the first keyword or critical field nor any data matching the first keyword or critical field is found in the target data set.
Optionally, the first update module comprises:
a first determining submodule, configured to determine a second keyword or key field from the second data set;
a first query submodule, configured to query the first data set using the second keyword or key field;
a first update submodule, configured to, if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set, update the data in the second data set that matches the second keyword or key field into the target data set;
a second update submodule, configured to update the data in the first data set that matches the first keyword or key field into the target data set.
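Read together, these submodules describe an anti-join followed by an insert: rows of the second data set whose key is absent from the first data set are carried over, and then the rows of the first data set are added. The sketch below assumes list-of-dict rows and a key column named `id`; both are illustrative choices rather than details from the patent.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_into_target(first: List[Row], second: List[Row], key: str = "id") -> List[Row]:
    """Return the rebuilt target: untouched snapshot rows plus all incoming rows."""
    first_keys = {row[key] for row in first}
    # First update submodule: keep snapshot rows that have no match in the first set.
    carried_over = [row for row in second if row[key] not in first_keys]
    # Second update submodule: add the incoming rows.
    return carried_over + list(first)
```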
Optionally, when the target data set is a zipper data set, the first update module comprises:
a second determining submodule, configured to determine a second keyword or key field from the second data set;
a second query submodule, configured to query the first data set using the second keyword or key field;
a third update submodule, configured to, if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set, update the data in the second data set that matches the second keyword or key field into the target data set;
a third determining submodule, configured to determine first zipper data in the second data set that matches the first keyword or key field;
a modification submodule, configured to modify the close-chain time of first sub-zipper data, in the open-chain state, within the first zipper data to the time at which the second data set was generated, and to generate second sub-zipper data of the first zipper data based on the data in the first data set that matches the first keyword or key field, wherein the open-chain time of the second sub-zipper data is the time at which the second data set was generated, and its close-chain time is empty or a maximum value;
a generation submodule, configured to, if no data matching the first keyword or key field is found in the second data set, generate second zipper data based on the data in the first data set that matches the first keyword or key field, wherein the open-chain time of the second zipper data is the time at which the second data set was generated, and its close-chain time is empty or a maximum value;
a fourth update submodule, configured to update the modified first zipper data and the second zipper data into the target data set.
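The zipper (history-chain) case can be sketched as follows. The row layout (`key`, `value`, `start`, `end`), the use of `None` for an empty close-chain time, and the function name `rebuild_zipper` are assumptions for illustration only; `snapshot_time` stands for the time at which the second data set was generated.

```python
from datetime import datetime
from typing import Dict, List


def rebuild_zipper(first: Dict[str, str], second: List[dict],
                   snapshot_time: datetime) -> List[dict]:
    """Rebuild a zipper target from the snapshot `second` and incoming `first`.

    `first` maps a key to its new value; `end is None` marks an open chain.
    """
    target = []
    for row in second:
        row = dict(row)  # copy so the snapshot itself is left untouched
        if row["key"] in first and row["end"] is None:
            # Close the open sub-record of a key that has incoming data.
            row["end"] = snapshot_time
        # Rows for keys absent from `first` are carried over unchanged.
        target.append(row)
    for key, value in first.items():
        # Open a new sub-record (existing key) or a new chain (unseen key).
        target.append({"key": key, "value": value,
                       "start": snapshot_time, "end": None})
    return target
```

Using `None` corresponds to the "empty" close-chain option in the text; a sentinel such as `datetime.max` would correspond to the "maximum value" option.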
Optionally, the data processing system further comprises:
a first recovery module, configured to, if an update error is detected while the data update is performed on the target data set, restore the data in the target data set using the data in the generated second data set; or
a second recovery module, configured to, if a data update error is detected while the data update is performed on the target data set, obtain a backup data set that was backed up in advance and restore the data in the target data set using the data in the backup data set.
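The two recovery options can be illustrated with a small wrapper that restores the target when the update step raises. This is a sketch under assumptions: the name `update_with_recovery`, the `apply_update` callback, and the choice to prefer a separate backup when one is supplied are hypothetical.

```python
from typing import Callable, Dict, Optional

Table = Dict[str, dict]


def update_with_recovery(target: Table, second: Table,
                         apply_update: Callable[[Table], None],
                         backup: Optional[Table] = None) -> bool:
    """Clear and rebuild `target`; on failure restore it and return False."""
    try:
        target.clear()
        apply_update(target)  # may fail part-way through
        return True
    except Exception:
        # First recovery option: restore from the generated second data set.
        # Second recovery option: restore from a backup made in advance.
        source = backup if backup is not None else second
        target.clear()
        target.update({key: dict(row) for key, row in source.items()})
        return False
```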
Optionally, the generation module is further specifically configured to obtain all data stored in the updated target data set within a preset time period before the first data set is received, or to obtain, after the first data set is received, the data in the target data set that is to be updated this time, and to back up all the data stored in the updated target data set, or the data in the target data set to be updated this time, to generate the second data set;
alternatively, the generation module is further specifically configured to obtain the second data set that was generated the last time a first data set was received, and to insert the data currently in the target data set into that second data set, so as to generate the current second data set.
The data processing system provided by the embodiment of the present invention can implement each process implemented by the data processing system in the method embodiments of Fig. 1 to Fig. 2; to avoid repetition, the details are not described again here.
When the data processing system provided by the embodiment of the present invention needs to perform a data update upon receiving a first data set sent by an external system, it can perform the update by query-insert using a table-join query. This preserves the stability of the data processing system, avoids scanning all of the data, saves a large amount of time, and improves the efficiency of data updates.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the invention is not limited to the specific embodiments described above, which are merely illustrative rather than restrictive. Inspired by the present invention, those of ordinary skill in the art may devise many further forms without departing from the purpose of the invention and the scope protected by the claims, all of which fall within the protection of the present invention.
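A query-insert with a table join can be sketched in SQL. The example below uses Python's built-in `sqlite3` module only for convenience; the table names (`target`, `first_set`, `second_set`), the two-column schema, and the sample rows are assumptions for illustration, and the patent does not name a database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE target     (id TEXT PRIMARY KEY, value TEXT);
CREATE TABLE first_set  (id TEXT PRIMARY KEY, value TEXT);
CREATE TABLE second_set (id TEXT PRIMARY KEY, value TEXT);
INSERT INTO target    VALUES ('a', 'old-a'), ('b', 'old-b');
INSERT INTO first_set VALUES ('b', 'new-b'), ('c', 'new-c');
""")

with conn:  # one transaction: commit on success, roll back on error
    # Generate the second data set as a snapshot of the target.
    conn.execute("INSERT INTO second_set SELECT id, value FROM target")
    # Empty the target.
    conn.execute("DELETE FROM target")
    # Query-insert with a join: carry over snapshot rows with no incoming match.
    conn.execute("""
        INSERT INTO target
        SELECT s.id, s.value
        FROM second_set AS s
        LEFT JOIN first_set AS f ON f.id = s.id
        WHERE f.id IS NULL
    """)
    # Query-insert the incoming rows.
    conn.execute("INSERT INTO target SELECT id, value FROM first_set")

rows = conn.execute("SELECT id, value FROM target ORDER BY id").fetchall()
```

The join decides which rows to keep in a single set-based statement, so the update does not have to be issued row by row.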

Claims (38)

1. A data processing method, characterized in that it is applied to a data processing system, the method comprising:
storing internal data of the data processing system and data obtained from outside;
managing business logic;
providing a data service to a system external to the data processing system;
processing data;
wherein the step of processing data comprises:
receiving a first data set sent by an external system;
generating, in the data processing system, a second data set associated with a target data set to be updated;
emptying the data in the target data set;
performing a data update on the target data set using the data in the first data set and the second data set;
wherein performing the data update on the target data set using the data in the first data set and the second data set comprises:
inserting the data in the first data set and the second data set into the target data set by query-insert, so as to update the data in the target data set.
2. The method according to claim 1, characterized in that the method further comprises:
receiving an operation instruction input by a user, and managing and configuring the data processing system.
3. The method according to claim 1, characterized in that, in the step of storing the internal data of the data processing system and the data obtained from outside:
the stored data obtained from outside includes direct-extraction data and file-form data.
4. The method according to claim 1, characterized in that the step of managing business logic comprises:
storing the business logic of the data processing system, the business logic including at least one of the following: scheduling rules, data lineage, model metadata, and execution scripts.
5. The method according to claim 1, characterized in that the step of providing a data service to a system external to the data processing system comprises:
pushing information and data to a queue of a system external to the data processing system;
storing file-form data;
connecting to a downstream system or business system of the data processing system, and providing data to the downstream system or business system.
6. The method according to claim 1, characterized in that the method further comprises:
receiving input parameters;
generating an automation tool script based on preset rules and the parameters.
7. The method according to any one of claims 1-6, characterized in that, before the step of generating, in the data processing system, the second data set associated with the target data set to be updated, the method comprises:
determining a first keyword or key field from the first data set;
querying the target data set using the first keyword or key field;
if the first keyword or key field, or data matching the first keyword or key field, is found in the target data set, performing the step of generating, in the data processing system, the second data set associated with the target data set to be updated.
8. The method according to claim 7, characterized in that, after the step of querying the target data set using the first keyword or key field, the method comprises:
if neither the first keyword or key field nor data matching the first keyword or key field is found in the target data set, updating the data of the first data set into the target data set.
9. The method according to claim 7, characterized in that the step of performing a data update on the target data set using the data in the first data set and the second data set comprises:
determining a second keyword or key field from the second data set;
querying the first data set using the second keyword or key field;
if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set, updating the data in the second data set that matches the second keyword or key field into the target data set;
updating the data in the first data set that matches the first keyword or key field into the target data set.
10. The method according to claim 7, characterized in that, when the target data set is a zipper data set, the step of performing a data update on the target data set using the data in the first data set and the second data set comprises:
determining a second keyword or key field from the second data set;
querying the first data set using the second keyword or key field;
if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set, updating the data in the second data set that matches the second keyword or key field into the target data set;
determining first zipper data in the second data set that matches the first keyword or key field;
modifying the close-chain time of first sub-zipper data, in the open-chain state, within the first zipper data to the time at which the second data set was generated, and generating second sub-zipper data of the first zipper data based on the data in the first data set that matches the first keyword or key field, wherein the open-chain time of the second sub-zipper data is the time at which the second data set was generated, and its close-chain time is empty or a maximum value;
if no data matching the first keyword or key field is found in the second data set, generating second zipper data based on the data in the first data set that matches the first keyword or key field, wherein the open-chain time of the second zipper data is the time at which the second data set was generated, and its close-chain time is empty or a maximum value;
updating the modified first zipper data and the second zipper data into the target data set.
11. The method according to any one of claims 1-6, characterized in that, after the step of emptying the data in the target data set, the method comprises:
if an update error is detected while the data update is performed on the target data set, restoring the data in the target data set using the data in the generated second data set; or
if a data update error is detected while the data update is performed on the target data set, obtaining a backup data set that was backed up in advance, and restoring the data in the target data set using the data in the backup data set.
12. The method according to any one of claims 1-6, characterized in that the step of generating, in the data processing system, the second data set associated with the target data set to be updated comprises:
obtaining all data stored in the updated target data set within a preset time period before the first data set is received, or obtaining, after the first data set is received, the data in the target data set that is to be updated this time, and backing up all the data stored in the updated target data set, or the data in the target data set to be updated this time, to generate the second data set; or
obtaining the second data set that was generated the last time a first data set was received, and inserting the data currently in the target data set into that second data set, so as to generate the current second data set.
13. A data processing method, characterized in that the method comprises:
receiving a first data set sent by an external system;
generating, in a data processing system, a second data set associated with a target data set to be updated;
emptying the data in the target data set;
performing a data update on the target data set using the data in the first data set and the second data set;
wherein performing the data update on the target data set using the data in the first data set and the second data set comprises:
inserting the data in the first data set and the second data set into the target data set by query-insert, so as to update the data in the target data set.
14. The method according to claim 13, characterized in that, before the step of generating, in the data processing system, the second data set associated with the target data set to be updated, the method comprises:
determining a first keyword or key field from the first data set;
querying the target data set using the first keyword or key field;
if the first keyword or key field, or data matching the first keyword or key field, is found in the target data set, performing the step of generating, in the data processing system, the second data set associated with the target data set to be updated.
15. The method according to claim 14, characterized in that, after the step of querying the target data set using the first keyword or key field, the method comprises:
if neither the first keyword or key field nor data matching the first keyword or key field is found in the target data set, updating the data of the first data set into the target data set.
16. The method according to claim 14, characterized in that the step of performing a data update on the target data set using the data in the first data set and the second data set comprises:
determining a second keyword or key field from the second data set;
querying the first data set using the second keyword or key field;
if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set, updating the data in the second data set that matches the second keyword or key field into the target data set;
updating the data in the first data set that matches the first keyword or key field into the target data set.
17. The method according to claim 14, characterized in that, when the target data set is a zipper data set, the step of performing a data update on the target data set using the data in the first data set and the second data set comprises:
determining a second keyword or key field from the second data set;
querying the first data set using the second keyword or key field;
if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set, updating the data in the second data set that matches the second keyword or key field into the target data set;
determining first zipper data in the second data set that matches the first keyword or key field;
modifying the close-chain time of first sub-zipper data, in the open-chain state, within the first zipper data to the time at which the second data set was generated, and generating second sub-zipper data of the first zipper data based on the data in the first data set that matches the first keyword or key field, wherein the open-chain time of the second sub-zipper data is the time at which the second data set was generated, and its close-chain time is empty or a maximum value;
if no data matching the first keyword or key field is found in the second data set, generating second zipper data based on the data in the first data set that matches the first keyword or key field, wherein the open-chain time of the second zipper data is the time at which the second data set was generated, and its close-chain time is empty or a maximum value;
updating the modified first zipper data and the second zipper data into the target data set.
18. The method according to claim 13, characterized in that, after the step of emptying the data in the target data set, the method comprises:
if an update error is detected while the data update is performed on the target data set, restoring the data in the target data set using the data in the generated second data set; or
if a data update error is detected while the data update is performed on the target data set, obtaining a backup data set that was backed up in advance, and restoring the data in the target data set using the data in the backup data set.
19. The method according to claim 13, characterized in that the step of generating, in the data processing system, the second data set associated with the target data set to be updated comprises:
obtaining all data stored in the updated target data set within a preset time period before the first data set is received, or obtaining, after the first data set is received, the data in the target data set that is to be updated this time, and backing up all the data stored in the updated target data set, or the data in the target data set to be updated this time, to generate the second data set; or
obtaining the second data set that was generated the last time a first data set was received, and inserting the data currently in the target data set into that second data set, so as to generate the current second data set.
20. A data processing system, characterized in that the data processing system comprises:
a data storage module, configured to store internal data of the data processing system and data obtained from outside;
a business logic module, configured to manage business logic;
a data service module, configured to provide a data service to a system external to the data processing system;
a data processing engine module, configured to process data;
wherein the data processing engine module comprises:
a receiving unit, configured to receive a first data set sent by an external system;
a generation unit, configured to generate, in the data processing system, a second data set associated with a target data set to be updated;
a clearing unit, configured to empty the data in the target data set;
a first updating unit, configured to perform a data update on the target data set using the data in the first data set and the second data set;
wherein the first updating unit is further configured to insert the data in the first data set and the second data set into the target data set by query-insert, so as to update the data in the target data set.
21. The data processing system according to claim 20, characterized in that the data processing system comprises:
an information interaction module, configured to receive an operation instruction input by a user, and to manage and configure the data processing system.
22. The data processing system according to claim 20, characterized in that the data storage module is a distributed file storage system, and the data obtained from outside that is stored by the data storage module includes direct-extraction data and file-form data.
23. The data processing system according to claim 20, characterized in that the business logic module comprises:
a storage unit, configured to store the business logic of the data processing system, the business logic including at least one of the following: scheduling rules, data lineage, model metadata, and execution scripts.
24. The data processing system according to claim 20, characterized in that the data service module comprises:
a push unit, configured to push information and data to a queue of a system external to the data processing system;
an archiving unit, configured to store file-form data;
a data transmission interface unit, configured to connect to a downstream system or business system of the data processing system, and to provide data to the downstream system or business system through the interface unit.
25. The data processing system according to claim 20, characterized in that the data processing system further comprises an automation tool module, the automation tool module comprising:
a parameter receiving unit, configured to receive input parameters;
a script generation unit, configured to generate an automation tool script based on preset rules and the parameters.
26. The data processing system according to any one of claims 20-25, characterized in that the data processing engine module further comprises:
a first determination unit, configured to determine a first keyword or key field from the first data set;
a query unit, configured to query the target data set using the first keyword or key field;
an execution unit, configured to, if the first keyword or key field, or data matching the first keyword or key field, is found in the target data set, perform the step of generating, in the data processing system, the second data set associated with the target data set to be updated.
27. The data processing system according to claim 26, characterized in that the data processing engine module further comprises:
a second updating unit, configured to, if neither the first keyword or key field nor data matching the first keyword or key field is found in the target data set, update the data of the first data set into the target data set.
28. The data processing system according to claim 26, characterized in that the first updating unit comprises:
a first determining subunit, configured to determine a second keyword or key field from the second data set;
a first query subunit, configured to query the first data set using the second keyword or key field;
a first updating subunit, configured to, if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set, update the data in the second data set that matches the second keyword or key field into the target data set;
a second updating subunit, configured to update the data in the first data set that matches the first keyword or key field into the target data set.
29. The data processing system according to claim 26, characterized in that, when the target data set is a zipper data set, the first updating unit comprises:
a second determining subunit, configured to determine a second keyword or key field from the second data set;
a second query subunit, configured to query the first data set using the second keyword or key field;
a third updating subunit, configured to, if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set, update the data in the second data set that matches the second keyword or key field into the target data set;
a third determining subunit, configured to determine first zipper data in the second data set that matches the first keyword or key field;
a modification subunit, configured to modify the close-chain time of first sub-zipper data, in the open-chain state, within the first zipper data to the time at which the second data set was generated, and to generate second sub-zipper data of the first zipper data based on the data in the first data set that matches the first keyword or key field, wherein the open-chain time of the second sub-zipper data is the time at which the second data set was generated, and its close-chain time is empty or a maximum value;
a generation subunit, configured to, if no data matching the first keyword or key field is found in the second data set, generate second zipper data based on the data in the first data set that matches the first keyword or key field, wherein the open-chain time of the second zipper data is the time at which the second data set was generated, and its close-chain time is empty or a maximum value;
a fourth updating subunit, configured to update the modified first zipper data and the second zipper data into the target data set.
30. The data processing system according to any one of claims 20-25, characterized in that the data processing engine module comprises:
a first recovery unit, configured to, if an update error is detected while the data update is performed on the target data set, restore the data in the target data set using the data in the generated second data set; or
a second recovery unit, configured to, if a data update error is detected while the data update is performed on the target data set, obtain a backup data set that was backed up in advance, and restore the data in the target data set using the data in the backup data set.
31. The data processing system according to any one of claims 20-25, characterized in that:
the generation unit is further configured to obtain all data stored in the updated target data set within a preset time period before the first data set is received, or to obtain, after the first data set is received, the data in the target data set that is to be updated this time, and to back up all the data stored in the updated target data set, or the data in the target data set to be updated this time, to generate the second data set;
alternatively, the generation unit is further configured to obtain the second data set that was generated the last time a first data set was received, and to insert the data currently in the target data set into that second data set, so as to generate the current second data set.
32. A data processing system, characterized in that the data processing system comprises:
a receiving module, configured to receive a first data set sent by an external system;
a generation module, configured to generate, in the data processing system, a second data set associated with a target data set to be updated;
a clearing module, configured to empty the data in the target data set;
a first update module, configured to perform a data update on the target data set using the data in the first data set and the second data set;
wherein the first update module is further configured to insert the data in the first data set and the second data set into the target data set by query-insert, so as to update the data in the target data set.
33. The data processing system according to claim 32, characterized in that the data processing system further comprises:
a first determining module, configured to determine a first keyword or key field from the first data set;
a query module, configured to query the target data set using the first keyword or key field;
an execution module, configured to, if the first keyword or key field, or data matching the first keyword or key field, is found in the target data set, perform the step of generating, in the data processing system, the second data set associated with the target data set to be updated.
34. The data processing system according to claim 33, characterized in that the data processing system further comprises:
a second update module, configured to, if neither the first keyword or key field nor data matching the first keyword or key field is found in the target data set, update the data of the first data set into the target data set.
35. The data processing system according to claim 33, characterized in that the first update module comprises:
a first determining submodule, configured to determine a second keyword or key field from the second data set;
a first query submodule, configured to query the first data set using the second keyword or key field;
a first update submodule, configured to, if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set, update the data in the second data set that matches the second keyword or key field into the target data set;
a second update submodule, configured to update the data in the first data set that matches the first keyword or key field into the target data set.
36. The data processing system according to claim 33, characterized in that, when the target data set is a zipper data set, the first update module comprises:
a second determining submodule, configured to determine a second keyword or key field from the second data set;
a second query submodule, configured to query the first data set using the second keyword or key field;
a third update submodule, configured to, if neither the second keyword or key field nor data matching the second keyword or key field is found in the first data set, update the data in the second data set that matches the second keyword or key field into the target data set;
a third determining submodule, configured to determine first zipper data in the second data set that matches the first keyword or key field;
a modification submodule, configured to modify the close-chain time of first sub-zipper data, in the open-chain state, within the first zipper data to the time at which the second data set was generated, and to generate second sub-zipper data of the first zipper data based on the data in the first data set that matches the first keyword or key field, wherein the open-chain time of the second sub-zipper data is the time at which the second data set was generated, and its close-chain time is empty or a maximum value;
a generation submodule, configured to, if no data matching the first keyword or key field is found in the second data set, generate second zipper data based on the data in the first data set that matches the first keyword or key field, wherein the open-chain time of the second zipper data is the time at which the second data set was generated, and its close-chain time is empty or a maximum value;
a fourth update submodule, configured to update the modified first zipper data and the second zipper data into the target data set.
37. The data processing system as claimed in claim 32, characterized in that the data processing system further comprises:
A first recovery module, configured to, if an update error is detected while updating the data in the target data set, restore the data in the target data set using the data in the generated second data set; or
A second recovery module, configured to, if a data update error is detected while updating the data in the target data set, obtain a backup data set backed up in advance and restore the data in the target data set using the data in the backup data set.
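The two recovery options can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed system: the target data set is modeled as an in-memory `dict`, and `update_with_recovery`, `apply_update`, `snapshot`, and `backup` are hypothetical names.

```python
import copy
from typing import Callable, Optional


def update_with_recovery(target: dict,
                         incoming: dict,
                         apply_update: Callable[[dict, dict], None],
                         snapshot: Optional[dict] = None,
                         backup: Optional[dict] = None) -> bool:
    """Apply an update; on failure, restore the target from a saved copy.

    `snapshot` stands in for the generated second data set and `backup`
    for a backup data set made in advance. Returns True on success and
    False if the target had to be restored.
    """
    try:
        apply_update(target, incoming)
        return True
    except Exception:
        # Prefer the snapshot; fall back to the pre-made backup.
        source = snapshot if snapshot is not None else backup
        if source is None:
            raise  # nothing to restore from, so surface the error
        target.clear()
        target.update(copy.deepcopy(source))
        return False
```

Restoring in place (rather than rebinding the variable) means every holder of a reference to the target sees the recovered data.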
38. The data processing system as claimed in claim 32, characterized in that
The generation module is further specifically configured to obtain all data stored in the target data set as updated within a preset time period before the first data set is received, or to obtain, after the first data set is received, the data in the target data set to be updated this time; and to back up all the data stored in the updated target data set, or the data in the target data set to be updated this time, so as to generate the second data set;
Alternatively, the generation module is further specifically configured to obtain the second data set generated the last time a first data set was received, and to insert the data in the current target data set into that second data set, so as to generate the second data set for this update.
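The two ways of generating the second data set described above (a full backup of the target data set, or inserting the current target data into the previously generated second data set) might look like the sketch below. The function names are hypothetical, data sets are modeled as dictionaries keyed by record key, and letting current data overwrite an older entry with the same key is an assumption that the claim does not specify.

```python
import copy


def snapshot_full(target: dict) -> dict:
    """Generate the second data set as an independent copy of the target."""
    return copy.deepcopy(target)


def snapshot_incremental(previous_snapshot: dict, target: dict) -> dict:
    """Generate the second data set from the previous one plus current target data.

    Assumption: on a key collision the current target record wins.
    """
    merged = copy.deepcopy(previous_snapshot)
    merged.update(copy.deepcopy(target))
    return merged
```

Deep copies keep the generated data set isolated from later changes to the target data set, which matters if it is later used for recovery.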
CN201711418696.XA 2017-12-25 2017-12-25 A kind of data processing method and system Active CN108038225B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711418696.XA CN108038225B (en) 2017-12-25 2017-12-25 A kind of data processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711418696.XA CN108038225B (en) 2017-12-25 2017-12-25 A kind of data processing method and system

Publications (2)

Publication Number Publication Date
CN108038225A CN108038225A (en) 2018-05-15
CN108038225B true CN108038225B (en) 2019-02-12

Family

ID=62100949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711418696.XA Active CN108038225B (en) 2017-12-25 2017-12-25 A kind of data processing method and system

Country Status (1)

Country Link
CN (1) CN108038225B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10754895B2 (en) 2018-10-17 2020-08-25 International Business Machines Corporation Efficient metadata destage during safe data commit operation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104394155A (en) * 2014-11-27 2015-03-04 暨南大学 Multi-user cloud encryption keyboard searching method capable of verifying integrity and completeness
CN105574404A (en) * 2015-12-14 2016-05-11 国家电网公司 Method and device for prompting to change password

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7707219B1 (en) * 2005-05-31 2010-04-27 Unisys Corporation System and method for transforming a database state
US20140025702A1 (en) * 2012-07-23 2014-01-23 Michael Curtiss Filtering Structured Search Queries Based on Privacy Settings
CN102802056B (en) * 2012-09-12 2015-06-10 播思通讯技术(北京)有限公司 Method used for inserting advertisement in digital broadcasting television program
CN103455338A (en) * 2013-09-22 2013-12-18 广州中国科学院软件应用技术研究所 Method and device for acquiring data
US9697235B2 (en) * 2014-07-16 2017-07-04 Verizon Patent And Licensing Inc. On device image keyword identification and content overlay
CN105677307B (en) * 2014-11-19 2019-03-01 上海烟草集团有限责任公司 A kind of mobile terminal big data processing method and system

Also Published As

Publication number Publication date
CN108038225A (en) 2018-05-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant