CN106570024B - Data increment processing method and device - Google Patents

Info

Publication number
CN106570024B
Authority
CN
China
Prior art keywords
processing
data
database
identifier
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510654209.4A
Other languages
Chinese (zh)
Other versions
CN106570024A (en)
Inventor
洪超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd filed Critical Beijing Gridsum Technology Co Ltd
Priority to CN201510654209.4A priority Critical patent/CN106570024B/en
Publication of CN106570024A publication Critical patent/CN106570024A/en
Application granted granted Critical
Publication of CN106570024B publication Critical patent/CN106570024B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for incremental processing of data, relates to the technical field of the internet, and can solve the problem of low processing efficiency when handling inconsistency between the data in a data warehouse and the data in SSAS. The method of the invention comprises the following steps: before the current automatic data processing is performed, acquiring a processing identifier of the latest automatic data processing recorded in a first database, wherein the processing identifier comprises a time identifier and a position identifier; obtaining a time field from metadata of the first database; judging whether the content of the time identifier is the same as the content of the time field; if they are the same, performing incremental processing according to the position identifier; and if they are different, performing full processing on the partition to be processed by the current automatic data processing. The method is suitable for scenarios in which SSAS extracts incremental data from a data warehouse.

Description

Data increment processing method and device
Technical Field
The invention relates to the technical field of internet, in particular to a method and a device for data increment processing.
Background
A data warehouse is used for storing data, and a data analysis system is used for analyzing the data in a database or data warehouse. A commonly used data analysis system is SSAS (SQL Server Analysis Services).
In practical applications, the data in the data warehouse is updated in real time (i.e., new data is continually added), and the SSAS periodically performs incremental processing, i.e., it periodically extracts the latest data from the data warehouse and stores it in the SSAS. However, between one incremental processing run and the next, an administrator sometimes needs to analyze the latest data and therefore manually performs full processing on the SSAS: all data currently stored in the SSAS is deleted, and then all data in the data warehouse is extracted and stored in the SSAS, so that the latest data is obtained without duplicating data in the SSAS. This manual full processing causes a problem for SSAS incremental processing: after the SSAS performs its next incremental processing, part of the data is duplicated. For example, suppose the incremental processing period is 1 day, the current incremental processing runs at 8:00 on January 1, and the next runs at 8:00 on January 2, while the administrator performs full processing on the SSAS at 19:00 on January 1, so that the data from 8:00 to 19:00 on January 1 is also stored in the SSAS. At 8:00 on January 2, the SSAS performs incremental processing, i.e., it stores the data from 8:00 on January 1 to 8:00 on January 2. At this point, the data from 8:00 to 19:00 on January 1 has been stored in the SSAS twice, so the data in the data warehouse is inconsistent with the data in the SSAS.
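The duplication scenario above can be reproduced with a small simulation. This is only an illustrative sketch: the row layout, the year, the helper names `rows_until` and `rows_between`, and the use of plain Python lists in place of a data warehouse and SSAS are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical warehouse rows: (row_id, creation_time). The year is arbitrary.
warehouse = [
    (1, datetime(2015, 1, 1, 7, 0)),
    (2, datetime(2015, 1, 1, 12, 0)),  # created after the 8:00 run on January 1
    (3, datetime(2015, 1, 1, 18, 0)),  # created after the 8:00 run on January 1
    (4, datetime(2015, 1, 2, 6, 0)),
]

def rows_until(end):
    """All rows created up to and including `end` (what full processing loads)."""
    return [row for row in warehouse if row[1] <= end]

def rows_between(start, end):
    """Rows created in the window (start, end] (what a time-based increment loads)."""
    return [row for row in warehouse if start < row[1] <= end]

# January 1, 8:00: scheduled incremental processing.
ssas = rows_until(datetime(2015, 1, 1, 8, 0))
# January 1, 19:00: manual full processing replaces everything in the SSAS.
ssas = rows_until(datetime(2015, 1, 1, 19, 0))
# January 2, 8:00: the scheduled run is unaware of the manual run and
# loads the whole window since its own previous run.
ssas += rows_between(datetime(2015, 1, 1, 8, 0), datetime(2015, 1, 2, 8, 0))

ids = [row[0] for row in ssas]
duplicates = sorted({i for i in ids if ids.count(i) > 1})
print(duplicates)  # rows created between 8:00 and 19:00 on January 1
```

In this sketch, rows 2 and 3 fall inside the window from 8:00 to 19:00 on January 1 and therefore end up in the simulated SSAS twice.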
In the prior art, when manual processing is detected, the next incremental processing is replaced with full processing of the whole SSAS, so as to ensure that the data in the data warehouse and the data in the SSAS are consistent. However, because the amount of data in the data warehouse is large, performing full processing on the entire SSAS in one pass is inefficient.
Disclosure of Invention
In view of this, the present invention provides a method and an apparatus for incremental data processing, which can solve the problem of low processing efficiency when handling inconsistency between the data in a data warehouse and the data in SSAS.
According to an aspect of the present invention, there is provided a method for incremental processing of data, the method comprising:
before the data is automatically processed, acquiring a processing identifier of the latest data automatic processing recorded in a first database, wherein the data automatic processing is to periodically update data in a second database to the first database, the data automatic processing is incremental processing or full processing, the processing identifier comprises a time identifier and a position identifier, the time identifier is used for representing the time when the data automatic processing is completed, and the position identifier is used for representing the position of the last row of data in the second database after the data automatic processing is completed;
acquiring a time field from metadata of the first database, wherein the time field is used for recording the time when the data processing on the first database is completed last time;
judging whether the content in the time identifier is the same as the content in the time field;
if the judgment results are the same, performing incremental processing according to the position identifier, wherein the incremental processing is to import incremental data in the second database into the first database;
if the judgment results are different, performing full processing on the partition to be processed by the current automatic data processing, wherein the full processing is deleting all data in the area to be processed in the first database, and importing all data of the corresponding area in the second database into that area.
According to another aspect of the present invention, there is provided an apparatus for incremental processing of data, the apparatus comprising:
an acquiring unit, configured to acquire, before the current automatic data processing is performed, a processing identifier of the latest automatic data processing recorded in a first database, wherein the automatic data processing periodically updates data in a second database into the first database, the automatic data processing is incremental processing or full processing, the processing identifier comprises a time identifier and a position identifier, the time identifier represents the time when the automatic data processing was completed, and the position identifier represents the position of the last row of data in the second database after the automatic data processing was completed;
the acquiring unit is further configured to acquire a time field from the metadata of the first database, wherein the time field records the time when data processing on the first database was last completed;
a judging unit, configured to judge whether the content of the time identifier is the same as the content of the time field;
and a processing unit, configured to perform incremental processing according to the position identifier when the judging unit determines that the contents are the same, wherein the incremental processing is importing incremental data in the second database into the first database, and to perform full processing on the partition to be processed by the current automatic data processing when the judging unit determines that the contents are different, wherein the full processing is deleting all data in the area to be processed in the first database and importing all data of the corresponding area in the second database into that area.
By means of the above technical solution, the method and device for incremental data processing provided by the present invention can, before the current automatic data processing is performed, acquire the processing identifier of the latest automatic data processing recorded in the first database and the time field in the metadata of the current first database, and judge whether the content of the time identifier in the processing identifier is the same as the content of the time field. If they are the same, incremental processing is performed according to the position identifier in the processing identifier; if they are different, full processing is performed on the partition to be processed by the current automatic data processing. In the prior art, the next incremental processing is simply replaced with full processing of the whole SSAS. By contrast, before performing automatic data processing, the present invention first judges whether the completion time of the last automatic data processing is the same as the time of the latest data processing recorded in the metadata of the first database. When the two are the same, incremental processing is performed on the current partition of the first database; when they differ, full processing is performed only on the current partition, and the historical partitions are not fully processed, thereby improving the efficiency of automatic data processing while ensuring data consistency.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart illustrating a method for incremental processing of data according to an embodiment of the present invention;
FIG. 2 is a block diagram illustrating an apparatus for incremental processing of data according to an embodiment of the present invention;
FIG. 3 is a block diagram illustrating another apparatus for incremental data processing according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
An embodiment of the present invention provides a method for incremental processing of data, as shown in fig. 1, where the method includes:
101. Before the current automatic data processing is performed, a processing identifier of the latest automatic data processing recorded in the first database is obtained.
The automatic processing of the data is that the data in the second database is periodically updated into the first database, and the automatic processing of the data is incremental processing or full processing. The incremental processing is to import the incremental data in the second database into the first database, the full processing is to delete all the data in the region to be processed in the first database, and import all the data in the corresponding region in the second database into the region. It can be seen that the incremental processing is different from the full processing in that the full processing requires deletion of the history data in the region to be processed, and the incremental processing does not require the deletion.
In practical applications, to facilitate operations such as scanning and analyzing a database, the database is often divided into partitions for management. When automatic data processing is performed on the first database, the partitions are therefore processed one after another (generally, one processing run corresponds to only one partition, while one partition can be processed multiple times). That is, each automatic data processing run in the embodiment of the present invention is performed only on one partition of the first database (the partition that needs to be processed), not on the entire first database. In practical applications, the most common partitioning method is to partition by time.
In addition, the processing identifier comprises a time identifier and a position identifier. The time identifier represents the time when the automatic data processing was completed, and the position identifier represents the position of the last row of data in the second database after the automatic data processing was completed. Before the current automatic data processing is performed, the processing identifier of the latest (i.e., last) automatic data processing needs to be obtained locally from the first database, and whether the current automatic data processing should be incremental processing or full processing is then determined according to that processing identifier.
In practical applications, since a data analysis system often needs to periodically extract the latest data from a data warehouse and analyze it, the first database may be a data analysis system (such as SSAS (SQL Server Analysis Services)), and the second database may be a data warehouse. The SSAS periodically extracts incremental data from the fact table of the data warehouse. To obtain the position of the last row of data in the data warehouse, the maximum primary key value in the fact table can be read directly and used as the position identifier. A primary key value uniquely identifies one row in the fact table, and the largest primary key value identifies the last row in the fact table.
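As a rough illustration of using the largest primary key as the position identifier, the sketch below queries a small fact table. An in-memory SQLite database stands in for the data warehouse, and the table name `fact`, its columns, and the helper `read_position_identifier` are hypothetical names chosen for the example. The sketch also assumes that primary key values only grow as rows are appended.

```python
import sqlite3

# In-memory SQLite database standing in for the data warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO fact (id, amount) VALUES (?, ?)",
    [(1, 9.5), (2, 3.0), (7, 1.25)],
)

def read_position_identifier(connection):
    """Return the largest primary key in the fact table, or None if it is empty."""
    row = connection.execute("SELECT MAX(id) FROM fact").fetchone()
    return row[0]

position_identifier = read_position_identifier(conn)
print(position_identifier)
```

Primary keys need not be contiguous for this to work; only their ordering matters, since the next increment is everything with a key larger than the stored value.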
102. The time field is obtained from metadata of the first database.
The metadata is data describing the data, i.e., describing each attribute of the data. The metadata includes a time field for recording a time when data processing on the first database is completed last time. The latest data processing may be manually performed on the entire first database, or may be performed automatically on the first database by the terminal. That is, in both the manual processing and the automatic processing, the terminal changes the content in the time field to the time when the present processing is completed as long as the first database is processed.
103. And judging whether the content in the time identifier is the same as the content in the time field.
After the time identifier of the latest automatic data processing and the time field in the metadata of the current first database are obtained, the content of the time identifier is compared with the content of the time field to judge whether the two are the same. If they are the same, it is determined that the last data processing (i.e., the latest data processing) was automatic data processing, that is, no manual processing took place between the last automatic data processing and the current one. If they are different, it is determined that the last data processing was manual processing, that is, manual processing took place between the last automatic data processing and the current one.
For example, suppose the period of the automatic data processing is 1 day. If the content of the time identifier of the latest automatic data processing is 10:10:20 on March 1, 2015, and the content of the time field in the metadata of the current first database is 23:25:50 on March 1, 2015, it can be determined that the entire first database was manually fully processed between the last automatic data processing and the current automatic data processing (i.e., the one at 10:10:20 on March 2, 2015). If instead the content of the time field in the metadata of the current first database is 10:10:20 on March 1, 2015, it can be determined that the latest data processing was automatic data processing, that is, the entire first database was not manually fully processed between the last and the current automatic data processing.
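The comparison in step 103 can be sketched as a single equality test. The function name `choose_mode` and the string labels it returns are assumptions for illustration; the timestamps are the ones from the example above.

```python
from datetime import datetime

def choose_mode(time_identifier, time_field):
    """Pick the processing mode for the current automatic run.

    time_identifier: completion time of the last automatic run (from the
                     processing identifier).
    time_field:      completion time of the last processing of any kind
                     (from the first database's metadata).
    """
    if time_identifier == time_field:
        return "incremental"  # nothing touched the database since the last automatic run
    return "full"             # a manual run happened in between

last_automatic = datetime(2015, 3, 1, 10, 10, 20)

mode_after_manual_run = choose_mode(last_automatic, datetime(2015, 3, 1, 23, 25, 50))
mode_without_manual_run = choose_mode(last_automatic, datetime(2015, 3, 1, 10, 10, 20))
print(mode_after_manual_run, mode_without_manual_run)
```

The test is an exact equality check, so this sketch assumes both timestamps are recorded with the same precision.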
104. And if the judgment results are the same, performing incremental processing according to the position identification.
When the content in the time identifier is judged to be the same as the content in the time field, the latest data processing can be determined to be data automatic processing, namely, full processing operation is not manually performed on the whole first database between the last data automatic processing and the current data automatic processing, so that the target data corresponding to the position identifier can be found in the second database, and then the data after the target data is imported into the first database, thereby realizing incremental processing.
For example, when the position identifier is represented by a row number and the obtained position identifier of the latest automatic data processing is 1001, row 1001 is located in the second database, and row 1002 and all following rows are then appended after the last row of data in the first database, thereby realizing the incremental processing.
105. If the judgment results are different, full processing is performed on the partition to be processed by the current automatic data processing.
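A minimal sketch of this row-number-based increment follows, with plain lists of `(row_number, payload)` tuples standing in for the two databases. The function name `incremental_import` and the data layout are assumptions made for the example.

```python
def incremental_import(source_rows, target_rows, position_identifier):
    """Append every source row whose row number exceeds the position identifier.

    Returns the number of rows imported.
    """
    new_rows = [row for row in source_rows if row[0] > position_identifier]
    target_rows.extend(new_rows)
    return len(new_rows)

# The second database now holds rows 1..1003; the first database was last
# synchronized up to row 1001.
second_db = [(n, f"row-{n}") for n in range(1, 1004)]
first_db = [row for row in second_db if row[0] <= 1001]

imported = incremental_import(second_db, first_db, position_identifier=1001)
print(imported, first_db[-1])
```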
When the content in the time identifier is different from the content in the time field, it can be determined that a manual full processing operation on the whole first database exists between the last time of data automatic processing and the current time of data automatic processing, so that the incremental processing cannot be performed according to the position identifier, and the partition needing to be subjected to the data automatic processing at this time needs to be subjected to full processing.
For example, if it is currently August 2015, the data generated in August needs to be periodically imported from the second database into the 8th partition of the first database. If the content of the time identifier differs from the content of the time field, only the 8th partition needs to be fully processed, and the first 7 partitions do not, so the efficiency of automatic data processing is improved while the data in the first database and the second database remain consistent.
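The partition-scoped full processing described above might look like the following sketch, where each database is modeled as a dictionary from a partition key to a list of rows. The month-string keys, the sample rows, and the function name `full_process_partition` are illustrative assumptions.

```python
def full_process_partition(first_db, second_db, partition_key):
    """Discard the partition's rows in the first database and reload them
    from the corresponding region of the second database."""
    first_db[partition_key] = list(second_db.get(partition_key, []))

second_db = {
    "2015-07": ["a", "b"],
    "2015-08": ["c", "d", "e"],
}
# The August partition of the first database is out of step with the
# warehouse (a duplicated row and a missing row).
first_db = {
    "2015-07": ["a", "b"],
    "2015-08": ["c", "c", "d"],
}

july_before = first_db["2015-07"]
full_process_partition(first_db, second_db, "2015-08")
print(first_db["2015-08"])
```

Only the August entry is rebuilt; the July list object is never touched, which is the source of the efficiency gain over reprocessing every partition.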
The method for incremental data processing provided by the embodiment of the present invention can, before the current automatic data processing is performed, acquire the processing identifier of the latest automatic data processing recorded in the first database and the time field in the metadata of the current first database, and judge whether the content of the time identifier in the processing identifier is the same as the content of the time field. If they are the same, incremental processing is performed according to the position identifier in the processing identifier; if they are different, full processing is performed on the partition to be processed by the current automatic data processing. In the prior art, the next incremental processing is simply replaced with full processing of the whole SSAS. By contrast, before performing automatic data processing, the present invention first judges whether the completion time of the last automatic data processing is the same as the time of the latest data processing recorded in the metadata of the first database. When the two are the same, incremental processing is performed on the current partition of the first database; when they differ, full processing is performed only on the current partition, and the historical partitions are not fully processed, thereby improving the efficiency of automatic data processing while ensuring data consistency.
Further, in practical applications, the partition to be processed by the current automatic data processing may be a new partition, while the partition processed by the last automatic data processing was the previous partition. For this case, the following refinement can be adopted: before the time field is acquired from the metadata of the first database, it is detected whether the processing identifier corresponds to the partition to be processed by the current automatic data processing. If so, the time field is acquired from the metadata of the first database; if not, the time field does not need to be acquired, and full processing is performed directly on the partition to be processed.
When the partition to be processed is a new partition, that is, when the partition is undergoing automatic data processing for the first time, fully processing the partition is sufficient to ensure that the data in the first database and the data in the second database are consistent, regardless of whether the partition already contains data. Therefore, when it is determined that the processing identifier does not correspond to the partition, full processing is performed on the partition directly, which can further improve the efficiency of the automatic data processing.
Further, the processing identifier may be stored in the annotation of each partition, or may be stored elsewhere. When the processing identifier is stored in the annotation of the partition, detecting whether the processing identifier corresponds to the partition to be processed by the current automatic data processing may be implemented as follows: it is detected whether the processing identifier is located in the annotation of that partition.
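The refined decision, with the partition check placed before the metadata lookup, can be sketched as below. The `ProcessingIdentifier` dataclass, its `partition` field, and passing the metadata read as a callable named `read_time_field` are assumptions made so the example can show that the time field is not read for a new partition.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional

@dataclass
class ProcessingIdentifier:
    partition: str             # hypothetical: which partition the last run processed
    time_identifier: datetime  # when the last automatic run completed
    position_identifier: int   # last row position in the second database

def choose_mode(identifier: Optional[ProcessingIdentifier],
                current_partition: str,
                read_time_field: Callable[[], datetime]) -> str:
    # New partition (or no identifier at all): full processing, metadata not read.
    if identifier is None or identifier.partition != current_partition:
        return "full"
    # Same partition: compare against the time field in the metadata.
    if identifier.time_identifier == read_time_field():
        return "incremental"
    return "full"

metadata_reads = []

def read_time_field():
    metadata_reads.append("read")  # record that the metadata was consulted
    return datetime(2015, 8, 31, 8, 0, 0)

last = ProcessingIdentifier("2015-08", datetime(2015, 8, 31, 8, 0, 0), 1001)

mode_new_partition = choose_mode(last, "2015-09", read_time_field)
reads_after_new_partition = len(metadata_reads)
mode_same_partition = choose_mode(last, "2015-08", read_time_field)
print(mode_new_partition, mode_same_partition)
```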
It should be noted that, when the processing identifier is stored in another storage area, in order to clarify the correspondence between the processing identifier and the partition, a partition identifier may be added to each processing identifier, so that after the processing identifier of the latest data automatic processing is acquired, whether the processing identifier is the processing identifier corresponding to the partition to be automatically processed by the data may be determined by the partition identifier.
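The two storage options can be contrasted in a short sketch. Encoding the identifier as JSON text inside the annotation, the field names `time`, `position`, and `partition`, and both helper functions are assumptions for illustration; the source does not specify a storage format.

```python
import json

def identifier_from_annotation(annotation):
    """Layout 1: the identifier lives in the partition's own annotation.

    Returns the decoded identifier, or None if the annotation holds none.
    """
    if not annotation:
        return None
    try:
        data = json.loads(annotation)
    except json.JSONDecodeError:
        return None
    if isinstance(data, dict) and all(key in data for key in ("time", "position")):
        return data
    return None

def identifier_from_store(store, partition_id):
    """Layout 2: identifiers live in a separate store, each tagged with a
    partition identifier."""
    for record in store:
        if record["partition"] == partition_id:
            return record
    return None

annotations = {
    "2015-08": json.dumps({"time": "2015-08-31T08:00:00", "position": 1001}),
    "2015-09": "",  # new partition, never processed
}
store = [{"partition": "2015-08", "time": "2015-08-31T08:00:00", "position": 1001}]

print(identifier_from_annotation(annotations["2015-09"]))  # no identifier: full processing
print(identifier_from_store(store, "2015-08"))
```

In both layouts, a missing identifier for the current partition is the signal to perform full processing directly.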
Further, after the automatic processing of the data is completed, the processing identifier of the automatic processing of the data needs to be saved to be used as a basis for the next automatic processing of the data. After the data automatic processing of this time is finished, the processing identifier of the last data automatic processing is useless, so that the content in the processing identifier can be directly changed from the last content to the content of this time.
Specifically, after the data is automatically processed, the time when the data is automatically processed needs to be acquired, and the content in the time identifier is changed into the time when the data is automatically processed; and the position of the last line of data in the second database when the automatic processing of the data is completed is also required to be obtained, and the content in the position identifier is changed into the position of the last line of data in the second database when the automatic processing of the data is completed.
When the processing identifier is stored in the annotation of the partition and the partition to be processed by the current automatic data processing is a new partition (that is, a partition that has not been processed before), the time when the current automatic data processing is completed and the position of the last row of data in the second database are stored directly in the annotation of the partition.
Further, according to the foregoing method embodiment, another embodiment of the present invention provides an apparatus for incremental data processing. As shown in FIG. 2, the apparatus includes an acquiring unit 21, a judging unit 22, and a processing unit 23, wherein:
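Saving the identifier after a run can be sketched as a single overwrite-or-create operation. The dictionary `annotations`, the key names, and the function `record_completion` are hypothetical names for the example.

```python
from datetime import datetime

def record_completion(annotations, partition, finished_at, last_position):
    """Store the identifier of the run that just finished.

    An existing identifier is overwritten, since the previous one is no
    longer needed; for a new partition the entry is simply created.
    """
    annotations[partition] = {
        "time_identifier": finished_at,
        "position_identifier": last_position,
    }

annotations = {
    "2015-08": {
        "time_identifier": datetime(2015, 8, 30, 8, 0, 0),
        "position_identifier": 1001,
    },
}

# Existing partition: the old identifier is replaced.
record_completion(annotations, "2015-08", datetime(2015, 8, 31, 8, 0, 0), 1250)
# New partition: the identifier is stored for the first time.
record_completion(annotations, "2015-09", datetime(2015, 9, 1, 8, 0, 0), 1300)
print(annotations["2015-08"]["position_identifier"], sorted(annotations))
```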
the acquiring unit 21 is configured to acquire, before performing automatic processing on the data of this time, a processing identifier of last automatic processing of the data recorded in the first database, where the automatic processing of the data is to periodically update the data in the second database into the first database, the automatic processing of the data is incremental processing or full processing, the processing identifier includes a time identifier and a position identifier, the time identifier is used to represent time when the automatic processing of the data is completed, and the position identifier is used to represent a position of last line of data in the second database after the automatic processing of the data is completed;
the obtaining unit 21 is further configured to obtain a time field from the metadata of the first database, where the time field is used to record a time when data processing on the first database is completed last time;
a judging unit 22, configured to judge whether the content in the time identifier is the same as the content in the time field;
and the processing unit 23 is configured to, when the determination results of the determining unit 22 are the same, perform incremental processing according to the position identifier, where the incremental processing is to import incremental data in the second database into the first database, and when the determination results of the determining unit 22 are different, perform full processing on the partition to be automatically processed on the data, where the full processing is to delete all data in the area to be processed in the first database, and import all data in the corresponding area in the second database into the area.
The device for incremental data processing provided by the embodiment of the present invention can, before the current automatic data processing is performed, acquire the processing identifier of the latest automatic data processing recorded in the first database and the time field in the metadata of the current first database, and judge whether the content of the time identifier in the processing identifier is the same as the content of the time field. If they are the same, incremental processing is performed according to the position identifier in the processing identifier; if they are different, full processing is performed on the partition to be processed by the current automatic data processing. In the prior art, the next incremental processing is simply replaced with full processing of the whole SSAS. By contrast, before performing automatic data processing, the present invention first judges whether the completion time of the last automatic data processing is the same as the time of the latest data processing recorded in the metadata of the first database. When the two are the same, incremental processing is performed on the current partition of the first database; when they differ, full processing is performed only on the current partition, and the historical partitions are not fully processed, thereby improving the efficiency of automatic data processing while ensuring data consistency.
Further, as shown in fig. 3, the apparatus further includes:
a detecting unit 24, configured to detect, before the obtaining unit 21 obtains the time field from the metadata of the first database, whether the processing identifier is the processing identifier corresponding to the partition on which automatic data processing is to be performed.
The obtaining unit 21 is further configured to obtain the time field from the metadata of the first database when the detection result of the detecting unit 24 is that the processing identifier is the processing identifier corresponding to the partition.
Further, the processing unit 23 is further configured to perform full processing on the partition when the detection result of the detecting unit 24 is that the processing identifier is not the processing identifier corresponding to the partition.
Further, the detecting unit 24 is configured to detect whether the processing identifier is located in the annotation of the partition.
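As a hedged illustration of this detection step, the sketch below checks whether a partition's annotation carries a processing identifier. The text does not specify how the identifier is encoded in the annotation, so the JSON layout with `time` and `position` keys and the function name `read_identifier` are assumptions made only for this example.

```python
import json
from typing import Optional, Tuple


def read_identifier(annotation: Optional[str]) -> Optional[Tuple[str, int]]:
    """Return (time, position) if the annotation holds a processing identifier, else None."""
    if not annotation:
        return None
    try:
        data = json.loads(annotation)
    except json.JSONDecodeError:
        # The annotation is free text, not an identifier written by a previous run.
        return None
    if not isinstance(data, dict):
        return None
    time_id = data.get("time")
    position_id = data.get("position")
    if isinstance(time_id, str) and isinstance(position_id, int):
        return time_id, position_id
    return None
```

A result of `None` corresponds to the case where the processing identifier does not belong to the partition, in which the partition is fully processed without consulting the metadata time field.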
Further, the obtaining unit 21 is further configured to obtain, after the processing unit 23 performs the current automatic data processing on the partition, the time at which the current automatic data processing was completed and the position of the last row of data in the second database.
Further, as shown in fig. 3, the apparatus further includes:
a changing unit 25, configured to change the content of the time identifier to the completion time of the current automatic data processing obtained by the obtaining unit 21, and to change the content of the position identifier to the position of the last row of data in the second database when the current automatic data processing was completed.
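The following in-memory sketch shows one processing run followed by the identifier update. It is a simplification under stated assumptions: the second database is modelled as a Python list of rows, the partition as a dictionary, and the position identifier as the count of rows already imported. The function name `run_processing` and the dictionary keys are hypothetical.

```python
from typing import Dict, List


def run_processing(source_rows: List[str], partition: Dict, mode: str, completed_at: str) -> None:
    """Apply one automatic processing run, then refresh the processing identifier in place."""
    if mode == "incremental":
        # Import only the rows after the recorded position.
        start = partition["identifier"]["position"]
        partition["rows"].extend(source_rows[start:])
    else:
        # Full processing: discard the partition's data and re-import everything.
        partition["rows"] = list(source_rows)
    # Changing-unit step: record the completion time and the last source row position.
    partition["identifier"] = {"time": completed_at, "position": len(source_rows)}
```

For example, a partition holding two rows with a recorded position of 2 receives only the third source row on an incremental run, and its identifier then records the new completion time and a position of 3.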
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be appreciated that the relevant features of the method and apparatus described above are referred to one another. In addition, "first", "second", and the like in the above embodiments are for distinguishing the embodiments, and do not represent merits of the embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the method and apparatus for incremental processing of data according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering. These words may be interpreted as names.

Claims (12)

1. A method of incremental processing of data, the method comprising:
before automatic data processing is performed, obtaining a processing identifier, recorded in a first database, of the partition processed by the most recent automatic data processing, wherein the automatic data processing periodically updates data in a second database into the first database, the automatic data processing is incremental processing or full processing, the processing identifier comprises a time identifier and a position identifier, the time identifier represents the time at which the automatic data processing was completed, and the position identifier represents the position of the last row of data in the second database after the automatic data processing was completed;
obtaining a time field from metadata of the first database, wherein the time field records the time at which data processing on the first database was last completed;
judging whether the content of the time identifier is the same as the content of the time field;
if the judgment result is that they are the same, performing incremental processing, according to the position identifier, on the partition on which automatic data processing is to be performed, wherein the incremental processing imports incremental data in the second database into the first database;
if the judgment result is that they are different, performing full processing on the partition on which automatic data processing is to be performed, wherein the full processing deletes all data in the area to be processed in the first database and imports all data corresponding to the area in the second database into the area.
2. The method of claim 1, wherein prior to said obtaining a time field from metadata of said first database, said method further comprises:
detecting whether the processing identifier is the processing identifier corresponding to the partition on which automatic data processing is to be performed;
the obtaining a time field from metadata of the first database includes:
if the processing identifier is the processing identifier corresponding to the partition, obtaining the time field from the metadata of the first database.
3. The method of claim 2, wherein if the processing identifier is not a processing identifier corresponding to the partition, the method further comprises:
performing full processing on the partition.
4. The method according to claim 2, wherein the detecting whether the processing identifier is a processing identifier corresponding to a partition to be automatically processed by data comprises:
detecting whether the processing identifier is located in an annotation of the partition.
5. The method of claim 1, wherein after the current automatic data processing is performed on the partition, the method further comprises:
obtaining the time at which the current automatic data processing was completed, and changing the content of the time identifier to that time;
obtaining the position of the last row of data in the second database when the current automatic data processing was completed, and changing the content of the position identifier to that position.
6. An apparatus for incremental processing of data, the apparatus comprising:
an obtaining unit, configured to obtain, before automatic data processing is performed, a processing identifier, recorded in a first database, of the partition processed by the most recent automatic data processing, wherein the automatic data processing periodically updates data in a second database into the first database, the automatic data processing is incremental processing or full processing, the processing identifier comprises a time identifier and a position identifier, the time identifier represents the time at which the automatic data processing was completed, and the position identifier represents the position of the last row of data in the second database after the automatic data processing was completed;
the obtaining unit is further configured to obtain a time field from the metadata of the first database, wherein the time field records the time at which data processing on the first database was last completed;
a judging unit, configured to judge whether the content of the time identifier is the same as the content of the time field;
a processing unit, configured to perform incremental processing, according to the position identifier, on the partition on which automatic data processing is to be performed when the judgment result of the judging unit is that they are the same, wherein the incremental processing imports incremental data in the second database into the first database, and to perform full processing on the partition on which automatic data processing is to be performed when the judgment result of the judging unit is that they are different, wherein the full processing deletes all data in the area to be processed in the first database and imports all data corresponding to the area in the second database into the area.
7. The apparatus of claim 6, further comprising:
a detecting unit, configured to detect, before the obtaining unit obtains the time field from the metadata of the first database, whether the processing identifier is the processing identifier corresponding to the partition on which automatic data processing is to be performed;
the obtaining unit is further configured to obtain the time field from the metadata of the first database when the detection result of the detecting unit is that the processing identifier is the processing identifier corresponding to the partition.
8. The apparatus according to claim 7, wherein the processing unit is further configured to perform full processing on the partition when the detection result of the detecting unit is that the processing identifier is not a processing identifier corresponding to the partition.
9. The apparatus according to claim 7, wherein the detecting unit is configured to detect whether the processing identifier is located in an annotation of the partition.
10. The apparatus according to claim 6, wherein the obtaining unit is further configured to obtain, after the processing unit performs the current automatic data processing on the partition, the time at which the current automatic data processing was completed and the position of the last row of data in the second database;
the apparatus further comprises:
a changing unit, configured to change the content of the time identifier to the time at which the current automatic data processing was completed, and to change the content of the position identifier to the position of the last row of data in the second database when the current automatic data processing was completed.
11. A storage medium, characterized in that the storage medium includes a stored program, and when the program runs, a device on which the storage medium resides is controlled to execute the method for incremental processing of data according to any one of claims 1 to 5.
12. A processor, characterized in that the processor is configured to run a program, wherein the program is configured to perform the method of data incremental processing according to any one of claims 1 to 5 when the program is run.
CN201510654209.4A 2015-10-10 2015-10-10 Data increment processing method and device Active CN106570024B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510654209.4A CN106570024B (en) 2015-10-10 2015-10-10 Data increment processing method and device

Publications (2)

Publication Number Publication Date
CN106570024A CN106570024A (en) 2017-04-19
CN106570024B true CN106570024B (en) 2020-03-06

Family

ID=58507473

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510654209.4A Active CN106570024B (en) 2015-10-10 2015-10-10 Data increment processing method and device

Country Status (1)

Country Link
CN (1) CN106570024B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815213A (en) * 2018-12-20 2019-05-28 武汉璞华大数据技术有限公司 Method and system for deleting and modifying data on an Append-Only database

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521225A (en) * 2011-09-29 2012-06-27 用友软件股份有限公司 Incremental data extraction device and incremental data extraction method
CN102609337A (en) * 2012-01-19 2012-07-25 北京神州数码思特奇信息技术股份有限公司 Rapid data recovery method for memory database
CN102750283A (en) * 2011-04-20 2012-10-24 阿里巴巴集团控股有限公司 Massive data synchronization system and method
CN103297529A (en) * 2013-06-06 2013-09-11 浙江大学 Timestamp-based tree structure data synchronization method
CN104199945A (en) * 2014-09-10 2014-12-10 北京国双科技有限公司 Data storing method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2514947B (en) * 2012-05-04 2015-06-17 Box Inc Repository redundancy implementation of a system which incrementally updates clients with events that occured via a cloud-enabled platform

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750283A (en) * 2011-04-20 2012-10-24 阿里巴巴集团控股有限公司 Massive data synchronization system and method
CN102521225A (en) * 2011-09-29 2012-06-27 用友软件股份有限公司 Incremental data extraction device and incremental data extraction method
CN102609337A (en) * 2012-01-19 2012-07-25 北京神州数码思特奇信息技术股份有限公司 Rapid data recovery method for memory database
CN103297529A (en) * 2013-06-06 2013-09-11 浙江大学 Timestamp-based tree structure data synchronization method
CN104199945A (en) * 2014-09-10 2014-12-10 北京国双科技有限公司 Data storing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on incremental processing of big data based on MapReduce; Wang Qiang; CNKI Master's and Doctoral Theses; 20140601; full text *

Also Published As

Publication number Publication date
CN106570024A (en) 2017-04-19

Similar Documents

Publication Publication Date Title
EP3764279A1 (en) Image recognition to support shelf auditing for consumer research
CN108255838B (en) Method and system for establishing intermediate data warehouse for big data analysis
CN104615777A (en) Method and device for real-time data processing based on stream-oriented calculation engine
CN104216822B (en) A kind of processing method and processing device of abnormal information
CN105426310A (en) Method and apparatus for detecting performance of target process
CN105447035B (en) data scanning method and device
CN111400288A (en) Data quality inspection method and system
CN111324781A (en) Data analysis method, device and equipment
CN113268641B (en) User data processing method based on big data and big data server
CN108154289A (en) A kind of product quality factor information automatic analysis system and automatic analysis method
CN106648839B (en) Data processing method and device
CN105447064B (en) Electronic map data making and using method and device
CN106570024B (en) Data increment processing method and device
CN112783749A (en) Static code scanning optimization method and device, electronic equipment and storage medium
CN107590233B (en) File management method and device
CN106919762B (en) Finite element grid array modeling method
CN106776348B (en) Test case management method and device
CN116911959B (en) Data processing method for building material non-standard part
CN104268277A (en) Data reading method and device for database
CN106446687B (en) Malicious sample detection method and device
CN111130921B (en) Method and device for processing performance index of core network element
CN109033210A (en) A kind of method and apparatus for excavating map point of interest POI
CN110990395B (en) Data processing method and device
CN112783751A (en) Incremental code scanning method and device, electronic equipment and storage medium
CN113128804A (en) Data management method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

GR01 Patent grant